• Decrease font size
  • Return font size to normal
  • Increase font size
U.S. Department of Health and Human Services


  • Print
  • Share
  • E-mail

Vibrio parahaemolyticus Risk Assessment – Appendix 4: Details of the Data Analysis for Hazard Characterization

July 19, 2005

Table of Contents

Two illness endpoints were evaluated in the hazard characterization: (1) gastrointestinal illness and (2) septicemia. A dose-response for the probability of illness was determined by fitting selected parametric dose-response models to the available feeding trials and then comparing model-based predictions of illness based on these dose-response models to CDC's best estimate of the average annual number of illnesses occurring due to raw oyster consumption. The occurrence of septicemia was modeled as an event conditional on the occurrence of illness with a frequency that was assumed to be independent of dose. The population (of oyster consumers) was assumed to be homogeneous with respect to susceptibility to gastrointestinal illness but not septicemia. Based on evaluation of the available data, a subset of the population with predisposing (immunocompromised) health conditions was estimated to be at higher risk of developing septicemia.

Dose-Response for Probability of Illness

As a starting point, a dose-response for illness was initially estimated by fitting selected dose-response models to pooled data on the incidence of gastrointestinal illness from human volunteer studies. The pooled data were taken from the studies by Takikawa (1958), Aiso and Fujiwara (1963), and Sanyal and Sen (1974). Collectively, a total of 20 healthy volunteers were administered Kanagawa positive V. parahaemolyticus at doses ranging from 2.3 to 9-log10 cfu in bicarbonate buffer. The dose-response observed is shown in Table A4-1.

Table A4-1. Observed Incidence of Gastroenteritis in Healthy Human Subjects Fed Kanagawa-positive Vibrio parahaemolyticus Strains Administered with Bicarbonate
Dose (log10cfu)Number of IllnessesNumber of SubjectsPercentage Responding

Data from Takikawa (1958), Aiso and Fujiwara (1963), and Sanyal and Sen (1974)

The dose response models that were used in the evaluation (Beta-Poisson, Probit, and Gompertz) were selected a priori to span a range of steepness in the dose-response and consequent differences in predictions when extrapolating away from the relatively high dose region where some adverse response was observed in the feeding trials (e.g., extrapolation of the dose-response below 3-log10 cfu). The functional form of the selected models is shown in Table A4-2.

Table A4-2. Selected Dose-Response Models Fit to the Observed Incidence of Illness (Gastroenteritis) in Healthy Human Subjects Administered Vibrio parahaemolyticus in Feeding Trials Studies
Dose-Response ModelRisk of Illness (Gastroenteritis) as a Function of Dose a
Beta-PoissonPr(ill | d) = 1−(1+d/β)−α
ProbitaPr(ill | d) = Φ(α + β*f(d))
Gompertz1Pr(ill | d) = 1 − exp[−exp[α + β*f(d)]]

a f(d) = log10(d) is the effective dose corresponding to an ingested dose level d

For both the Probit and the Gompertz models an effective dose was defined based on the ingested dose (number of microorganisms). Some appropriate transformation of the ingested number of microorganisms is necessary for both of these models to ensure that the probability of illness approaches zero as the ingested dose approaches zero. Although a number of transformations exist for which this property will hold, a log transformation was adopted here. Transformation of the ingested dose is not applicable with respect to the Beta-Poisson model. An approximate formula for the Beta-Poisson dose-response (which is shown in Table A4-2) was found to be an appropriate alternative to the exact formula (based on the hypergeometric function) because the data set to which the model was fit was such that parameter estimates obtained generally satisfied the necessary condition that β>>α and β>>1. The approximate formula for this model was used for both parameter estimation and in the simulation of the risk assessment model.

The three dose-response models were fit to the data shown in Table A4-1 by the method of maximum likelihood (MLE). All of the models provided an adequate statistical fit to the data. Based on the deviance between the MLE model fits and the observed data (a likelihood-ratio based goodness-of-fit measure), none of the model fits could be statistically rejected. The deviances between model fits and observed data were 1.0 for the Beta-Poisson, 0.85 for the Probit, and 1.17 for the Gompertz. Given 5 data points and 2 parameters for each model, these goodness-of-fit statistics are distributed as a Chi-square with 3 degrees of freedom. Thus, the p-values associated with the fit of the models to the data are 0.80, 0.84 and 0.75 for the Beta-Poisson, the Probit, and the Gompertz, respectively; all well above a rejection threshold of 0.05.

A nonparametric bootstrap method was used to characterize uncertainty distributions of model parameters about the MLE fit to the observed data. Bootstrap distributions of parameter estimates obtained by this procedure are shown in Table A4-3. Following the nonparametric bootstrap procedure, a probability is associated with alternative (hypothetical) outcomes for the experimental data based on the assumption that the true probability of response at each experimental dose level is equal to that empirically observed. For example, at the dose level of 6-log10, 1 illness was observed in 2 dosed subjects. Assuming a true probability of response of 50% at this dose level, alternative outcomes of 0, 1, and 2 illnesses would be expected to occur at this dose level with frequencies of 25%, 50% and 25%, respectively, under hypothetical replication of the experiment. The probability of alternative outcomes for the experiment as a whole is the product of the probabilities associated with each dose level in the combined data set. For each possible outcome, the dose-response models were refit to obtain best estimates of the parameters, again by the method of maximum likelihood. Collectively, the set of parameter estimates obtained, weighted by the associated probabilities of the outcomes, were used to define a bootstrap uncertainty distribution for the parameters of each of the dose response models.

The bootstrapping approach was utilized here to characterize the uncertainty distribution of the dose-response parameters due to the relatively small sample size of the data. As a consequence of the small samples size asymptotic methods, such as the Wald or likelihood-ratio based methods, were considered inappropriate. The nonparametric bootstrap approach was chosen over a parametric approach for simplicity, since the probability of alternative outcomes is determined solely by the observed data. Under a parametric bootstrap approach the probabilities of alternative outcomes would differ for different dose-response models. The MLE fits of all three models predict low probability of illness below 5-log10 and high probability of illness at or above the highest dose level of 9-log10. Consequently, use of a parametric bootstrap approach would not give substantially different results compared to the nonparametric approach. Most of the parameter uncertainty is associated with variability of the outcome response at the mid-dose levels of 6 and 7-log10 cfu (under hypothetical replication of the experiment).

The SAS NLIN procedure was used to obtain maximum likelihood estimates by the method of iteratively re-weighted least squares. For some of the bootstrap outcomes, including the first 7 listed in Table A4-3, the likelihood function of the data was relatively flat and convergence of the estimation procedure was not obtained. More detailed results of refitting the Beta-Poisson dose-response to bootstrap outcomes are shown in Table A4-4. Although converged estimates of MLEs were not obtained for the first 7 outcomes listed in Table A4-4, the (unconverged) model fits to the outcomes were adequate based on the p-values of the deviance statistic. The probability associated with all of the unconverged estimates is a relatively high 31.5%. Rescaling of the likelihood was attempted to obtain better convergence of the estimation algorithm for this model and these outcomes but definitive estimates were not obtained. It may be that the lack of convergence is a consequence of the lack of existence of an MLE for the model parameters for some outcomes, rather than just a numerical scaling problem.

Table A4-3. Non-Parametric Bootstrap Estimates of Parameter Uncertainty Distributions for the Beta-Poisson, Probit and Gompertz Dose-Response Model Fits to Human Feeding Trials Data

 a bootstrap probability and MLEs corresponding to the observed data per se

Table A4-4. Maximum Likelihood Parameter Estimates and Goodness-of-Fit Statistics for Beta-Poisson Dose-Response Model Fits to Nonparametric Bootstrap Outcomes
 Bootstrap OutcomeaMLEs of ParametersLikeli-
hood of
Bootstrap Outcome
MLE of
Log10 ID50
Deviance of Fit to
Bootstrap Outcome
P-Value of Fit to
Bootstrap Outcome

* unconverged estimates

a bootstrap outcomes where x1 denotes the (hypothetical) number of illnesses in the 1st dose group (2.3-log10 cfu), x2 denotes the number of illnesses in the 2nd dose group (5.0-log10 cfu), ..., x5 denotes the number of illnesses in the 5th dose group (9.0-log10 cfu)

The MLEs and bootstrap uncertainty distributions of the parameters of the selected dose-response models shown in Table A4-3 were subsequently used in risk assessment model simulations to obtain predictions of illness based on the model-predicted distributions of dose to which raw oyster consumers are exposed. The resulting predictions of illness were compared to the CDC's best estimate of the annual illness rate (2,800 cases/year), which is based on the assumption that only 5% of illness is culture-confirmed (Mead et al., 1999). The model-based predictions of illness rates were found to be inconsistent with the CDC estimate for all three dose-response models (Beta-Poisson, Probit, and Gompertz) that were considered as part of the assessment. Possible reasons for this inconsistency include differences in the food matrix and host effects in the feeding trials studies compared to that associated with a diverse population of consumers exposed to V. parahaemolyticus via raw oyster consumption.

As a consequence of the identified inconsistency, the CDC's best estimate of the annual illness rate (~2,800 cases/year) was taken to be an additional data point for the purpose of dose-response estimation. A nonspecific location parameter was introduced into each dose-response model and this parameter was then adjusted until resulting risk assessment predictions of illness were centered (or "anchored") to the CDC estimate of 2,800 cases/year based on simulations using the estimated distributions of pathogenic dose consumed as derived in the exposure assessment. The resulting dose-response adjustment corresponded to a change in the location of each model (i.e., a change in the β parameter of the Beta-Poisson model and the α parameter of the Probit and the Gompertz models) relative to that estimated based on the feeding trials data alone. Estimates of 1.4, 1.3, and 3.3 greater log10 ID50 under conditions of population exposure versus that of controlled exposure with antacid in human volunteers were obtained for the Beta-Poisson, Probit, and Gompertz models, respectively. Given the uncertainties associated with the CDC's best estimate of the average yearly illness burden (e.g., due to uncertainty of underreporting of illness), no formal statistical criteria was used in the process of anchoring each dose-response to this estimate.

After anchoring each of the dose-response models (in turn) to the CDC's best estimate of annual illness burden, the unconverged bootstrap estimates of dose-response uncertainty for two of the three models (the Probit and the Gompertz) were found to correspond to extremely low levels of risks. When applying these two models for the purpose of Risk Characterization, these tails of the uncertainty distributions, driven by unconverged estimates, were found to be generally inconsistent with CDC estimates of annual illness for any reasonable magnitude of the frequency by which illnesses are reported. This is evident in Figure A4-1, which shows the mean and central 95% of the uncertainty distribution of log10 ID50 and log10 ID001 (i.e., the infectious dose levels corresponding to 50% and 0.1% illness rates, respectively). The wider range of the uncertainty distributions of log10 ID001 for the Probit and Gompertz models is evident and is a consequence of the substantial impact of the unconverged estimates for these two models. That is, the unconverged parameter estimates for both the Probit and Gompertz model correspond to the upper portion of the 95% uncertainty range. Consequently, unconverged estimates for these two models (Probit and Gompertz) were considered implausible and were not retained with respect to characterizing the dose-response uncertainty and the suitability of these models for the purpose of Risk Characterization. The effect of dropping the unconverged estimates is shown in Figure A4-1. The impact of unconverged estimates for the Beta-Poisson model was found to be much less substantial and therefore uncertainty of model predictions were based on retaining all of the bootstrap estimates of uncertainty in the characterization of the dose-response using this model.

See caption

Figure A4-1. Bootstrap Estimates of the 2.5%-tile, the Mean, and the 97.5%-tile of the Uncertainty Distribution of ID50 and ID001 Based on Fit of the Beta-Poisson, Probit and Gompertz Models to the Human Feeding Trials Data (With and Without Unconverged Parameter Estimates Being Retained).

Probability of Septicemia Given the Occurrence of Illness

Probabilities of septicemia occurring in healthy and immunocompromised individuals were estimated based on an evaluation of the frequency of putatively predisposing health conditions and illness outcome types (gastroenteritis versus septicemia) in a CDC case series of culture-confirmed illnesses. The dataset selected for analysis consisted of oyster-related culture-confirmed cases that were reported in Gulf Coast states during 1997 and 1998 (Angulo and Evans, 1999). This data set was considered particularly relevant as a basis for estimation because the data collected in this region during this period of time was less likely to be biased, due to a heightened awareness of V. parahaemolyticus illness following the outbreaks that occurred at that time. The CDC dataset consisted of a total of 107 oyster-related culture-confirmed V. parahaemolyticus cases (sporadic- and outbreak-related) with 102 identified cases of gastroenteritis, 5 cases of septicemia and one death. Among those cases in the series with available information on health conditions, 23 of 79 (29%) illnesses occurred in individuals with an identified underlying chronic (immunocompromising) health condition; 27 of 90 (30%) gastroenteritis illnesses were hospitalized and 3 of 4 (75%) septicemia illnesses occurred in individuals with an underlying chronic (immunocompromising) health condition.

These identified conditional probabilities of health conditions given illness outcome type can be used to obtain corresponding conditional probabilities of illness outcome type given health condition by an application of Bayes' theorem. Specifically, based on Bayes' theorem, the frequency of an illness outcome type in a (homogeneous) subpopulation defined by the presence of a predisposing health condition is related to the frequency of the predisposing condition among individuals with that illness outcome and the marginal probabilities of the outcome type and the predisposing health condition with respect to the overall population. This relationship is:

Pr(illness outcome | health status) = 
 Pr(health status | illness outcome) * Pr(illness outcome) 

Pr(health status)


where, for example, Pr(illness outcome | health status) denotes the frequency or probability of an illness outcome type within a subpopulation of individuals defined by the existence of a common predisposing health condition ("health status"). All factors on the right hand side of the equation are identifiable based on the epidemiological case series data.

Substituting appropriate observed frequencies (based on the CDC case series data) into the above equation provides point estimates, with respect to the population of culture-confirmed illness, of the probability of septicemia occurring within any appropriately defined subpopulation identified by a common health status. For the assessment of risk of septicemia, the population was considered to consist of two risk subgroups defined by the presence (or absence) of a predisposing "immunocompromising" health condition. Implicitly it is assumed that the risk within each subgroup is relatively homogeneous. Thus, for the subpopulation identified as having an "immunocompromised" chronic health condition the probability of septicemia (given that illness occurs and is culture-confirmed) was estimated from Bayes' theorem and the CDC data as follows:

Pr(septecemia | immunocompromised) = 
 Pr(immunocompromised | septecemia) * Pr(septecemia)



3/4 * 5/107




The probability of septicemia occurring consequent to culture-confirmed illness in healthy individuals was estimated in a similar fashion.  The conditional probabilities obtained for progression of illness to septicemia are:

Pr(septecemia | immunocompromised) = 0.12

Pr(septecemia | healthy) = 0.0165

Pr(septecemia) = 0.047

Pr(death | septecemia) = 0.2

It is important to recognize that these estimated probabilities pertain to the population of culture-confirmed illnesses; i.e., these are probabilities conditional on both the occurrence of illness and the identification of that illness by a confirmed culture. Thus for example, given that a culture-confirmed illness occurs there is an overall 4.7% chance that the illness outcome will be septicemia. The two primary illness outcomes (i.e., gastroenteritis only or gastroenteritis with progression to septicemia) are mutually exclusive. Death was considered a separate outcome subsequent only to the occurrence of septicemia.

In order to obtain estimates of the probabilities of septicemia applicable to all V. parahaemolyticus illness, regardless as to whether culture-confirmed or not, consideration needs to be given to apparent selection biases in the case series data. While cases of septicemia are unlikely to go undiagnosed (i.e., unconfirmed) as a function of a patient's health status, this may not be true of gastroenteritis. If the frequency by which illnesses are culture-confirmed increases with the severity of illness and an immunocompromising health condition predisposes to more severe gastroenteritis (as well as increased risk of septicemia) then one would expect immunocompromised individuals to be over-represented in case series of gastroenteritis. Based on analysis of the 1997-1998 CDC case series data this would appear to be the case, although differences in consumption behavior could also be partially responsible.

Differential reporting rates for culture-confirmed gastroenteritis occurring in immunocompromised versus healthy individuals can be estimated from the case series data based on relationships between conditional probabilities and marginal probabilities that are implied by Bayes' theorem. The appropriate relationships are:

Pr(CC) = Pr(CC | S)* Pr(S) + Pr(CC | S) * Pr(S)


Pr(S | CC) = 
 Pr(CC | S) * Pr(S) 

Pr(CC | S) * Pr(S) +  Pr(CC | S) * Pr(S)



Pr(CC) = Pr(culture - confirmed)
Pr(S) = Pr(immunocompromised)
Pr(S) = Pr(healthy)
Pr(S | CC) = Pr(immonocompromised | culture - confirmed)
Pr(CC | S) = Pr(culture - confirmed | immonocompromised)
Pr(CC | S) = (culture - confirmed | healthy)

These two equations stipulate that the weighted average of differential reporting rates in immunocompromised versus otherwise healthy subpopulations is equal to the overall aggregate reporting rate subject to a restriction, or constraint, that these differential reporting rates are consistent with the frequency of individuals with underlying immunocompromising (chronic) health conditions in the population of culture-confirmed illness.

By assumption, the aggregate probability of illness being culture-confirmed (i.e., "reported") has been taken to 1 in 20 cases (5%) based on the study by Mead et al. (1999). The frequency of immunocompromised individuals in the CDC (gastroenteritis) case series data is 29%. Two additional unknowns in the equations are the marginal frequencies of immunocompromised and healthy statuses. In this regard, it has been estimated that approximately 7% of the general population has an underlying health condition predisposing to V. vulnificus infection (Klontz, 1997). The same set of health conditions would likely predispose to more severe V. parahaemolyticus illness and are, in fact, the types of health conditions reported in the 1997-1998 CDC case series data of V. parahaemolyticus illness. Based on this observation, it is assumed the same frequency of predisposing health conditions (7%) applies to V. parahaemolyticus and that immunocompromised individuals consume raw oysters at the same frequency as the general population.

Substituting these estimates into the relationships above gives a system of two equations in two unknowns. Solving for these two unknowns yields point estimates of the rate by which illnesses are culture-confirmed for immunocompromised versus healthy subpopulations:

Pr(CC | S) = Pr(culture - confirmed | immonocompromised) = 0.21

Pr(CC | S) = (culture - confirmed | healthy) = 0.038

Based on these estimates, predicted probabilities for occurrence of septicemia among all V. parahaemolyticus illness, both culture-confirmed and unreported, are:

Pr(septecemia | S) = Pr(septecemia | S & CC) * Pr(CC | S) = 0.12 * 0.21 = 0.025

Pr(septecemia | S) = Pr(septecemia | S & CC) * Pr(CC | S) = 0.0165 * 0.038 = 0.00063

Finally, the overall risk of illness progressing to septicemia among the population of all V. parahaemolyticus illness is the weighted average of the conditional probabilities of septicemia for immunocompromised and healthy individuals:

Pr(septicemia | illness) = 
Pr(immunocompromised) * Pr(septicemia | illness & immunocompromised) +
Pr(healthy) * Pr(septicemia | illness & healthy)

0.07*0.025 + 0.93*0.00063



Based on this analysis, a combined variability/uncertainty distribution for the probable number of septicemia which may occur in a given year was defined as a binomial distribution with size parameter equal to the total number of illnesses predicted (i.e., in each individual simulation of the risk assessment model) and probability parameter equal to the estimated aggregate risk of septicemia following illness (0.0023). A tacit assumption here is that the probability of septicemia occurring is independent of the dose leading to infection and illness. The uncertainty associated with estimated progression rates in immunocompromised and healthy individuals obtained via Bayes' theorem have not been fully evaluated here. However, the uncertainties are considered to be substantially less than that already characterized with respect to the number of illnesses occurring.

Return to Table of Contents