-

# Vibrio parahaemolyticus Risk Assessment – Appendix 3: Risk Assessment Simulation Model

July 19, 2005

### Overview

A Monte Carlo simulation model was developed for the V. parahaemolyticus risk assessment to capture the variability and uncertainty of the description of the processes associated with V. parahaemolyticus densities in oysters, the effects of oyster harvesting, consumer consumption, and human response to the pathogen. This model is made up of biogenic and abiogenic factors. Abiogenic factors include environmental air, water temperatures, and storage times; and biogenic factors include predicted harvest behavior and amounts of oysters consumed. Within the model these factors are combined with growth rate, death rate, and dose-response models. The result is a probabilistic simulation predicting a distribution of baseline risk for each region/season and distributions of risks associated with mitigations.

The model simulations were implemented in @Risk (Palisade). All of the calculations were performed by the Monte Carlo method of resampling from specified input distributions and appropriately combining the sampled values to generate the corresponding output distributions. For each region and season, a total of 10,000 servings were simulated for combinations of 1,000 samples of the uncertainty parameters. Due to the relatively large number of servings consumed within each of the region and season combinations, the number of illnesses implied by the model was determined by the average risk per serving multiplied by the number of servings consumed. The appropriateness of this calculation follows from the Central Limit Theorem. The sum of a large sequence of independent Bernoulli random variables, representing simulated illness outcomes, will converge to the product of the number of variables in the sequence times the average risk of illness (i.e. , as the number of variables in the sequence increases). This it true even when a sequence of random variables are not identically distributed, as is the case here due to differing levels of exposure and hence risk per serving.

### Iteration of the Model: Variability

For each iteration of the model the prediction of the density of pathogenic V. parahaemolyticus per gram of oyster tissue was determined at harvest by applying the estimated distribution of the percentage of V. parahaemolyticus that are pathogenic to the predicted distribution of total V. parahaemolyticus at the time of harvest and then evaluating changes in the density of pathogenic V. parahaemolyticus through the post-harvest module. Levels of total V. parahaemolyticus were also evaluated from time of harvest through the post-harvest module. This approach was adopted because total V. parahaemolyticus levels were necessary to implement the bound on the level at 6 log10, so that the comparable pathogenic levels would not be exceeded. Also, results from the FDA/ISSC retail study (FDA/ISSC, 2001), the only post-harvest/retail study available, provide levels of total V. parahaemolyticus, but not pathogenic V. parahaemolyticus for all regions. We used this study to validate the exposure assessment of our model by comparing levels of total V. parahaemolyticus found at retail with the model predicted levels.

Exposure distributions of predicted numbers of pathogenic V. parahaemolyticus ingested per serving were obtained by combining distributions describing the probabilistic variation of number and meat weight of oysters in a serving and the expected variation of the density of pathogenic V. parahaemolyticus per gram at the end of the Post-harvest process. Individual iterations of the model predicting the number of pathogenic V. parahaemolyticus consumed by an individual were used to calculate a risk of illness. @Risk keeps track of the value of each calculation of risk for the 10,000 iterations. When one simulation is completed summary statistics are available for the 10,000 calculations of risk under both baseline and mitigation scenarios.

The number of iterations was set high enough to allow for a range of all the variables to be run through the model. At this number of iterations the summary statistics (e.g. , mean values) calculated for the risk of illness were found to converge during the simulation; meaning that, by the 10,000th iteration, these values were nearly constant. The Monte Carlo simulation error associated with this aspect of the simulation was determined to correspond to an average coefficient of variation of 0.2% up to ~5%. The precision was lowest when the mean dose (and risk) was low and it approached 0.2% for those regions and seasons that collectively account for >95% of the estimated annual illness burden.

The estimates of mean risk determined by the average of simulated illness outcomes for selected high risk region/seasons confirmed the appropriateness of just using the mean risk rather than directly simulating illness outcomes (as Bernoulli random variables). Thus, predicted numbers of illnesses were obtained by determining the mean risk and then calculating the associated number of illnesses as the product of this estimate and the number of servings (based on the NMFS landings statistics).

### Simulations with New Parameters: Uncertainty

These simulations of 10,000 sampled exposures (and risks) were repeated 1,000 times with selected uncertainty parameters in order to evaluate their influence the model's output (risk). Parameters evaluated on this level of the simulation included the effect of likely year-to-year variation in the distributions of water temperatures, and the uncertainties associated with parameters such as the percentage of total V. parahaemolyticus which are pathogenic, the dose-response and the relative growth rate of V. parahaemolyticus in oysters versus broth cultures. A sample size of 1,000 was selected based on practical time constraints. The software selected for Monte Carlo simulations (@Risk) does not directly facilitate a fully nested two-dimensional (variability and uncertainty) simulation approach. Consequently, the uncertainty dimension of the simulation was conducted by performing simple random sampling of uncertainty parameters in Microsoft Excel per se and then calling a sequence of 1,000 @Risk simulations with the uncertainty parameters fixed at the values corresponding to each of the uncertainty samples obtained.

Simple random sampling is considerably less precise than Latin hypercube (or other types of stratified) sampling. The relative precision or Monte Carlo error of the mean simulation output with respect to uncertainty at the selected sample size of 1,000 was estimated to correspond to a coefficient of variation of ~3-4% of the nominal mean for each region and season combination. It was determined that the most significant source of this variation was due to the Monte Carlo error of the simple random sampling of the dose-response uncertainty. As a consequence of this degree of simulation error, calibration or "anchoring" of the model to CDC estimates of annual illness burden was accomplished by using "rejection" -sampling to obtain a single fixed representative sample of specified precision from the distribution of the dose-response uncertainty. A criteria of <0.1% relative difference between the sample versus the actual (population) mean was used. Thus, a fixed sample of 1,000 dose-response parameters satisfying this criteria was obtained by iteratively taking samples (of size 1,000) from the uncertainty distribution via simple random sampling and rejecting all but the 1st (collection of 1,000) satisfying the chosen criteria. After having obtained a representative sample of 1,000 dose-response parameters, the adjustment factor associated with food-matrix and pathogen-host effects was estimated by anchoring the model to be consistent with CDC estimates of annual illness burden. This was accomplished by running the model with different adjustment factors and then interpolating between the results to obtain a suitable estimate.

Although the model implementation fixes the samples of the uncertainty parameters rather than randomizing them on each model invocation, the effect of the Monte Carlo error is minimal with respect to both the identification of influential variables and the evaluation of effectiveness of mitigations.

### Description of Calculations for Each Step of the Model (@Risk implementation)

A copy (CD-ROM) of the model is available. Fax request for the model to the CFSAN Outreach and Information Center at 1-877-366-3322. Additional information can be found on the spreadsheets:

• Spreadsheet 1. Values used to generate correlated uncertainty distributions used in the assessment, including water temperature data.
• Spreadsheet 2. Simulation results of two-dimensional uncertainty and variability simulations for all regions and seasons (Figure A3-1).

The basic @RISK model showing each step as described below is shown in Figure A3-1. Figures A3-2 to A3-6 show how various parameters are used to derive levels of V. parahaemolyticus at different stages in the pathway. For example, Figure A3-5 shows how the mitigation strategies are incorporated into the model and Figure A3-6 shows how the model is adjusted to include intertidal parameters for the Pacific Northwest region as described in the Exposure Assessment and Appendix 5.

Figure A3-1. Spreadsheet Showing Each Step of the @RISK Vibrio parahaemolyticus Risk Assessment Model

1. Input of the water parameters: For each uncertainty simulation, new values are inserted into cells E8 and E9 which are the mean and standard deviation of the region/season temperature distribution.
2. Simulation of water temperature during harvest: Based on the values in cells E8 and E9 a water temperature is probabilistically selected based on a Normal distribution.
3. Total V. parahaemolyticus at harvest: The total Vp/g density at harvest is determined by using the (regression-based) prediction parameters from cells E12, E13, and E14 to input into cell E19 where the density is calculated (Figure A3-2). If the density is higher than 106, then the density is truncated in cell E20.
4. Percent pathogenic V. parahaemolyticus: Uncertainty values are set in Cells O35 and O36 for the Beta distribution values that are then calculated in Cell M33 (Figure A3-2). The percent pathogenic is then copied to Cell F18 to make the calculation easier to follow.
5. Pathogenic V. parahaemolyticus /g at harvest: The percent pathogenic V. parahaemolyticus (cell F18) is multiplied by the total Vp/g density E19 in Cell F19. The value is truncated at an amount proportional to the 106 value for the total Vp/g density. The proportionality constant is the % pathogenic for this particular iteration.
6. Time-to-refrigeration: Time-to-refrigeration is the duration of time between when the oysters are harvested and initiation of refrigeration, which typically involves the delivery of the oysters to a land based refrigeration site. The time the oyster harvesters were out on the water (cell E29) is estimated using a Beta-Pert distribution with parameters taken from cells E26 (minimum time), E27 (most likely time), and E28 (maximum time). Within the estimated time period, the time the oysters were out of the water is selected from a uniform distribution with one hour as the minimum time and the maximum being the time on the water value (cell E30).
7. Air temperature: The air temperature (cell E34) is calculated based upon the water temperature plus a probabilistically selected value from a normal distribution using the parameters from cells E32 and E33.
8. Pathogenic V. parahaemolyticus/g at 1st refrigeration:
1. For oysters that are harvested by dredging (i.e. , all oysters except those harvested intertidally in the Pacific Northwest), the increase in the V. parahaemolyticus densities (cfu/g) from the time the oysters come out of the water to the time they are first placed in refrigeration is estimated as a function of time (E30) and temperature (E34). Additional growth parameters are provided in cells J7 to J14. Cell E38 calculates the square root of the growth rate which is used to predict the out growth reported in Cell F40. As the growth rate estimate is for V. parahaemolyticus growth in culture with the absence of competitors, a factor (M27) based on experimental observation is used to adjust the growth rate to that of V. parahaemolyticus in an oyster where competition with other microorganisms is present. The factor is an uncertainty factor and is c [JCB1] hanged for each uncertainty simulation. Predicted growth (presented as log10) (cell F40) is added [JCB2] to the initial predicted V. parahaemolyticus density (presented as log10) (cell F20) to obtain predicted counts at first refrigeration (F41) (Figure A3-3).
2. For oysters that are harvested intertidally, additional factors, such as the time the oysters are on tidal flats before being harvest (E21) and the temperature increase the oysters experience from being in the sun are taken into consideration (Figure A3-6). The time the oysters are on the flats are modeled as a time between 4 and 8 hours and the increase in temperature experienced by the oysters is modeled to be between 0 and 10 °C.   Based on the time on the flats and the increased temperature, an estimate of growth is computed to add to the initial V. parahaemolyticus density.
9. Cooldown time: A cooldown time in hours is randomly selected from a uniform distribution between 1 and 10 hours in cell E42.
10. Pathogenic V. parahaemolyticus/g at cooldown: The V. parahaemolyticus continue to grow while the oyster is cooling. The growth rate (cell F43) is a fraction of the initial growth rate estimated as a function of the length of the cooling time. The growth is added to the initial V. parahaemolyticus density of the oyster (cell F44).
11. Storage time: The time that oysters are stored is generated in cell E45 based on a Beta-Pert distribution with the minimum time being 1 day, the most likely time being 6 days and the longest time being 21 days.
12. Pathogenic V. parahaemolyticus/g at consumption: Pathogenic V. parahaemolyticus/g at consumption (cell F52) is determined by multiplying the number of days under refrigeration (cell E45) by the cell death rate under cold storage conditions (cell F47) and then subtracting this amount from the level at cooldown.
13. Number of oysters per serving: The number of oysters in a serving is selected from an array of probable serving sizes (M29) that are weighted to match the estimated numbers of oysters eaten based on an identified consumer survey.
14. Meat weight per oyster serving: The weight of the oyster is probabilistically selected from a lognormal distribution fit to available data on oyster weights. The sampled (or simulated) value is multiplied by the number of oysters consumed corrected for the fraction of whole oyster that is not consumed (mantle fluid) and multiplied by the mean and standard deviation from the lognormal distribution fit. Finally the simulated value (cell E49) is truncated if the total weight of consumed oyster exceeds 2 kg and rounded to 10 grams if the total weight consumed is less than 10 grams.
15. Ingested dose: The ingested dose (cell F51) is determined by multiplying the mass of the oysters consumed (cell E49) by the density of the V. parahaemolyticus present (cell F47). This amount is converted to a whole number by using a probabilistic Poisson estimation of the number of V. parahaemolyticus (cell F52).
16. Dose-Response: The dose response parameters are changed for each simulation and are copied to Cells M37 and M38.
17. Risk of Illness: The calculation of the risk of illness is made in Cell F54. The inputs to the calculation are the dose-response parameters and the ingested dose of pathogenic V. parahaemolyticus (Figure A3-4).

Figure A3-2. Screen Shot of @RISK Spreadsheet Showing How the Levels of Pathogenic Vibrio parahaemolyticus at Harvest were Determined

Figure A3-3. Screen Shot of @RISK Spreadsheet Showing How Vibrio parahaemolyticus Levels at First Refrigeration were Derived

Figure A3-4. Screen Shot of @RISK Spreadsheet Showing How Probability of Illness was Derived using Oyster Servings, Levels of Pathogenic Vibrio parahaemolyticus Consumed and the Dose-Response Parameters

Figure A3-5. Screen Shot of @RISK Spreadsheet Providing an Example of How the Effect of Mitigation on Levels of Pathogenic Vibrio parahaemolyticus Per Serving was calculated

Figure A3-6. Screen Shot of @RISK Spreadsheet Showing the Effect of Intertidal Harvesting on Levels of Vibrio parahaemolyticus in Oysters in the Pacific Northwest

### Advantages and Limitations of the Model

The modeling approach adopted in the present assessment is similar in structure to that of other risk assessments, but has several unique aspects. Foremost, risk has been analyzed in terms of region and season to take proper account of differing harvest practices and climates. Second, the model that was developed is scalable in that it may be applied to finer levels of spatial and/or temporal resolution as data become available. Thirdly, the modeling approach has separated variability from uncertainty by identifying four key variables as uncertain, selecting values for these variables according to specific distributions, and then simulating the effects of variability parameters for all randomly selected values of the uncertainty parameters. In this manner, parameters that represent variability in the model are not mixed with parameters that are uncertain. However, parameters like water temperature can represent uncertainty as well as variability. This separation has allowed for an estimate of the reduction in the overall uncertainty of the analysis that would be gained if the uncertainty of individual variables were reduced. Other microbial risk assessments have separated variability from uncertainty; however, this risk assessment has investigated the gain in information that results from reduction in uncertainty of individual variables. Each of these points is discussed in turn.

The model developed here analyzes risk within categories defined by the four seasons and six primary harvesting regions (Northeast Atlantic, Mid-Atlantic, Gulf of Mexico [divided into 2 regions], and Pacific Northwest [divided into dredged harvesting and intertidal harvesting]) due to differing harvest practices and climates. The analysis could have subdivided further; however, the limitations of acquiring data with respect to a finer level of detail are such that the analysis was conducted at the specified regional level. Analyzing the regions separately allows for an assessment of mitigations that may be tailored to specific regions and seasons. The results may then be used in a subsequent cost benefit analysis. The principle limitation of this approach, which effectively segments the risk assessment into spatial and temporal groupings, is that the results are generally conditional to the a priori definitions of region and season. In particular, the selective application of sensitivity and importance analyses to predefined regions/seasons one at a time (i.e. , to determine influential parameters and uncertainties) yields results that pertain to specific regions and seasons. Consequently, overarching and comparatively more influential effects may be obscured. In the present assessment, air/water temperatures are highly influential variables across region/season groupings and this is partially obscured by the necessity of presenting sensitivity analysis results for selected region/season combinations. The fact that air/water temperatures may be relatively homogeneous within some of the defined region/seasons in the present assessment, and hence relatively inconsequential in such a context, does not obviate the fact that there are wide (and important) variations in these parameters across regions and seasons.

The structure of the model is amenable to further subdivision of locality and season because it is scalable. Specifically, the model is structured to simulate the density of V. parahaemolyticus in oysters at selected steps from harvest to consumption as a function of environmental and industry-specific parameters (e.g. , air/water temperatures, harvest practices) corresponding to locality and season. Given the existence of appropriate data, the model can be used to simulate this process for any appropriately defined harvest location and time frame. The selected level of spatial/temporal categorization (regional and seasonal) was determined by consideration of data availability; most specifically, the quantity and quality of data that could be obtained pertaining to air/water temperatures, harvest practices, V. parahaemolyticus prevalence, and shellfish landing information. Given that more detailed data on air/water temperatures is available from satellite observations a finer level of categorization (e.g. , by state and/or by month) may be possible and/or other methodological approaches (e.g. , harmonic regression) may be applicable to incorporate the effects of climate into further assessments without "segmenting" by region/season categories. However, the utility of such a level of detail in modeling of air/water temperature effects is mitigated by the fact that additional uncertainties may arise if the model is applied on a finer level of detail (e.g. harvesting areas) for which more refined data on industry-specific harvesting practices are missing or incomplete. The effects of such incomplete (or inaccurate) data on the results of the model have not been evaluated at this time, but an analysis of this type may be appropriate in the future if the model is to be applied on the State or shellfish harvesting area level using more refined temperature data available from satellite observations or other sources.

Variability and uncertainty have been separated in the analysis because this separation provides a more informative characterization. We distinguish between model inputs that are less well characterized because of lack of knowledge (uncertainty) and model inputs that are inherently heterogeneous (variable). A model input which is designated as heterogeneous is a parameter that is considered to be naturally variable, even when there is no uncertainty present. For example, the day-to-day water temperatures within each of the different regions and seasons are considered inherently variable and have been modeled as normal distributions with given means and standard deviations. At the same time, the relative growth rate of V. parahaemolyticus in laboratory broth cultures versus that in oysters has been characterized as an uncertainty. This uncertainty was specified to appropriately represent the present lack of knowledge as to the true growth rate versus temperature relationship in oysters and the uncertainty inherent in extrapolating from studies of the relationship in axenic culture. With additional study this uncertainty could be reduced. Variations in day-to-day water temperatures on the other hand can not be reduced by further study. The result of making a distinction between model inputs that are uncertain and model inputs that are variable is that the effect of reducing the uncertainty of each of the uncertainty variables can be assessed separately. Based on such an analysis, uncertainties can be prioritized in order to help identify research efforts that are most likely to help reduce the total uncertainty that has been identified in the risk assessment.

The model can be improved. At present, the modeling approach simulates individual exposures and risks with defined variability largely based on the relationship of total V. parahaemolyticus levels to air/water temperature and a random variation (within defined limits) of the percentage of total V. parahaemolyticus that are pathogenic. However, within region/season groupings the model is not temporal and thus the structure of the model does not allow for a complete and quantitative evaluation of the likely reduction in risk resulting from implementation of the FDA/ISSC V. parahaemolyticus interim control plan (adopted by the ISSC in July 1999 and revised in 2001). This is because, implicitly, the interim control plan operates at a finer level of spatial resolution (e.g. harvest areas) and is time-sensitive in the sense that there is prescribed closure to harvesting after measuring an unsafe level of V. parahaemolyticus and then re-opening once exposure levels have been demonstrated to have subsided. In order to develop an assessment model applicable to an evaluation of this control plan additional data and consequent restructuring of the present assessment would be needed. First the model would need to be scaled to the level of individual shellfish harvesting areas. To accomplish this, further data (e.g. water temperature) are needed with respect to the individual harvesting areas. Second, sensitivity and specificity characteristics of the pathogenic V. parahaemolyticus gene probe methodology used by the individual laboratories (doing the tests) are needed. Third, the model needs to be extended to encompass putative factors responsible for or affecting the rapidity by which pathogenic V. parahaemolyticus levels may change in specific areas and not in others. At present the model predictions are primarily based on temperature. Although variation of the percentage of total pathogenic V. parahaemolyticus has been incorporated in the assessment, this variation has not been modeled (or linked) to any environmental factor(s). The model might be improved by considering the rapidity of turnover of water in shellfish harvesting areas based on levels of freshwater flows, tide changes, wind direction, and depth of harvesting area if these environmental factors have an effect on persistence of pathogenic V. parahaemolyticus. It is, however, unknown at the present time how these factors may affect the persistence of pathogenic V. parahaemolyticus and hence this remains an area for future study.

If and when such model refinement is feasible, more sophisticated approaches to modeling of the data may be appropriate. For example, whereas normal distribution approximations of water/air temperatures were found to be sufficient at the regional/seasonal level, stochastic weather models (e.g. , Richardson, 1981), which better represent skewness of temperature distributions as a consequence of precipitation patterns, may help facilitate a more unified approach that is not based on segmentation by season (and region).

### Risk Assessment Simulation Model

• Spreadsheet 1: Data Used to Generate the Statistics in Spreadsheet 2

This spreadsheet was used to generate the correlated statistics used in Appendix II, Spreadsheet 2.

This spreadsheet is separated from Spreadsheet 2 for simulation performance reasons. Select a region worksheet to see how the temperature statistics were generated.

• Spreadsheet 2: Model for the FDA Vibrio parahaemolyticus Risk Assessment

Running simulations with this spreadsheet requires the user to have the spreadsheet add-in "@Risk" installed in Excel (Excel is a product of Microsoft and @Risk is available from the Palisade Corporation www.palisade.com)**.

In a typical run, the cells F54, H54, I54, and J54 were selected as @Risk "outputs" and the simulation settings were set with iterations at 10,000 and simulations at 1000. New uncertainty parameters are brought in to the spreadsheet using the macro "TotalUncert". Add this macro to the simulations settings to be executed after a simulation run.

*There is no "Help Desk" available to call and ask for explanation, instruction, or assistance.

**If @Risk software is not loaded onto the computer, then numeric values will not appear in the cells of the Excel spreadsheet. The label #NAME? will appear in the cells of the spreadsheets. It will be of minimal value to run the computer model without @Risk already loaded onto the computer. @Risk is a product of the Palisade Corporation (www.palisade.com). Use of @Risk, or any other software product, in this risk model should not be construed as an endorsement of such products by the FDA.

Questions on the running of the spreadsheet may be directed to Mark Walderhaug at Mark.Walderhaug@fda.hhs.gov.