Vibrio parahaemolyticus Risk Assessment – Appendix 6: Regression-Based Sensitivity Analyses to Determine Influential Variability and Uncertainty Parameters
July 19, 2005
Sensitivity Analysis of Variability Parameters
A deficiency of sensitivity analysis via Tornado plots (i.e., pairwise correlations) is that the importance of each factor is evaluated one at a time. Correlation, or multicollinearity, between input factors can therefore confound the interpretation of importance in a Tornado plot. An alternative method of assessing influence or importance is to estimate the percentage of the variation of the output variable (e.g., log10 risk per serving) that is attributable to selected factors and combinations of factors. A variety of parametric and nonparametric methods have been developed to estimate importance based on the concept of variance decomposition (i.e., attribution of variance to selected factors) (McKay, 1995; Saltelli et al., 2000; Archer et al., 1997; Chan et al., 1997).
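The confounding described above can be seen in a small synthetic example. The sketch below is illustrative only: the variables `x1`, `x2`, and `y` and the helper `regression_ss` are hypothetical and are not taken from the V. parahaemolyticus model. Here `x2` has no direct effect on the output but is strongly correlated with `x1`, so a pairwise (Tornado-style) correlation flags it as important, whereas the share of variance lost when it is dropped from a joint regression is essentially zero.

```python
import numpy as np

# Hypothetical illustration: x2 has NO direct effect on y, but is correlated with x1.
rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) is about 0.9
y = 2.0 * x1 + rng.normal(size=n)                          # y depends on x1 only

def regression_ss(X, y):
    """Regression sum of squares of an OLS fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y.mean()) ** 2))

tss = float(np.sum((y - y.mean()) ** 2))
rss_full = regression_ss(np.column_stack([x1, x2]), y)

# One-at-a-time (Tornado-style) measure: pairwise correlation with the output
corr_x2 = np.corrcoef(x2, y)[0, 1]

# Variance-decomposition measure: share of TSS lost when x2 is dropped from the joint fit
rpss_x2 = (rss_full - regression_ss(x1[:, None], y)) / tss

print(f"corr(x2, y) = {corr_x2:.2f}")   # large, purely through the correlation with x1
print(f"RPSS(x2)    = {rpss_x2:.4f}")   # close to zero
```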
Parametric, or regression-based, methods are the easiest to implement and do not entail substantial error when the regression model used to assess importance is a reasonable approximation of the model simulation output (Manteufel, 1996). For the V. parahaemolyticus risk assessment model, simple regression models were found to be reasonable and therefore appropriate for assessing importance based on variance decomposition. Table A6-1 gives the results of one such analysis, which used a sensitivity measure based on relative partial sums of squares (Rose et al., 1991) to assess the importance of variability parameters for the log10 of individual risk per serving, for both the Gulf Coast summer harvest and the Pacific Northwest intertidal summer harvest. The log10 of risk per serving was used as the output variable because the distribution of individual risk per serving was highly asymmetric.
Both a linear regression and a quadratic response surface were considered as approximations of the model simulation output. Because the quadratic response surface did not provide a substantially better fit than a simple linear regression, only the results of the linear regression are presented in Table A6-1. Sensitivity coefficients, based on the proportion of total variation explained by each factor (parameter), were calculated from the regression fits according to the formula
$$\text{Sensitivity coefficient} = \mathrm{RPSS}_i = \frac{\mathrm{RSS} - \mathrm{RSS}_{-i}}{\mathrm{TSS}}$$
where
$\mathrm{RPSS}_i$ is the relative partial sum of squares attributable to factor $i$;
$\mathrm{RSS}_{-i}$ is the regression sum of squares for a regression model with factor $i$ omitted as a predictor;
$\mathrm{RSS}$ is the regression sum of squares of the full regression model with all factors present;
$\mathrm{TSS}$ is the total variation (total sum of squares) of the output variable (i.e., log10 of risk per serving).
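A minimal sketch of this calculation follows, assuming ordinary least-squares fits with an intercept. The function names `regression_ss` and `rpss` and the synthetic inputs are hypothetical; the software actually used for the assessment is not described here. Each coefficient is obtained by refitting the regression with one predictor removed and dividing the resulting drop in regression sum of squares by TSS.

```python
import numpy as np

def regression_ss(X, y):
    """Regression sum of squares (RSS in the notation above) of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y.mean()) ** 2))

def rpss(X, y):
    """Return RPSS_i = (RSS - RSS_{-i}) / TSS for each column i of X, plus the full-model R^2."""
    tss = float(np.sum((y - y.mean()) ** 2))
    rss = regression_ss(X, y)
    coefs = []
    for i in range(X.shape[1]):
        X_minus_i = np.delete(X, i, axis=1)           # drop factor i, keep all others
        coefs.append((rss - regression_ss(X_minus_i, y)) / tss)
    return np.array(coefs), rss / tss

# Usage on hypothetical, independent standardized predictors
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.0 * X[:, 2] + rng.normal(size=n)
coefs, r2 = rpss(X, y)
print(np.round(coefs, 3), round(r2, 3))
```

With independent predictors, as in this toy case, each coefficient is close to that term's share of the output variance (roughly 9/11, 1/11, and 0 here).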
The difference between $\mathrm{RSS}$ and $\mathrm{RSS}_{-i}$ is the amount of variation in the output variable that is explained by including the i-th factor and, depending on the form of the approximating regression, its (potential) interactions with other factors. The relative partial sum of squares is therefore an indication of the additional percentage of the variance of the output variable explained by a parameter, given that all other parameters are already included in the regression model. The sum of these additional percentages over all parameters is not, however, equal to the total variation explained by the full approximating regression, because partial (type III) sums of squares do not in general add up to the total regression sum of squares.
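The non-additivity noted above is easy to demonstrate with two correlated predictors. In this hypothetical example (synthetic data, not model output), each predictor's partial sum of squares excludes the variation it shares with the other predictor, so the RPSS values sum to much less than the R² of the full model.

```python
import numpy as np

def regression_ss(X, y):
    """Regression sum of squares of an OLS fit of y on X (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y.mean()) ** 2))

rng = np.random.default_rng(2)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # hypothetical: corr(x1, x2) is about 0.8
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

tss = float(np.sum((y - y.mean()) ** 2))
rss_full = regression_ss(X, y)
r2 = rss_full / tss
# Partial (type III style) share for each predictor: drop it, refit, compare
partial = [(rss_full - regression_ss(np.delete(X, i, axis=1), y)) / tss for i in range(2)]

print(f"R^2 of full model = {r2:.2f}")
print(f"sum of RPSS_i     = {sum(partial):.2f}")   # noticeably smaller than R^2
```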
Table A6-1. Regression-based sensitivity of log10 risk per serving to variability parameters

| Region / Season | Parameter | Sensitivity Coefficient (a) |
|---|---|---|
| Gulf Coast (Louisiana) / summer | Log10 V. parahaemolyticus per g at harvest | 21.4% |
| | Duration of cooldown | 3.9% |
| | Grams of oysters consumed | 3.2% |
| | Length of refrigeration time | 2.1% |
| | Ambient air temperature | 1.7% |
| | R² of full model of log10 risk per serving | 64% |
| Pacific Northwest (intertidal) / summer | Log10 V. parahaemolyticus per g at harvest | 12.2% |
| | Grams of oysters consumed | 4.2% |
| | Length of refrigeration time | 2.3% |
| | Duration of intertidal exposure | 1.7% |
| | Duration of cooldown | 1% |
| | Time unrefrigerated (after collection) | 0.7% |
| | R² of full model of log10 risk per serving | 72% |

(a) Mean of the sensitivity coefficients (or of the R² of the full linear regression model approximating the simulation model output) over 200 uncertainty sample realizations.
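The averaging over uncertainty realizations described in the table footnote corresponds to a two-dimensional Monte Carlo loop: for each uncertainty realization a variability sample is simulated, the approximating regression is fitted, and the resulting coefficients and R² are averaged. The sketch below assumes that structure; `toy_log_risk` and its parameter vector `theta` are hypothetical placeholders for the actual risk model, and only the count of 200 outer realizations comes from the table.

```python
import numpy as np

def rpss(X, y):
    """RPSS_i = (RSS - RSS_{-i}) / TSS for each column of X, and the full-model R^2."""
    def rss_of(M):
        A = np.column_stack([np.ones(len(y)), M])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.sum((A @ coef - y.mean()) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    full = rss_of(X)
    parts = [(full - rss_of(np.delete(X, i, axis=1))) / tss for i in range(X.shape[1])]
    return np.array(parts), full / tss

def toy_log_risk(X, theta, rng):
    """Hypothetical stand-in for the risk model: log10 risk per serving for one uncertainty realization."""
    return theta[0] * X[:, 0] + theta[1] * X[:, 1] + rng.normal(scale=0.5, size=len(X))

rng = np.random.default_rng(3)
n_uncertainty, n_variability = 200, 1000
coef_sum, r2_sum = np.zeros(2), 0.0
for _ in range(n_uncertainty):
    theta = rng.normal(loc=[1.0, 0.3], scale=0.1)    # outer loop: one uncertainty realization
    X = rng.normal(size=(n_variability, 2))           # inner loop: variability sample of servings
    y = toy_log_risk(X, theta, rng)
    c, r2 = rpss(X, y)
    coef_sum += c
    r2_sum += r2

mean_coefs = coef_sum / n_uncertainty    # analogous to the tabulated sensitivity coefficients
mean_r2 = r2_sum / n_uncertainty         # analogous to the tabulated R^2 of the full model
print(np.round(mean_coefs, 3), round(mean_r2, 3))
```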
As indicated by the results shown in Table A6-1, for the Gulf Coast summer harvest an approximating linear regression, with seven variability parameters as predictors, explains 64% of the variation in log10 risk per serving (the RSS of the full model divided by TSS).
The results of the variance decomposition under the linear regression model indicate that the variation of log10 V. parahaemolyticus/g at time of harvest is the single most important determinant of the variation of log10 risk per serving for this region/season. The variation of the percentage pathogenic (across individual servings) is also identified as an important component of the variation of log10 risk per serving, as is the time unrefrigerated. The relative ranking of importance of these parameters by the regression-based approach is the same as that obtained by the Tornado plot (i.e., pairwise correlation) analysis shown in the Risk Characterization section. The effect of grams of oysters consumed is weaker in this analysis than in the correlation analysis, possibly because consumption of more than two dozen oysters is infrequent (<2%), so the extremes of the variability distribution of grams of oysters consumed are not a strong determinant of the total variation of log10 risk per serving. Variation in the length of refrigeration time and in ambient air temperature during harvest does not have a strong effect on the variation of risk.
With respect to the Pacific Northwest intertidal summer harvest, a linear regression with nine variability parameters as predictors explained 72% of the overall variation of log10 risk per serving. Based on the relative partial sums of squares sensitivity measure, the percentage pathogenic is a more influential parameter than log10 V. parahaemolyticus/g at time of harvest for this region, season, and harvest type: the sensitivity coefficient for percentage pathogenic was 20%, compared with 12% for log10 V. parahaemolyticus/g at time of harvest. The influence of other factors was much less pronounced. Grams of oysters consumed and oyster temperature during intertidal exposure were the next most influential factors, each associated with approximately 5% of the variation in log10 risk per serving.
For both of these region/season combinations, the regression-based sensitivity analysis was repeated using a quadratic response-surface model to determine the effect of interactions between factors on the estimates of importance. Although the quadratic response-surface regression indicated significant interactions between factors in the model, the resulting estimates of variance attributable to the variability parameters did not differ substantially from those based on the linear regression for either region.
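A quadratic response surface adds squared terms and pairwise interaction (cross-product) terms to the linear predictors. The sketch below compares the R² of the two fits on synthetic data containing one modest interaction; the data-generating coefficients and the helper names `quadratic_design` and `r_squared` are hypothetical. Because the linear model is nested within the quadratic one, the quadratic R² can never be lower; the practical question is whether the gain is large enough to change the conclusions about importance.

```python
import numpy as np
from itertools import combinations

def quadratic_design(X):
    """Linear terms, squared terms, and all pairwise interaction (cross-product) terms."""
    squares = X ** 2
    inter = [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X, squares] + inter)

def r_squared(M, y):
    """R^2 (= RSS / TSS) of an OLS fit of y on the columns of M, intercept included."""
    A = np.column_stack([np.ones(len(y)), M])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(4)
X = rng.normal(size=(3000, 3))   # hypothetical standardized inputs
# Hypothetical output: mostly linear, with one modest interaction between inputs 0 and 1
y = 1.5 * X[:, 0] + 0.7 * X[:, 1] + 0.3 * X[:, 0] * X[:, 1] + rng.normal(scale=0.8, size=3000)

r2_linear = r_squared(X, y)
r2_quad = r_squared(quadratic_design(X), y)
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quad:.3f}")
```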
Sensitivity Analysis of Uncertainty Parameters
A regression-based sensitivity analysis was also applied to the uncertainty parameters in order to check, by comparison, the estimates of importance of uncertainty parameters obtained in the Risk Characterization section by fixing parameter values at nominal levels one at a time and calculating conditional variances of the output variable (mean risk per serving).
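The one-at-a-time alternative can be sketched as follows, assuming that importance is measured as the fractional reduction in output variance when a single uncertainty parameter is fixed at its nominal value while the others continue to vary. The stand-in model `toy_log_mean_risk`, its coefficients, and the nominal values are hypothetical; the procedure used in the Risk Characterization section may differ in detail.

```python
import numpy as np

def toy_log_mean_risk(u):
    """Hypothetical stand-in model: log10 mean risk as a function of three uncertainty parameters."""
    return 1.0 * u[:, 0] + 0.4 * u[:, 1] + 0.1 * u[:, 2]

rng = np.random.default_rng(5)
n = 20000
nominal = np.zeros(3)                  # hypothetical nominal (e.g., median) values
U = rng.normal(size=(n, 3))            # joint sample of the uncertainty parameters
var_all = np.var(toy_log_mean_risk(U))

reduction = []
for i in range(3):
    U_fixed = U.copy()
    U_fixed[:, i] = nominal[i]         # fix parameter i at its nominal value; others still vary
    conditional_var = np.var(toy_log_mean_risk(U_fixed))
    reduction.append(1.0 - conditional_var / var_all)

print(np.round(reduction, 3))          # larger reduction = more influential parameter
```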
Both a linear and a quadratic response surface were considered as approximating regressions, with log10 mean risk per serving (over variability factors) as the response variable. As in the analysis of the importance of variability factors for individual risk per serving, a log10 transformation of mean risk was appropriate as the output variable, and the linear regression approximation was generally sufficient for the purpose of importance assessment. The influence of dose-response uncertainty was assessed in the regression-based approach by using the dose-response model parameter uncertainty realizations to calculate the uncertainty of log10 ID01 (the dose corresponding to a 1% probability of infection). This quantity was used as the regression predictor, rather than log10 ID50 or some other summary of dose-response uncertainty that might be less pertinent. Similarly, mean percentage pathogenic (i.e., the relative abundance of pathogenic strains) was used as a regression predictor, since it is the most direct and pertinent summary of the variability distribution of percentage pathogenic. For the effect of year-to-year variation in water temperature, both the mean and the standard deviation of the temperature distribution were used as predictors. Likewise, the effect of uncertainty in predicting V. parahaemolyticus/g at time of harvest from water temperature was assessed by using both the mean and the standard deviation of the prediction uncertainty as regression predictors of the model simulation output. The results of the regression-based sensitivity analysis of uncertainty parameters for the two examples described above (i.e., Gulf Coast (Louisiana)/Summer and Pacific Northwest (Intertidal)/Summer) are shown in Table A6-2.
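Reducing each dose-response parameter realization to a single predictor value can be sketched with the closed-form inverse of the Beta-Poisson curve. This assumes the common approximate Beta-Poisson form $P(d) = 1 - (1 + d/\beta)^{-\alpha}$, whose inverse gives $\mathrm{ID}_{01} = \beta\left[(0.99)^{-1/\alpha} - 1\right]$; the parameter distributions below are hypothetical and are not the assessment's fitted values.

```python
import numpy as np

def beta_poisson_prob(dose, alpha, beta):
    """Approximate Beta-Poisson dose-response: P(d) = 1 - (1 + d/beta)^(-alpha)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)

def log10_id(p, alpha, beta):
    """log10 of the dose giving response probability p (p = 0.01 gives log10 ID01)."""
    return np.log10(beta * ((1.0 - p) ** (-1.0 / alpha) - 1.0))

# Hypothetical parameter-uncertainty realizations (NOT the assessment's fitted values)
rng = np.random.default_rng(6)
alpha = rng.lognormal(mean=np.log(0.6), sigma=0.1, size=200)
beta = rng.lognormal(mean=np.log(1e6), sigma=0.3, size=200)

# One regression-predictor value per uncertainty realization
log_id01 = log10_id(0.01, alpha, beta)
print(log_id01[:5].round(2))
```

Summarizing the curve at ID01 rather than ID50 keeps the predictor focused on the low-dose region, where, as the text argues, it is more pertinent than ID50.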
For the Gulf Coast (Louisiana)/Summer harvest, a linear regression of log10 mean risk per serving on seven selected input uncertainty factors explained 97% of the variation of the output variable. Based on the relative partial sums of squares sensitivity measure, the parameter uncertainty of the Beta-Poisson dose-response model is associated with ~78% of the variation in log10 mean risk per serving. The second and third most influential factors were the uncertainty of mean percentage pathogenic and the growth rate uncertainty, which are associated with 8% and 7% of the total variation, respectively. The effect of the other uncertainties was minimal, particularly that of the mean and standard deviation of the water temperature distributions (i.e., year-to-year variation of water temperature).
The effect of uncertainty parameters on log10 mean risk per serving for the Pacific Northwest (Intertidal)/Summer harvest was noticeably different from that obtained for the Gulf Coast. An approximating linear regression explained only 80% of the variation in log10 mean risk per serving, and including first-order interaction terms (quadratic regression) increased the proportion of variance explained only marginally, to 82%. Although the dose-response and growth rate prediction uncertainties are identified as important, the influence of uncertainty of mean percentage pathogenic was much less substantial than for the Gulf Coast (Louisiana)/Summer harvest. This may be because the percentage pathogenic is generally an order of magnitude greater in the Pacific Northwest than in the Gulf Coast and/or because the relative effect of other types of uncertainty is more substantial.
Table A6-2. Regression-based sensitivity of log10 mean risk per serving to uncertainty parameters

| Region / Season | Parameter Uncertainty | Sensitivity Coefficient |
|---|---|---|
| Gulf Coast (Louisiana) / Summer | Dose-response (uncertainty of log10 ID01) | 78% |
| | Mean percentage pathogenic | 8.4% |
| | Growth rate in oysters vs. broth culture | 7.0% |
| | Predicted mean log10 Vibrio parahaemolyticus/g (a) | 1.5% |
| | Predicted std dev of log10 Vibrio parahaemolyticus/g (b) | 0.6% |
| | Mean of water temperature distribution | 0.5% |
| | Std dev of water temperature distribution | 0.1% |
| | R² of full model of log10 mean risk per serving | 97% |
| Pacific Northwest (Intertidal) / Summer | Dose-response (uncertainty of log10 ID01) | 31% |
| | Std dev of water temperature distribution | 15% |
| | Growth rate in oysters vs. broth culture | 13% |
| | Mean percentage pathogenic | 3.8% |
| | Predicted std dev of log10 Vibrio parahaemolyticus/g (b) | 3.4% |
| | Predicted mean log10 Vibrio parahaemolyticus/g (a) | 1.4% |
| | Mean of water temperature distribution | 0.7% |
| | R² of full model of log10 mean risk per serving | 80% |

(a) Uncertainty of the regression estimate of mean log10 V. parahaemolyticus/g at mean water temperature.
(b) Uncertainty of the regression estimate of the variation of log10 V. parahaemolyticus/g.
The most striking difference between the results for the Pacific Northwest (Intertidal) and those for the Gulf Coast (Louisiana) is the apparent importance of year-to-year variation in water temperature for this region/season. Summary statistics of the year-to-year variation in the water temperature distributions used for model construction indicate greater year-to-year variability in the Pacific Northwest (Intertidal)/Summer than in the Gulf Coast (Louisiana)/Summer. Although the differences may not appear substantial, the results of the sensitivity analysis shown in Table A6-2 suggest that small differences in predicted year-to-year variation of temperature distributions across regions and seasons can imply relatively larger variability of risk and/or uncertainty in the number of illnesses that may occur in a given year because of temperature extremes. For the Pacific Northwest (Intertidal), year-to-year variation in the spread of the temperature distribution (as measured by the standard deviation of daily water temperatures) is particularly influential, with approximately 15% of the variation in log10 mean risk per serving attributable to this aspect of year-to-year variation of water temperature distributions.