Multimodel inference and model averaging

Model selection

When we have multiple competing models (corresponding to different hypotheses), we need techniques to compare the models' performance and then select the best model(s) in support of a specific hypothesis.

Multimodel inference and model averaging

Real-world datasets are usually complicated and noisy, and we seldom have enough evidence to select a single best model. Within the framework of multimodel inference, the candidate models are ranked by a measure of relative support (the Akaike weight, i.e. the relative likelihood of a model given the data). When there is uncertainty in model selection, e.g. when several models have similar levels of support, model averaging can be used to obtain parameter estimates or predictions that account for that uncertainty[1-4].
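
As a quick illustration of how these weights arise, here is a minimal sketch (with made-up AICc values, separate from the analysis below) of the usual Akaike weight computation, weight_i = exp(-delta_i/2) / sum(exp(-delta_j/2)):

# Hypothetical AICc values for three candidate models
aicc <- c(322.3, 324.4, 326.9)
delta <- aicc - min(aicc)                    # AICc differences from the best model
weight <- exp(-delta/2) / sum(exp(-delta/2)) # Akaike weights
round(weight, 3)
## [1] 0.690 0.241 0.069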

Example in R

# Simulate four variables: one dependent variable (y)
# and three independent variables (x1, x2, x3)
set.seed(1234)
x1 <- rnorm(50,mean=5,sd=3)
x2 <- rnorm(50,mean=10,sd=2)
x3 <- rnorm(50,mean=15,sd=3)
y <- rnorm(50,mean=20,sd=5)

# As an example, I use a linear regression without
# interactions or polynomial terms
lm1 <- lm(y~x1+x2+x3)

summary(lm1)
## Call:
## lm(formula = y ~ x1 + x2 + x3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.7587  -3.8724  -0.2943   4.1334  12.1460 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.42305    6.08497   5.000  8.8e-06 ***
## x1           0.05949    0.32110   0.185    0.854    
## x2          -0.20667    0.41578  -0.497    0.622    
## x3          -0.54467    0.33476  -1.627    0.111    
## ---
## Signif. codes:  
## 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 5.916 on 46 degrees of freedom
## Multiple R-squared:  0.06738,   Adjusted R-squared:  0.006561 
## F-statistic: 1.108 on 3 and 46 DF,  p-value: 0.3556

# Load package "MuMIn" (Version 1.15.6) for multimodel 
# inference and model averaging
library(MuMIn)

options(na.action="na.fail") # the default "na.omit" would let models be
                             # fitted to different subsets of the data,
                             # which is not appropriate for model comparison

# Generate a set of models from all combinations of terms in the
# global model. "m.lim", "subset", and "fixed" (terms to be included
# in all models; the default is the intercept) can constrain the
# resulting set of models, but think carefully before using them
dd <- dredge(
    lm1,                  # the global model; many model types are supported
    beta="none",          # whether to standardize the coefficients
    evaluate=TRUE,        # evaluate and rank the models
    rank="AICc",          # rank the models by AICc, suitable for small samples
    m.lim=c(0,3),         # limits on the number of terms in a single model
                          # (replaces the old arguments "m.min" and "m.max")
    subset= x1 | x2 | x3, # logical expression describing models
                          # to keep in the resulting set
    extra="R^2"           # function for additional statistics
                          # to include in the result
    )

dd
## Global model call: lm(formula = y ~ x1 + x2 + x3)
## ---
## Model selection table 
##   (Intrc)        x1      x2      x3       R^2 df   logLik  AICc delta weight
## 5   28.88                   -0.5686 6.133e-02  3 -157.907 322.3  0.00  0.424
## 7   30.60           -0.2131 -0.5375 6.669e-02  4 -157.764 324.4  2.08  0.150
## 6   28.73  0.072750         -0.5763 6.237e-02  4 -157.879 324.6  2.31  0.134
## 3   23.76           -0.3351         1.371e-02  3 -159.144 324.8  2.47  0.123
## 2   20.26  0.015250                 4.654e-05  3 -159.488 325.5  3.16  0.087
## 8   30.42  0.059490 -0.2067 -0.5447 6.738e-02  5 -157.745 326.9  4.52  0.044
## 4   23.76 -0.001132 -0.3351         1.371e-02  4 -159.144 327.2  4.84  0.038
## Models ranked by AICc(x) 
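
If you just want to inspect or reuse the best-supported models instead of averaging them, they can be pulled out of the selection table. A minimal sketch (the delta < 2 cutoff is only a common rule of thumb, not a requirement):

# Show only the models with delta AICc < 2
subset(dd, delta<2)

# Refit those models and return them as a list of model objects
top.models <- get.models(dd, subset=delta<2)
top.models[[1]] # the single best model here, y ~ x3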

# Model averaging
avg <- model.avg(
    dd,                         # I use the "model.selection" object
    subset=cumsum(weight)<=0.95 # average over a subset of the resulting
                                # models; here a 95% confidence set (models
                                # whose cumulative weight is approximately
                                # 0.95), which keeps 5 models
    )

summary(avg)
## Call:
## model.avg(object = dd, subset = cumsum(weight) <= 0.95)
## 
## Component model call: 
## lm(formula = y ~ <5 unique rhs>)
## 
## Component models: 
##    df  logLik   AICc delta weight
## 3   3 -157.91 322.34  0.00   0.46
## 23  4 -157.76 324.42  2.08   0.16
## 13  4 -157.88 324.65  2.31   0.15
## 2   3 -159.14 324.81  2.47   0.13
## 1   3 -159.49 325.50  3.16   0.10
## 
## Term codes: 
## x1 x2 x3 
##  1  2  3 
## 
## Model-averaged coefficients:  
## (full average) 
##             Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept) 27.62906    5.71991     5.82612   4.742  2.1e-06 ***
## x3          -0.43432    0.37016     0.37588   1.155    0.248    
## x2          -0.07972    0.25715     0.26225   0.304    0.761    
## x1           0.01203    0.15875     0.16281   0.074    0.941    
## 
## (conditional average) 
##             Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept) 27.62906    5.71991     5.82612   4.742  2.1e-06 ***
## x3          -0.56346    0.32402     0.33246   1.695   0.0901 .  
## x2          -0.26809    0.41457     0.42519   0.631   0.5284    
## x1           0.05002    0.32073     0.32907   0.152   0.8792    
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Relative variable importance: 
##                      x3   x2   x1  
## Importance:          0.77 0.30 0.24
## N containing models:    3    2    2

# Confidence intervals for model parameters
confint(avg)
##                  2.5 %      97.5 %
## (Intercept) 16.2100663 39.04804920
## x3          -1.2150750  0.08814727
## x2          -1.1014369  0.56526046
## x1          -0.5949329  0.69497959
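
The averaged model can also be used for prediction. A minimal sketch, assuming the component models are refit with fit=TRUE so that predict() has model objects to work with (the new observation below is made up for illustration):

# Refit the component models while averaging
avg2 <- model.avg(dd, subset=cumsum(weight)<=0.95, fit=TRUE)

# Predict for a hypothetical new observation; full=TRUE uses the
# full-averaged coefficients, full=FALSE the conditional average
newdat <- data.frame(x1=5, x2=10, x3=15)
predict(avg2, newdata=newdat, full=TRUE)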

“Let the computer find out” is a poor strategy and usually reflects the fact that the researcher did not bother to think clearly about the problem of interest and its scientific setting[3].

So, you must choose the candidate models yourself and judge the parameter estimates and predictions with your domain knowledge throughout the entire analysis process.

References:

[1]: Johnson, J. B. and K. S. Omland. 2004. Model selection in ecology and evolution. Trends in Ecology & Evolution 19:101-108.

[2]: Grueber, C. E., S. Nakagawa, R. J. Laws, and I. G. Jamieson. 2011. Multimodel inference in ecology and evolution: challenges and solutions. Journal of Evolutionary Biology 24:699-711.

[3]: Burnham, K. P. and D. R. Anderson. 2002. Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York.

[4]: Burnham, K., D. Anderson, and K. Huyvaert. 2011. AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology 65:23-35.