AIC vs BIC

I was stumped for quite a while trying to work out when one should use AIC and when BIC. Burnham and Anderson (B&A) discuss the question at length, but they are such staunch AIC partisans that it took me a while to come around to their point of view. My main reason for preferring BIC was that, if you look at the derivations, BIC approximates the log of the marginal likelihood for a large dataset with an uninformative prior, while AIC approximates the same thing but with a very strong prior (see pp. 212-213 of their book, or Kass and Raftery 1995). From this point of view, the BIC seems more sensible.
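
For reference, the two criteria in their usual textbook form (not quoted from B&A; the symbols are my notation: $\hat{L}$ is the maximized likelihood, $k$ the number of estimated parameters, $n$ the sample size, and $p(y \mid M)$ the marginal likelihood of model $M$):

```latex
\mathrm{AIC} = -2\log\hat{L} + 2k,
\qquad
\mathrm{BIC} = -2\log\hat{L} + k\log n

% Large-sample approximation behind the "marginal likelihood" reading of BIC:
\log p(y \mid M) = \log\hat{L} - \tfrac{k}{2}\log n + O(1)
\quad\Longrightarrow\quad
-2\log p(y \mid M) \approx \mathrm{BIC}
```

The only difference between the two is the penalty per parameter, $2$ versus $\log n$, so BIC penalizes complexity more heavily as soon as $\log n > 2$, i.e. for $n \ge 8$.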

On the other hand, B&A make a compelling argument that BIC was developed to identify the "dimension" or "true number of parameters" of a model, and that this is rarely sensible in ecological modeling contexts because of what B&A call tapering effects. That is, if you have a large number of predictor variables some of which have non-zero effect sizes (e.g. regression coefficients) and some of which have zero coefficients, the BIC is trying to tell you how many are non-zero. B&A point out that it is more common that there will really be a few coefficients with large magnitude, more with smaller magnitude, even more with tiny magnitudes … and that all of the predictor variables really have some effect on the response, albeit very small. What we should be trying to do, they say, is identify how many parameters are useful for prediction rather than how many are non-zero. (This also agrees in general with the Bayesian argument against point null hypotheses, i.e. that parameters are never exactly zero — somewhat ironic, since it suggests that Bayesians would prefer AIC over the "Bayesian" Information Criterion.)
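
A rough way to see the tapering argument in action is a small simulation. The sketch below is an illustration under assumptions of my own, not anything from B&A: predictors are orthonormal, the noise variance is known and equal to 1, the candidate models are nested (first `k` predictors), and the true coefficients taper as `1/j` so that none is exactly zero. The names `select_size` and `simulate` are hypothetical.

```python
import math
import random


def select_size(z, penalty):
    """Pick the nested-model size k that maximises sum_{j<=k} (z_j^2 - penalty).

    With orthonormal predictors and known unit noise variance, this is the
    same as minimising -2 log L + penalty * k over the nested models.
    """
    best_k, best_gain, gain = 0, 0.0, 0.0
    for k, zj in enumerate(z, start=1):
        gain += zj * zj - penalty
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


def simulate(n=200, p=30, reps=500, seed=1):
    """Average selected size and prediction risk for AIC and BIC."""
    rng = random.Random(seed)
    # Assumed tapering truth: every coefficient is non-zero, most are small.
    beta = [1.0 / j for j in range(1, p + 1)]
    penalties = {"AIC": 2.0, "BIC": math.log(n)}
    totals = {name: {"size": 0.0, "risk": 0.0} for name in penalties}
    for _ in range(reps):
        # Estimated coefficients: truth plus noise with standard error 1/sqrt(n).
        beta_hat = [b + rng.gauss(0.0, 1.0) / math.sqrt(n) for b in beta]
        z = [math.sqrt(n) * bh for bh in beta_hat]
        for name, pen in penalties.items():
            k = select_size(z, pen)
            # Risk = estimation error of kept terms + squared size of dropped terms.
            risk = sum((bh - b) ** 2 for bh, b in zip(beta_hat[:k], beta[:k]))
            risk += sum(b * b for b in beta[k:])
            totals[name]["size"] += k / reps
            totals[name]["risk"] += risk / reps
    return totals


if __name__ == "__main__":
    for name, t in simulate().items():
        print(f"{name}: mean size {t['size']:.1f}, mean risk {t['risk']:.3f}")
```

Under these settings BIC should keep fewer terms than AIC, and AIC should come out with the lower average prediction risk, which is the B&A point: when nothing is exactly zero, asking "how many are non-zero" is the wrong question. Different taper rates or sample sizes will change the size of the gap.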

The bottom line: I would say the AIC is generally the right choice for ecological questions, over BIC, unless you're really trying to identify a specific number of components. (People do this in time-series analysis, to try to identify the number of time lags or interacting species, although I think they probably shouldn't — the "tapering effects" argument really applies here.)

And the AIC?
Anonymous (128.40.24.x), 16 Sep 2008, 14:30 UTC

The argument against the BIC above certainly makes sense, but I don't know of any theory that tells us that the AIC is any better in that respect. It tries to find the number of non-zero coefficients as well, though in a more liberal and upward-biased way… I don't know whether that works well as a justification for using the AIC…

If you want a theory that, instead of estimating an unknown and essentially unobservable underlying truth, gives you what you find "interesting" or "useful", you have to formalise your concept of "usefulness" as a new criterion. Particularly, don't expect to get anything objective or standard from it, because usefulness generally is not objective.

Christian Hennig

AIC revisited
bbolker, 17 Oct 2008, 20:30 UTC

The BIC is consistent and was designed to identify the "true" dimensionality of an underlying model. The AIC is not consistent but has lower error: "if the number of models of the same dimension does not grow very fast in dimension, the average squared error of the model selected by AIC is asymptotically equivalent to the minimum offered by the candidate models … There has been a debate between AIC and BIC in the literature, centring on the issue of whether the true model is finite-dimensional or infinite-dimensional. There seems to be a consensus that, for the former case, BIC should be preferred, and AIC should be chosen for the latter" (Yang 2005). Furthermore, Yang 2005 shows (apparently: I haven't tried to follow the technical details) that you can't have your cake and eat it too — you have to make a decision between consistency and minimizing prediction errors.

Given that consensus, I would say that in ecology it usually makes more sense to treat the true model as infinite-dimensional (or at least of much higher dimension than the most complex model we try to fit), and therefore to prefer AIC.
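
For contrast, here is a minimal sketch of the finite-dimensional case, where consistency is what matters. It is an illustration under my own assumptions, not a reproduction of anything in Yang 2005: orthonormal predictors, known unit noise variance, nested candidate models, and a hypothetical truth with exactly three non-zero coefficients out of ten. The function names are made up for the example.

```python
import math
import random


def select_size(z, penalty):
    """Nested-model size k maximising sum_{j<=k} (z_j^2 - penalty)."""
    best_k, best_gain, gain = 0, 0.0, 0.0
    for k, zj in enumerate(z, start=1):
        gain += zj * zj - penalty
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k


def hit_rate(n, penalty, true_k=3, p=10, reps=1000, seed=2):
    """Fraction of replicates in which the selected size equals true_k."""
    rng = random.Random(seed)
    # Assumed finite-dimensional truth: true_k coefficients of 1, the rest exactly 0.
    beta = [1.0] * true_k + [0.0] * (p - true_k)
    hits = 0
    for _ in range(reps):
        # Standardised coefficient estimates: sqrt(n) * beta_j plus unit noise.
        z = [math.sqrt(n) * b + rng.gauss(0.0, 1.0) for b in beta]
        hits += select_size(z, penalty) == true_k
    return hits / reps


if __name__ == "__main__":
    for n in (50, 500, 5000):
        aic = hit_rate(n, 2.0)
        bic = hit_rate(n, math.log(n))
        print(f"n={n}: AIC picks the true size {aic:.2f} of the time, BIC {bic:.2f}")
```

In this setup BIC's hit rate should climb toward 1 as `n` grows, while AIC keeps overfitting in a roughly constant fraction of replicates. That is the consistency side of the trade-off; it only helps if a small "true" model really exists.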

Yang, Yuhong. 2005. Can the strengths of AIC and BIC be shared? A conflict between model indentification and regression estimation. Biometrika 92, no. 4 (December 1): 937-950. doi:10.1093/biomet/92.4.937.

Notes on AIC v. BIC in SEM
jebyrnes (guest), 4 Dec 2008, 01:35 UTC

I've been working with structural equation models a lot recently, and have quite often noted large discrepancies between the AIC and BIC values. Interestingly, the BIC often selects models that do not actually fit the data but happen to be simpler, particularly when the effect sizes of many coefficients are small. This has made me wary of using it. The tapering effects argument makes a lot of sense, particularly when applied to ecological data here.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License