The probability to select the correct model using likelihood-ratio based criteria in choosing between two nested models of which the more extended one is true.

Nelly van der Hoeven*

ECOSTAT, Vondellaan 23, 2332 AA, Leiden, The Netherlands
Corresponding Author:

Nelly van der Hoeven,
ECOSTAT,
Vondellaan 23,
2332 AA Leiden,
The Netherlands
tel.: (+)31-(0)71-5315011; fax: (+)31-(0)842-116988;
email: NvdH @iecostat.nl

Summary

The probability of selecting the correct model is calculated for likelihood-ratio based criteria used to compare two nested models. If the more extended of the two models is true, the difference between twice the maximised log-likelihoods is approximately noncentral chi-square distributed, with degrees of freedom equal to the difference in the number of parameters. The noncentrality parameter of this distribution can be approximated by twice the minimum Kullback-Leibler divergence (MKLD) of the best-fitting simple model to the true version of the extended model.
The MKLD, and therefore the probability of selecting the correct model, increases approximately in proportion to the number of observations if all observations are made under the same conditions. If a new set of observations can only be made under different conditions, the model parameters may depend on those conditions and therefore have to be estimated separately for each set of observations. An increase in the number of observations then goes together with an increase in the number of model parameters. In this case, the power of the likelihood-ratio test increases with the number of observations. However, the probability of choosing the correct model with the AIC only increases if the MKLD for each set of observations is more than 0.5. If the MKLD is less than 0.5, that probability decreases. The probability of choosing the correct model with the BIC always decreases, sometimes after an initial increase for a small number of observation sets. The results are illustrated by a simulation study with a set of five nested non-linear models for binary data.
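
The behaviour of the AIC and BIC described above can be illustrated with a minimal Monte Carlo sketch. It is not the simulation study of the paper: it only draws from the approximating noncentral chi-square distribution and assumes, for illustration, that each set of observations adds exactly one extra parameter and has the same MKLD. The function name `prob_select_extended`, the number of draws, and the 50 observations per set used for the BIC penalty are hypothetical choices.

```python
import math
import random


def prob_select_extended(mkld_per_set, n_sets, penalty_per_param,
                         n_draws=5000, seed=1):
    """Estimate P(LR statistic > penalty_per_param * extra parameters).

    Assumptions (not from the paper): one extra parameter per observation
    set, and the same MKLD for every set.
    """
    rng = random.Random(seed)
    df = n_sets                           # difference in number of parameters
    ncp = 2.0 * mkld_per_set * n_sets     # noncentrality = twice the total MKLD
    shift = math.sqrt(ncp / df)           # equal shifts, squared shifts sum to ncp
    threshold = penalty_per_param * df    # AIC: 2 per parameter, BIC: ln(n)
    hits = 0
    for _ in range(n_draws):
        # A noncentral chi-square variate is a sum of squared shifted normals.
        stat = sum((rng.gauss(0.0, 1.0) + shift) ** 2 for _ in range(df))
        if stat > threshold:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    obs_per_set = 50  # hypothetical, only needed for the BIC penalty
    for mkld in (0.1, 2.0):
        for k in (1, 5, 20):
            aic = prob_select_extended(mkld, k, 2.0)
            bic = prob_select_extended(mkld, k, math.log(k * obs_per_set))
            print(f"MKLD/set={mkld}, sets={k}: AIC {aic:.3f}, BIC {bic:.3f}")
```

Under these assumptions the extended model is selected when the statistic exceeds 2 per extra parameter for the AIC and ln(n) per extra parameter for the BIC, with n the total number of observations. The expected value of the statistic is the degrees of freedom plus the noncentrality parameter, so it outgrows the AIC threshold only when the MKLD per set exceeds 0.5.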

keywords: AIC; BIC; Kullback-Leibler divergence; noncentral Chi-square; power


J. Statistical Planning and Inference 135: 477-486, 2005