Balancing statistics and ecology: lumping experimental data for model selection
Nelly van der Hoeven1*, Lia Hemerik2 and Patrick A. Jansen3§
Ecological experiments often accumulate data by carrying out many replicate trials, each containing a limited number of observations, which are then pooled and analysed in the search for a pattern. Replicating trials may be the only way to obtain sufficient data, yet lumping disregards the possibility that differences in experimental conditions influence the overall pattern. This paper discusses how to deal with this dilemma in model selection. Three methods of model selection are introduced: likelihood-ratio testing, the AIC (with or without small-sample correction), and the BIC. Subsequently, we apply the AICc method to an example of size-dependent seed dispersal by scatterhoarding rodents.
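For reference, the three criteria mentioned above reduce to simple functions of the maximized log-likelihood. A minimal sketch (the names log_lik, k, and n are our own labels for the maximized log-likelihood, the number of estimated parameters, and the sample size; the formulas are the standard definitions, not code from the chapter):

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik

def aicc(log_lik, k, n):
    """AIC with the small-sample correction term (requires n > k + 1)."""
    return aic(log_lik, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik
```

All three penalize extra parameters, but the BIC penalty grows with sample size while the AICc correction vanishes as n becomes large relative to k.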
The example involves binary data on the selection and removal of Carapa procera (Meliaceae) seeds by scatterhoarding rodents in replicate trials during years of different ambient seed abundance. The question is whether there is an optimum size for seeds to be removed and dispersed by the rodents. We fit five models, ranging from no effect of seed mass to an optimum seed mass. We show that lumping the data produces the expected pattern, but gives a poor fit compared to analyses in which grouping levels are taken into account: by letting the parameters depend on the group, by assuming a random effect of the group on the parameter values, or by keeping some parameters fixed across all groups while letting others depend on the group. Model fitting with some parameters fixed for all groups and others depending on the trial gives the best fit. The general pattern is, however, rather weak.
We explore how much models must differ before we can discriminate between them, using the minimum Kullback-Leibler distance as a measure of the difference. We then show by simulation that the differences are too small to discriminate at all between the five models tested at the level of replicate trials.
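For binary data such as the removal observations in the example, the Kullback-Leibler distance between two candidate models reduces, per observation, to the divergence between two Bernoulli distributions. A minimal sketch (the probabilities used in the usage note are illustrative, not values from the chapter):

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(p || q) between Bernoulli(p) and Bernoulli(q).

    Assumes 0 < p < 1 and 0 < q < 1.
    """
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
```

For instance, kl_bernoulli(0.6, 0.5) is only about 0.02 nats per observation, which illustrates why models whose predicted probabilities differ modestly require many observations per trial before they can be told apart.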
We recommend a combined approach in which the level of lumping trials is chosen by the amount of variation explained in comparison to an analysis at the trial level. We show that combining data from different trials only increases the probability of identifying the correct model with the AIC criterion if the distance of all simpler (i.e. less extended) models to the simulated model is sufficiently large in each trial. Otherwise, increasing the number of replicate trials may even decrease the power of the AIC.
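The trade-off between lumping and a trial-level analysis can be illustrated on invented binary removal counts (the numbers below are hypothetical, not the chapter's data): with a Bernoulli likelihood, the maximum-likelihood estimate per group is simply the observed fraction, so the AIC of a lumped model (one removal probability) can be compared directly with that of a per-trial model (one probability per trial).

```python
import math

def loglik_binom(k, n, p):
    """Bernoulli log-likelihood of k successes in n trials (up to a constant)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # guard against log(0) at the boundaries
    return k * math.log(p) + (n - k) * math.log(1 - p)

def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik

# hypothetical removal counts: (seeds removed, seeds offered) per trial
trials = [(9, 30), (12, 30), (15, 30), (18, 30), (21, 30)]

# lumped model: a single removal probability fitted to the pooled data
K = sum(k for k, n in trials)
N = sum(n for k, n in trials)
aic_lumped = aic(loglik_binom(K, N, K / N), 1)

# per-trial model: one removal probability fitted per trial
ll_per_trial = sum(loglik_binom(k, n, k / n) for k, n in trials)
aic_per_trial = aic(ll_per_trial, len(trials))

print(round(aic_lumped, 2), round(aic_per_trial, 2))  # → 209.94 205.65
```

With these counts the per-trial model wins despite its larger parameter penalty; had the per-trial probabilities been closer together, the lumped model would have been selected, which is the dilemma the recommended approach addresses.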
Key words: AIC; Carapa procera; Kullback-Leibler distance; likelihood-ratio test; model selection; Myoprocta acouchy; noncentral chi-square distribution; power; red acouchy; scatterhoarding; seed dispersal; seed size
In: T.A.C. Reydon & L. Hemerik (Eds): Current themes in Theoretical Biology: A Dutch Perspective. pp 233-265. Springer, Dordrecht, The Netherlands, 2004.