Balancing statistics and ecology: lumping experimental data for model selection
Nelly van der Hoeven¹*, Lia Hemerik² and Patrick A. Jansen³§
¹ Leiden University, IEES, Department of Theoretical Evolutionary Biology, P.O. Box 9516, 2300 RA Leiden, The Netherlands
² Biometris, Department of Mathematical and Statistical Methods, Wageningen University, P.O. Box 100, 6700 AC Wageningen, The Netherlands
³ Wageningen University, Forest Ecology and Forest Management Group, P.O. Box 342, 6700 AH Wageningen, The Netherlands
*: Present address: ECOSTAT, Vondellaan 23, 2332 AA Leiden, The Netherlands
§: Present address: Alterra - Wageningen UR, Centre for Ecosystem Studies, P.O. Box 74, 6700 AA Wageningen, The Netherlands.
Ecological experiments often accumulate data by carrying out many replicate trials, each containing a limited number of observations, which are then pooled and analysed in the search for a pattern. Replicating trials may be the only way to obtain sufficient data, yet lumping ignores the possibility that differences in experimental conditions influence the overall pattern. This paper discusses how to deal with this dilemma in model selection. Three methods of model selection are introduced: likelihood-ratio testing, the Akaike information criterion (AIC) with or without small-sample correction (AICc), and the Bayesian information criterion (BIC). We then apply the AICc method to an example on size-dependent seed dispersal by scatterhoarding rodents.
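The three selection methods can be written down compactly. The following Python sketch uses the standard textbook definitions (AIC = -2 ln L + 2k, AICc = AIC + 2k(k+1)/(n-k-1), BIC = -2 ln L + k ln n, with k estimated parameters and n observations); we assume, but the text above does not state, that these are the exact forms used. All function names are hypothetical.

```python
import math

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: -2 ln L + 2k (k = number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik: float, k: int, n: int) -> float:
    """AIC with the small-sample correction 2k(k+1)/(n-k-1); needs n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 ln L + k ln n (n = number of observations)."""
    return -2.0 * loglik + k * math.log(n)

def lr_statistic(loglik_simple: float, loglik_extended: float) -> float:
    """Likelihood-ratio statistic for two nested models. Under the simpler model it is
    approximately chi-square distributed with df = k_extended - k_simple."""
    return 2.0 * (loglik_extended - loglik_simple)

def lr_pvalue_df1(stat: float) -> float:
    """Upper-tail chi-square probability for 1 degree of freedom: P(Z**2 > stat)."""
    return math.erfc(math.sqrt(stat / 2.0))
```

For example, with made-up maximised log-likelihoods of -60.0 (k = 1) and -55.0 (k = 3), `aic` returns 122.0 and 116.0, so the AIC prefers the more extended model; the likelihood-ratio statistic is 10.0 on 2 degrees of freedom. The AICc and BIC add heavier penalties for extra parameters when n is small or large, respectively.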
The example involves binary data on the selection and removal of Carapa procera (Meliaceae) seeds by scatterhoarding rodents in replicate trials during years of different ambient seed abundance. The question is whether there is an optimum size for seeds to be removed and dispersed by the rodents. We fit five models, ranging from no effect of seed mass to an optimum seed mass. We show that lumping the data produces the expected pattern but gives a poor fit compared with analyses that take grouping levels into account, either by letting the parameters depend on the group, by assuming a random effect of the group on the parameter values, or by keeping some parameters fixed for all groups while others depend on the group. Models with some parameters fixed for all groups and others depending on the trial give the best fit. The general pattern is, however, rather weak.
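The contrast between lumping and a trial-level analysis can be illustrated with the simplest possible model: one removal probability and no seed-mass effect. In this Python sketch (function names and counts are hypothetical, not taken from the paper), the lumped model has a single probability for all trials (k = 1) and the trial-level model has a separate probability per trial (k = number of trials). Both maximum-likelihood estimates are simple proportions, and the models are compared with the AICc, taking n as the total number of seeds offered.

```python
import math

def bernoulli_loglik(successes: int, n: int, p: float) -> float:
    """Log-likelihood of n independent 0/1 outcomes containing `successes` ones."""
    ll = 0.0
    if successes > 0:                      # skip 0 * log(0) terms
        ll += successes * math.log(p)
    if n - successes > 0:
        ll += (n - successes) * math.log(1.0 - p)
    return ll

def aicc(loglik: float, k: int, n: int) -> float:
    """AIC with small-sample correction; assumes n > k + 1."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def compare_lumped_vs_trials(trials):
    """trials: list of (removed, offered) seed counts, one pair per replicate trial.
    Returns (AICc of lumped model, AICc of trial-level model); lower is better."""
    total_removed = sum(s for s, _ in trials)
    total_offered = sum(n for _, n in trials)
    # Lumped model: a single removal probability estimated from the pooled data.
    p_all = total_removed / total_offered
    ll_lumped = bernoulli_loglik(total_removed, total_offered, p_all)
    # Trial-level model: each trial has its own removal probability (MLE = s / n).
    ll_trials = sum(bernoulli_loglik(s, n, s / n) for s, n in trials)
    return (aicc(ll_lumped, 1, total_offered),
            aicc(ll_trials, len(trials), total_offered))

# Made-up counts for three replicate trials of 20 seeds each.
lumped, per_trial = compare_lumped_vs_trials([(2, 20), (15, 20), (9, 20)])
```

With these invented counts the trial-level model has the lower AICc. For trials with identical removal proportions the lumped model wins instead, because it fits equally well while paying a smaller penalty.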
We explore how far models must differ in order to discriminate between them, using the minimum Kullback-Leibler distance as a measure of the difference. We then show by simulation that, at the level of individual replicate trials, the differences between the five models tested are too small to allow any discrimination between them.
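For binary data, the Kullback-Leibler distance between two models is a sum of Bernoulli distances over the individual seeds. The Python sketch below computes the minimum distance from a hypothetical "optimum" model (a logistic curve that is quadratic in seed mass, an assumption made for illustration) to the simplest, no-effect model; for a constant probability the minimiser is simply the mean of the true probabilities. Following a common large-sample approximation (again an assumption here), twice this minimum distance is taken as the noncentrality parameter of the likelihood-ratio statistic. Parameter values and seed masses are invented.

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler distance KL(p || q) between Bernoulli(p) and Bernoulli(q), 0 < q < 1."""
    d = 0.0
    if p > 0.0:
        d += p * math.log(p / q)
    if p < 1.0:
        d += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return d

def min_kl_to_constant(p_true):
    """Minimum summed KL distance from per-seed probabilities p_true (the 'true' model)
    to the no-effect model with one common probability q.
    Setting the derivative to zero shows the minimiser is q = mean(p_true)."""
    q = sum(p_true) / len(p_true)
    return sum(bernoulli_kl(p, q) for p in p_true), q

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical optimum model: logit p = a + b*m + c*m**2 with c < 0,
# so the removal probability peaks at m* = -b / (2c). Values are illustrative only.
a, b, c = -4.0, 0.4, -0.008            # optimum at m* = 25 (arbitrary mass units)
masses = [5, 10, 15, 20, 25, 30, 35, 40, 45]
p_true = [logistic(a + b * m + c * m * m) for m in masses]
d_min, q_best = min_kl_to_constant(p_true)
noncentrality = 2.0 * d_min            # large-sample approximation for the LR statistic
```

Because the distance is summed over seeds, a trial with only a few seeds yields a small distance and hence a small noncentrality, which is consistent with the difficulty of discriminating between models at the trial level.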
We recommend a combined approach in which the level at which trials are lumped is chosen by the amount of variation explained, compared with an analysis at the trial level. We show that combining data from different trials increases the probability of identifying the correct model with the AIC only if, in each trial, the distance from every simpler (i.e. less extended) model to the simulated model is sufficiently large. Otherwise, increasing the number of replicate trials may even decrease the power of the AIC.
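The effect of lumping on the power of the AIC can be explored by simulation. This Python sketch is a deliberately simplified, hypothetical setup, not the simulation design of the paper: seeds fall into two size classes with removal probabilities `p_small` and `p_large`, all lumped trials are assumed identical, and power is the fraction of simulated data sets in which the AIC prefers the (true) two-probability model over a single common probability. Because the trials are identical here, the sketch only shows how power grows with the number of lumped trials; the possible loss of power described above arises when trials differ, which this sketch does not model.

```python
import math
import random

def loglik(s: int, n: int) -> float:
    """Maximised Bernoulli log-likelihood for s removals out of n seeds (p = s / n)."""
    ll = 0.0
    if s > 0:
        ll += s * math.log(s / n)
    if s < n:
        ll += (n - s) * math.log(1.0 - s / n)
    return ll

def aic_prefers_effect(s_small: int, n_small: int, s_large: int, n_large: int) -> bool:
    """True if the AIC favours separate probabilities for small and large seeds (k = 2)
    over one common probability (k = 1)."""
    aic_common = -2.0 * loglik(s_small + s_large, n_small + n_large) + 2.0
    aic_effect = -2.0 * (loglik(s_small, n_small) + loglik(s_large, n_large)) + 4.0
    return aic_effect < aic_common

def aic_power(p_small: float, p_large: float, n_per_class: int, n_trials: int,
              reps: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated data sets in which the AIC picks the true size-effect
    model when n_trials identical trials are lumped (hypothetical setup)."""
    rng = random.Random(seed)
    n = n_per_class * n_trials
    hits = 0
    for _ in range(reps):
        s_small = sum(rng.random() < p_small for _ in range(n))
        s_large = sum(rng.random() < p_large for _ in range(n))
        hits += aic_prefers_effect(s_small, n, s_large, n)
    return hits / reps
```

For instance, `aic_power(0.4, 0.6, 5, 1)` estimates the power for a single trial with five seeds per size class, and raising `n_trials` shows how pooling identical trials helps. Extending the sketch with trial-specific probabilities would be needed to examine the case in which lumping reduces power.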
Key words: AIC; Carapa procera; Kullback-Leibler distance; Likelihood-Ratio test; model selection; Myoprocta acouchy; noncentral chi-square distribution; power; Red acouchy; scatterhoarding; seed dispersal; seed size
In: T.A.C. Reydon & L. Hemerik (Eds): Current themes in Theoretical Biology: A Dutch Perspective. pp 233-265. Springer, Dordrecht, The Netherlands, 2004.
Mathematical models for biological processes
ECOSTAT develops, tests and evaluates mathematical and computer simulation models for biological processes.
ECOSTAT offers to develop models for the biological processes you are investigating. These models will, of course, always be developed in close co-operation with you.
ECOSTAT can also estimate the parameters of these models.
For complex models, ECOSTAT recommends performing a sensitivity analysis of the model results. ECOSTAT can assist you with such an analysis.
If you have a model and would like a general critical evaluation ("second opinion"), ECOSTAT can provide one.
Some examples of ECOSTAT's experience
- A method to estimate the effect of a sediment on bioluminescence in MSP tests with marine sediments. The effect was classified into two types: binding to the sediment, and the effect of chemicals in the sediment on bioluminescence. A method was developed to estimate the second effect.
- A critical evaluation of a probabilistic model to translate laboratory data on toxic chemicals to environmental standards for soils and sediments.
- Four models were developed to describe the effect of seed weight on the probability that a seed of the Carapa tree was taken away and concealed by agoutis. The study examined which relationship between seed weight and that probability best described the observed distribution of seeds taken and concealed, and whether the agoutis' decision scheme depended on the total seed harvest. This research is described in van der Hoeven et al., 2004.