Seed dispersal/inverse modeling example
Intro
Goal: to simulate some seed dispersal data and try to fit it via inverse modeling. The set-up here is a little bit different from the typical case where we have a single plot with multiple adults (seed sources) and multiple seed traps (targets); in this case, we have one target per plot. Thus if we consider our data set as a whole we have to make sure that sources in different plots are not allowed to contribute to the seed rain in a focal plot. Preliminaries: load packages (SparseM is needed for some unnecessarily clever stuff incorporated below for efficiency), set random-number generator seed for reproducibility:
> require(SparseM)
Package SparseM (0.78) loaded. To cite, see citation("SparseM")
> require(bbmle)
> set.seed(1001)
Simulation
Define parameters. The model for seed rain at target in plot
is

where
is the (scaled) fecundity of source
in plot
;
is the size (dbh?) of source
; and
is the distance of source
from the target in plot
. Thus
controls the dependence of fecundity on size,
is the characteristic scale of dispersal,
controls overall seed density, and
describes variability. We will fit
and
on the log scale for convenience.
> nplots <- 100
> plotsize <- 50
> adults.per.plot <- 20
> adult.meansize <- 32
> adult.sizeshape <- 4
> trueparms <- c(alpha = 5, beta = 2, loggamma = log(1e-04), logk = log(2))
> adults <- rpois(nplots,adults.per.plot)
> ## locations of adults within plots
> adult.x <- runif(sum(adults),max=plotsize)
> adult.y <- runif(sum(adults),max=plotsize)
> adult.n <- unlist(lapply(adults,seq,from=1))
> ## mean DBH is 32 (in some arbitrary units)
> adult.size <- rgamma(sum(adults),shape=adult.meansize/adult.sizeshape,
+ scale=adult.sizeshape)
> plot.id <- rep(1:nplots,adults)
> ## combine data on source trees
> sourcedat <- data.frame(plot=plot.id,n=adult.n,x=adult.x,y=adult.y,
+ size=adult.size)
Simulate target locations:
> target.x <- runif(nplots, max = plotsize)
> target.y <- runif(nplots, max = plotsize)
Calculate distances among all sources and all targets, then set those that are not in the same plot to infinity (we will do something slightly more complicated later on).
> dist <- sqrt(outer(sourcedat$x, target.x, "-")^2 + outer(sourcedat$y,
+ target.y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==")
> dist[!sameplot] <- Inf
The distance matrix is block-diagonal:
> image(dist, col = gray((0:50)/50))

The function to compute the expected seed shadow essentially replicates the equation above; the summation is done by a matrix multiplication, which is clever but potentially very inefficient because of the plot separation; most sources do not contribute to most of the targets. There are two ways to fix this inefficiency, one straightforward and the other clever (guess which one I used??). The straightforward way would be to split the source data by plot, compute a separate source-to-target distance matrix for each plot, and do the (fecundity
) calculation separately for each plot.
The clever way is to use an R package that supports sparse matrices, and hence automatically ignores all the out-of-plot interactions. However, in order to do this, we have to hack things a bit. First, we have to tell R how to exponentiate just the non-zero elements of a sparse matrix: this requires hacking into the internal structure of the matrix a bit. Second, this approach is mathematically wrong (but works) if we set the out-of-plot element distances to zero - they should really be infinite (and then reduced to zero by the seed-shadow function), but in this case "two wrongs make a right". As long as there are no true zero distances in the data set, and as long as the out-of-plot seed shadow contributions are always zero, this approach gives us the right answer. The expected-seed function:
> expseeds <- function(alpha, beta, gamma, size, dist) {
+ fec <- size^beta
+ shadow <- myexp(dist, -1/alpha)
+ gamma * (fec %*% shadow)
+ }
A little bit of nasty hacking to allow us to exponentiate a sparse matrix element-wise (i.e., exponentiate all of the non-zero elements)
> myexp <- function(x,a) {
+ if (inherits(x,"matrix.csr")) {
+ x@ra <- exp(a*x@ra) ## Sparse matrix HACK
+ } else if (is.matrix(x)) {
+ x <- exp(a*x)
+ } else stop("x should be a matrix")
+ return(x)
+ }
Simulate the "data":
> z <- with(as.list(trueparms), expseeds(alpha, beta, exp(loggamma),
+ size = sourcedat$size, dist = dist))
> obs <- rnbinom(nplots, mu = z, size = 2)
> tdat <- data.frame(x = target.x, y = target.y, seeds = obs)
Fitting
Now forget we know about anything but tdat and sourcedat and try to estimate alpha, beta, gamma.
> nplots <- nrow(tdat)
> dist <- sqrt(outer(sourcedat$x, tdat$x, "-")^2 + outer(sourcedat$y,
+ tdat$y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==")
Set the out-of-plot distances to infinity or zero (depending on whether we are using the sparse matrix trick or not):
> sparse <- TRUE
> if (!sparse) {
+ dist[!sameplot] <- Inf
+ } else {
+ dist[!sameplot] <- 0
+ dist <- as.matrix.csr(dist)
+ }
Set up the negative log-likelihood function, which just computes the expected number of seeds and then feeds it into the negative binomial density calculation:
> nllfun <- function(alpha, beta, loggamma, logk) {
+ e <- expseeds(alpha, beta, exp(loggamma), sourcedat$size,
+ dist)
+ -sum(dnbinom(tdat$seeds, mu = e, size = exp(logk), log = TRUE))
+ }
Fit. I've set some (large) bounds on some of these parameters - as it turns out,
> library(bbmle)
> (m1 <- mle2(nllfun, start = list(alpha = 1, beta = 1, loggamma = -3,
+ logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0,
+ loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf,
+ loggamma = Inf, logk = 50)))
Call:
mle2(minuslogl = nllfun, start = list(alpha = 1, beta = 1, loggamma = -3,
logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0,
loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf,
loggamma = Inf, logk = 50))
Coefficients:
alpha beta loggamma logk
3.011964 0.981971 -4.535761 10.335389
Log-likelihood: -41.19
How did we do compared to the true values?
> round(cbind(coef(m1), confint(m1, method = "quad"), trueparms),
+ 3)
2.5 % 97.5 % trueparms
alpha 3.012 0.666 5.358 5.000
beta 0.982 -1.051 3.015 2.000
loggamma -4.536 -12.301 3.229 -9.210
logk 10.335 -1124.585 1145.256 0.693
The confidence limits above are quadratic approximations. How do the profile confidence limits look?
> if (file.exists("seeddisp-prof.RData")) {
+ load("seeddisp-prof.RData")
+ } else {
+ t1 <- system.time(pp <- profile(m1, trace = TRUE))
+ save("pp", "t1", file = "seeddisp-prof.RData")
+ }
(This takes a little while - 1.7 minutes - so store the results.)
> plot(pp)
> round(cbind(coef(m1), confint(pp), trueparms), 3)
2.5 % 97.5 % trueparms
alpha 3.012 1.810 NA 5.000
beta 0.982 NA NA 2.000
loggamma -4.536 NA -0.418 -9.210
logk 10.335 -1.869 NA 0.693
The unfortunate conclusion in this case is that even with 100 plots, we don't have very much resolution of the parameters. It would help a bit if we had more contrast in the data …
(You can see the original Sweave file if you like.)
To do: reference Ribbens etc.; discuss simulated annealing etc. Power calculations???





