Seed dispersal analysis

Intro

Goal: to simulate some seed dispersal data and try to fit it via inverse modeling. The set-up here is a little bit different from the typical case where we have a single plot with multiple adults (seed sources) and multiple seed traps (targets); in this case, we have one target per plot. Thus if we consider our data set as a whole we have to make sure that sources in different plots are not allowed to contribute to the seed rain in a focal plot. Preliminaries: load packages (SparseM is needed for some unnecessarily clever stuff incorporated below for efficiency), set random-number generator seed for reproducibility:

> require(SparseM)
Package SparseM (0.78) loaded.  To cite, see citation("SparseM")
> require(bbmle)
> set.seed(1001)

Simulation

Define parameters. The model for seed rain at target in plot $j$ is

(1)
\begin{split} f_{ij} & = S_{ij}^{\beta} \\ E_j & = \sum \gamma f_{ij} \exp{-d_{ij}/\alpha} \\ O_j & \sim \mbox{NegBinom}(E_j,k) \end{split}

where $f_{ij}$ is the (scaled) fecundity of source $i$ in plot $j$; $S_{ij}$ is the size (dbh?) of source $i$; and $d_{ij}$ is the distance of source $i$ from the target in plot $j$. Thus $\beta$ controls the dependence of fecundity on size, $\alpha$ is the characteristic scale of dispersal, $\gamma$ controls overall seed density, and $k$ describes variability. We will fit $\gamma$ and $k$ on the log scale for convenience.

> nplots <- 100
> plotsize <- 50
> trueparms <- c(alpha = 5, beta = 2, loggamma = log(1e-04), logk = log(2))
> ## locations of adults within plots
> ## mean DBH is 32 (in some arbitrary units)
> ## combine data on source trees

Simulate target locations:
> target.x <- runif(nplots, max = plotsize)
> target.y <- runif(nplots, max = plotsize)

Calculate distances among all sources and all targets, then set those that are not in the same plot to infinity (we will do something slightly more complicated later on).
> dist <- sqrt(outer(sourcedat$x, target.x, "-")^2 + outer(sourcedat$y,
+     target.y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==") > dist[!sameplot] <- Inf The distance matrix is block-diagonal: > image(dist, col = gray((0:50)/50)) The function to compute the expected seed shadow essentially replicates the equation above; the summation is done by a matrix multiplication, which is clever but potentially very inefficient because of the plot separation; most sources do not contribute to most of the targets. There are two ways to fix this inefficiency, one straightforward and the other clever (guess which one I used??). The straightforward way would be to split the source data by plot, compute a separate source-to-target distance matrix for each plot, and do the (fecundity$\timesf(\mbox{distance})$) calculation separately for each plot. The clever way is to use an R package that supports sparse matrices, and hence automatically ignores all the out-of-plot interactions. However, in order to do this, we have to hack things a bit. First, we have to tell R how to exponentiate just the non-zero elements of a sparse matrix: this requires hacking into the internal structure of the matrix a bit. Second, this approach is mathematically wrong (but works) if we set the out-of-plot element distances to zero - they should really be infinite (and then reduced to zero by the seed-shadow function), but in this case "two wrongs make a right". As long as there are no true zero distances in the data set, and as long as the out-of-plot seed shadow contributions are always zero, this approach gives us the right answer. The expected-seed function: > expseeds <- function(alpha, beta, gamma, size, dist) { + fec <- size^beta + shadow <- myexp(dist, -1/alpha) + gamma * (fec %*% shadow) + } A little bit of nasty hacking to allow us to exponentiate a sparse matrix element-wise (i.e., exponentiate all of the non-zero elements) > myexp <- function(x,a) { + if (inherits(x,"matrix.csr")) { + x@ra <- exp(a*x@ra) ## Sparse matrix HACK + } else if (is.matrix(x)) { + x <- exp(a*x) + } else stop("x should be a matrix") + return(x) + } Simulate the "data": > z <- with(as.list(trueparms), expseeds(alpha, beta, exp(loggamma), + size = sourcedat$size, dist = dist))
> obs <- rnbinom(nplots, mu = z, size = 2)
> tdat <- data.frame(x = target.x, y = target.y, seeds = obs)

Fitting

Now forget we know about anything but tdat and sourcedat and try to estimate alpha, beta, gamma.

> nplots <- nrow(tdat)
> dist <- sqrt(outer(sourcedat$x, tdat$x, "-")^2 + outer(sourcedat$y, + tdat$y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==") Set the out-of-plot distances to infinity or zero (depending on whether we are using the sparse matrix trick or not): > sparse <- TRUE > if (!sparse) { + dist[!sameplot] <- Inf + } else { + dist[!sameplot] <- 0 + dist <- as.matrix.csr(dist) + } Set up the negative log-likelihood function, which just computes the expected number of seeds and then feeds it into the negative binomial density calculation: > nllfun <- function(alpha, beta, loggamma, logk) { + e <- expseeds(alpha, beta, exp(loggamma), sourcedat$size,
+         dist)
+     -sum(dnbinom(tdat\$seeds, mu = e, size = exp(logk), log = TRUE))
+ }

Fit. I've set some (large) bounds on some of these parameters - as it turns out,
> library(bbmle)
> (m1 <- mle2(nllfun, start = list(alpha = 1, beta = 1, loggamma = -3,
+     logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0,
+     loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf,
+     loggamma = Inf, logk = 50)))
Call:
mle2(minuslogl = nllfun, start = list(alpha = 1, beta = 1, loggamma = -3,
logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0,
loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf,
loggamma = Inf, logk = 50))

Coefficients:
alpha      beta  loggamma      logk
3.011964  0.981971 -4.535761 10.335389

Log-likelihood: -41.19

How did we do compared to the true values?
> round(cbind(coef(m1), confint(m1, method = "quad"), trueparms),
+     3)
2.5 %   97.5 % trueparms
alpha     3.012     0.666    5.358     5.000
beta      0.982    -1.051    3.015     2.000
loggamma -4.536   -12.301    3.229    -9.210
logk     10.335 -1124.585 1145.256     0.693

The confidence limits above are quadratic approximations. How do the profile confidence limits look?
> if (file.exists("seeddisp-prof.RData")) {
+ } else {
+     t1 <- system.time(pp <- profile(m1, trace = TRUE))
+     save("pp", "t1", file = "seeddisp-prof.RData")
+ }

(This takes a little while - 1.7 minutes - so store the results.)
> plot(pp) > round(cbind(coef(m1), confint(pp), trueparms), 3)
2.5 % 97.5 % trueparms
alpha     3.012  1.810     NA     5.000
beta      0.982     NA     NA     2.000
loggamma -4.536     NA -0.418    -9.210
logk     10.335 -1.869     NA     0.693

The unfortunate conclusion in this case is that even with 100 plots, we don't have very much resolution of the parameters. It would help a bit if we had more contrast in the data …

(You can see the original Sweave file if you like.)

To do: reference Ribbens etc.; discuss simulated annealing etc. Power calculations???

page revision: 12, last edited: 30 Jan 2009 17:58