Seed dispersal analysis

Seed dispersal/inverse modeling example

Intro

Goal: to simulate some seed dispersal data and try to fit it via inverse modeling. The set-up here is a little bit different from the typical case where we have a single plot with multiple adults (seed sources) and multiple seed traps (targets); in this case, we have one target per plot. Thus if we consider our data set as a whole we have to make sure that sources in different plots are not allowed to contribute to the seed rain in a focal plot. Preliminaries: load packages (SparseM is needed for some unnecessarily clever stuff incorporated below for efficiency), set random-number generator seed for reproducibility:

> require(SparseM)
Package SparseM (0.78) loaded.  To cite, see citation("SparseM")
> require(bbmle)
> set.seed(1001)

Simulation

Define parameters. The model for seed rain at target in plot $j$ is

(1)
\begin{split} f_{ij} & = S_{ij}^{\beta} \\ E_j & = \sum \gamma f_{ij} \exp{-d_{ij}/\alpha} \\ O_j & \sim \mbox{NegBinom}(E_j,k) \end{split}

where $f_{ij}$ is the (scaled) fecundity of source $i$ in plot $j$; $S_{ij}$ is the size (dbh?) of source $i$; and $d_{ij}$ is the distance of source $i$ from the target in plot $j$. Thus $\beta$ controls the dependence of fecundity on size, $\alpha$ is the characteristic scale of dispersal, $\gamma$ controls overall seed density, and $k$ describes variability. We will fit $\gamma$ and $k$ on the log scale for convenience.

> nplots <- 100
> plotsize <- 50
> adults.per.plot <- 20
> adult.meansize <- 32
> adult.sizeshape <- 4
> trueparms <- c(alpha = 5, beta = 2, loggamma = log(1e-04), logk = log(2))
> adults <- rpois(nplots,adults.per.plot)
> ## locations of adults within plots
> adult.x <- runif(sum(adults),max=plotsize)
> adult.y <- runif(sum(adults),max=plotsize)
> adult.n <- unlist(lapply(adults,seq,from=1))
> ## mean DBH is 32 (in some arbitrary units)
> adult.size <- rgamma(sum(adults),shape=adult.meansize/adult.sizeshape,
+                      scale=adult.sizeshape)
> plot.id <- rep(1:nplots,adults)
> ## combine data on source trees
> sourcedat <- data.frame(plot=plot.id,n=adult.n,x=adult.x,y=adult.y,
+                         size=adult.size)

Simulate target locations:
> target.x <- runif(nplots, max = plotsize)
> target.y <- runif(nplots, max = plotsize)

Calculate distances among all sources and all targets, then set those that are not in the same plot to infinity (we will do something slightly more complicated later on).
> dist <- sqrt(outer(sourcedat$x, target.x, "-")^2 + outer(sourcedat$y, 
+     target.y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==")
> dist[!sameplot] <- Inf

The distance matrix is block-diagonal:
> image(dist, col = gray((0:50)/50))

seeddisp-distplot.png
The function to compute the expected seed shadow essentially replicates the equation above; the summation is done by a matrix multiplication, which is clever but potentially very inefficient because of the plot separation; most sources do not contribute to most of the targets. There are two ways to fix this inefficiency, one straightforward and the other clever (guess which one I used??). The straightforward way would be to split the source data by plot, compute a separate source-to-target distance matrix for each plot, and do the (fecundity $\times$ $f(\mbox{distance})$) calculation separately for each plot.

The clever way is to use an R package that supports sparse matrices, and hence automatically ignores all the out-of-plot interactions. However, in order to do this, we have to hack things a bit. First, we have to tell R how to exponentiate just the non-zero elements of a sparse matrix: this requires hacking into the internal structure of the matrix a bit. Second, this approach is mathematically wrong (but works) if we set the out-of-plot element distances to zero - they should really be infinite (and then reduced to zero by the seed-shadow function), but in this case "two wrongs make a right". As long as there are no true zero distances in the data set, and as long as the out-of-plot seed shadow contributions are always zero, this approach gives us the right answer. The expected-seed function:

> expseeds <- function(alpha, beta, gamma, size, dist) {
+     fec <- size^beta
+     shadow <- myexp(dist, -1/alpha)
+     gamma * (fec %*% shadow)
+ }

A little bit of nasty hacking to allow us to exponentiate a sparse matrix element-wise (i.e., exponentiate all of the non-zero elements)
> myexp <- function(x,a) {
+   if (inherits(x,"matrix.csr")) {
+     x@ra <- exp(a*x@ra) ## Sparse matrix HACK
+   } else if (is.matrix(x)) {
+     x <- exp(a*x)
+   } else stop("x should be a matrix")
+   return(x)
+ }

Simulate the "data":
> z <- with(as.list(trueparms), expseeds(alpha, beta, exp(loggamma), 
+     size = sourcedat$size, dist = dist))
> obs <- rnbinom(nplots, mu = z, size = 2)
> tdat <- data.frame(x = target.x, y = target.y, seeds = obs)

Fitting

Now forget we know about anything but tdat and sourcedat and try to estimate alpha, beta, gamma.

> nplots <- nrow(tdat)
> dist <- sqrt(outer(sourcedat$x, tdat$x, "-")^2 + outer(sourcedat$y, 
+     tdat$y, "-")^2)
> sameplot <- outer(sourcedat$plot, 1:nplots, "==")

Set the out-of-plot distances to infinity or zero (depending on whether we are using the sparse matrix trick or not):
> sparse <- TRUE
> if (!sparse) {
+     dist[!sameplot] <- Inf
+ } else {
+     dist[!sameplot] <- 0
+     dist <- as.matrix.csr(dist)
+ }

Set up the negative log-likelihood function, which just computes the expected number of seeds and then feeds it into the negative binomial density calculation:
> nllfun <- function(alpha, beta, loggamma, logk) {
+     e <- expseeds(alpha, beta, exp(loggamma), sourcedat$size, 
+         dist)
+     -sum(dnbinom(tdat$seeds, mu = e, size = exp(logk), log = TRUE))
+ }

Fit. I've set some (large) bounds on some of these parameters - as it turns out,
> library(bbmle)
> (m1 <- mle2(nllfun, start = list(alpha = 1, beta = 1, loggamma = -3, 
+     logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0, 
+     loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf, 
+     loggamma = Inf, logk = 50)))
Call:
mle2(minuslogl = nllfun, start = list(alpha = 1, beta = 1, loggamma = -3, 
    logk = 0), method = "L-BFGS-B", lower = c(alpha = 0, beta = 0, 
    loggamma = -20, logk = -20), upper = c(alpha = 60, beta = Inf, 
    loggamma = Inf, logk = 50))

Coefficients:
    alpha      beta  loggamma      logk 
 3.011964  0.981971 -4.535761 10.335389 

Log-likelihood: -41.19

How did we do compared to the true values?
> round(cbind(coef(m1), confint(m1, method = "quad"), trueparms), 
+     3)
                    2.5 %   97.5 % trueparms
alpha     3.012     0.666    5.358     5.000
beta      0.982    -1.051    3.015     2.000
loggamma -4.536   -12.301    3.229    -9.210
logk     10.335 -1124.585 1145.256     0.693

The confidence limits above are quadratic approximations. How do the profile confidence limits look?
> if (file.exists("seeddisp-prof.RData")) {
+     load("seeddisp-prof.RData")
+ } else {
+     t1 <- system.time(pp <- profile(m1, trace = TRUE))
+     save("pp", "t1", file = "seeddisp-prof.RData")
+ }

(This takes a little while - 1.7 minutes - so store the results.)
> plot(pp)

seeddisp-016.png
> round(cbind(coef(m1), confint(pp), trueparms), 3)
                 2.5 % 97.5 % trueparms
alpha     3.012  1.810     NA     5.000
beta      0.982     NA     NA     2.000
loggamma -4.536     NA -0.418    -9.210
logk     10.335 -1.869     NA     0.693

The unfortunate conclusion in this case is that even with 100 plots, we don't have very much resolution of the parameters. It would help a bit if we had more contrast in the data …

(You can see the original Sweave file if you like.)

To do: reference Ribbens etc.; discuss simulated annealing etc. Power calculations???

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License