R Tutorial

Material influenced by the state space modeling activity from Michael Dietz’s excellent Ecological Forecasting book and Nicholas J Clark’s course on Ecological forecasting with mvgam and brms*

Installation

Pre-installation Windows

Before starting installation on Windows you will need to install RTools.

If you don’t already have it installed:

Follow the link
Select your appropriate version of R
Download the installer
Run it with the defaults

Installation

install.packages(c("brms", "dplyr", "gratia", "ggplot2",
                   "marginaleffects", "tidybayes", "zoo",
                   "viridis", "remotes"))
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
remotes::install_github('nicholasjclark/mvgam', force = TRUE)

macOS and Linux

library(cmdstanr)
check_cmdstan_toolchain()
install_cmdstan()
cmdstan_version()

Windows

library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE)
install_cmdstan()
cmdstan_version()

If these returns a version number like "2.32.2" then things are working properly.

Text Tutorial

So far we’ve developed relatively simple time-series models
Linear time-series dependance
Linear responses to environmental factors
Normally distributed errors
No model of observation error
Most of these are violated in ecological systems
So start fitting more complex models

Data

Data on the population dynamics of the Desert Pocket Mouse

Model

Draw on board while walking through models

y_t-1    y_t    y_t+1
  |       |       |
x_t-1 -> x_t -> x_t+1   Process model

Process model

What is actually happening in the system
First order autoregressive component

x_t+1 = f(x_t) + e_t

Simple linear model is AR1:

x_t+1 = b0 + b1 * x_t + e_t

Observation model

Counts of rodents in traps aren’t perfect measures of the number of rodents (which are what should be changing in the process model and what we care about)
So model this imperfect observation

y_t = Pois(x_t)

Can be much more complicated

mvgam

Models like this are not trivial to fit
Use [STAN](http://mcmc-jags.sourceforge.net)
Uses MCMC to explore parameter space to fit the model using Bayesian methods
Typically requires learning a separate language - STAN is it’s own language
This lets you write arbitrarily complex models, but really needs a course in Bayesian methods
So, we’re going to use an R package called mvgam to implement our models
We’re going to use it because it’s the simplest way to make complex time-series model in R

library(mvgam)
library(dplyr)

mvgam requires that we modify our data a bit
Requires a time variable be present in the data to index temporal observations starting at 1.
Our newmoonnumber indexes the monthly samples so convert it to a start at 1
Also requires a a `series`` variable, which needs to be a factor
Needed for analyzing multiple time series at once (e.g., multiple species)

pp_data <- read.csv("content/data/pp_abundance_timeseries.csv") |>
  mutate(time = newmoonnumber - (min(newmoonnumber)) + 1) |>
  mutate(series = as.factor('PP')) |>
  select(time, series, abundance, mintemp, cool_precip)

We have 124 months of data so reserve 24 for testing

data_train <- filter(pp_data, time <= 100) 
data_test <- filter(pp_data, time > 100)

Simply time-series models in mvgam

Start by building something similar to what we’ve done before with the number of desert pocket mice
Model has an exogenous driver - minimum temperature
An autoregressive component
Gaussian error
Fit using the mvgam() function
Follows a Base R model structure so start with the model
Instead of including the AR component in the model we add a separate trend_model argument
We’ll use "AR1"
Specify the error family = gaussian()
Then we can specify the data for fitting the model and the data for making/evaluating forecasts

baseline_model = mvgam(abundance ~ mintemp,
                       trend_model = "AR1",
                       family = gaussian(),
                       data = data_train,
                       newdata = data_test)

Bayesian model fitting

That’s a lot of output for fitting a model
What’s going on?
It is difficult to fit complex models like the state space models we’re building towards
One way to fit these more complex models is using Bayesian methods
These methods iteratively search parameter space for the best parameter values
Using something called Markov Chain Monte Carlo (MCMC)
Draw 2D parameter search on board
The "Iteration" lines are telling us that the model is working it’s way through this process
The different “chains” are because we typically go through this process multiple times to check than we are converging to the right values
We can look at the result of this fitting process using mcmc_plot

mcmc_plot(baseline_model, type = "trace", variable = c("mintemp", "ar1[1]", "sigma[1]"))

The red lines at the bottom are providing the same information as the warnings when we fit the model
Something isn’t quite right
The model isn’t converged yet
We could try running it longer
But part of what’s going on here is that the errors aren’t really Gaussian
So let’s look at the forecast and then move on

plot(baseline_model, type = "forecast")

Remember our model looks like this

$$y_t = c + \beta_1 x_{1,t} + \beta_2 y_{t-1} + \mathcal{N}(0,\sigma^{2})$$

Which can also be written as

$$y_t = \mathcal{N}(\mu,\sigma^{2})$$ $$u_t = c + \beta_1 x_{1,t} + \beta_2 y_{t-1}$$

Prediction intervals are regularly negative because we’ve assumed normally distributed error
Actual counts can only be non-negative integers: 0, 1, 2…

Better distributions

Let’s use Poisson error structure and a log link function to give us only integer predictions

$$y_t = \mathrm{Pois}(\mu_t)$$ $$\mathrm{log(\mu_t)} = c + \beta_1 x_{1,t} + \beta_2 y_{t-1} + \mathcal{N}(0,\sigma^{2})$$

The Poisson distribution generates only integer draws based on a mean
If the mean is 1.5 sometimes you’ll draw a 1, sometimes a 2, etc.
The log transformation of $\mu$ ensures that $\mu$ is positive

poisson_model = mvgam(abundance ~ mintemp,
                      trend_model = "AR1",
                      family = poisson(link = "log"),
                      data = data_train,
                      newdata = data_test)

No more warnings at the end
Look at the model

summary(poisson_model)

And the forecast

plot(poisson_model, type = "forecast", newdata = data_test)

Now all of our predictions are positive!

State space models

Time-series model
Only first order autoregressive component
Separately model
- the process model - how the system evolves in time or space
- the observation model - observation error or indirect observations
Estimates the true value of the underlying latent state variables
State space model of AR1 + rain w/Poisson error

$$y_t = \mathrm{Pois}(\mu_t)$$ $$\mathrm{log(\mu_t)} = c + \beta_1 x_{1,t} + \beta_2 \mu_{t-1} + \mathcal{N}(0,\sigma^{2})$$

state_space_model = mvgam(abundance ~ 1,
                          trend_formula = ~ mintemp,
                          trend_model = "AR1",
                          family = poisson(link = "log"),
                          data = data_train,
                          newdata = data_test)
plot(state_space_model, type = "forecast")

output = state_space_model$model_output

Normally would want several chains with different starting positions to avoid local minima
Send to JAGS

j.model   <- jags.model (file = textConnection(RandomWalk),
                         data = data,
                         inits = init,
                         n.chains = 1)

Burn in

jags.out   <- coda.samples (model = j.model,
                            variable.names = c("tau_proc","tau_obs"),
                            n.iter = 10000)
plot(jags.out)

Sample from MCMC with full vector of X’s
This starts sampling from the point were the previous run of coda.samples ends so it gets rid of the burn-in samples

jags.out   <- coda.samples (model = j.model,
                            variable.names = c("x","tau_proc","tau_obs"),
                            n.iter = 10000)

Visualize
Convert the output into a matrix & drop parameters

out <- as.matrix(jags.out)
xs <- out[,3:ncol(out)]

Point predictions are averages across MCMC samples

predictions <- colMeans(xs)
plot(time, predictions, type = "l")

And this looks very similar to the observed dynamics of y
Add prediction intervals as range containing 95% of MCMC samples

ci <- apply(xs, 2, quantile, c(0.025, 0.975))
lines(time, ci[1,], lty = "dashed", col = "blue")
lines(time, ci[2,], lty = "dashed", col = "blue")

These are very narrow prediction intervals, so the model appears to be very confident
But it’s important to keep in mind that when fitting the value of x at time t, the model has access to the value of y at time t
And the y is present it isn’t being estimated, it’s just the observed value
So, will this model forecast well?

Forecasting

To make forecasts using a JAGS model we include data for y that is NA
This tells the model that we don’t know the values and therefore the model estimates them as part of the fitting process
To make a true forecast we would add one NA to the end of y for each time step we wanted to forecast
To hindcast or backcast like we replace the values for y that are part of the test set with NA
We’ll hindcast, so to do this we’ll replace the last year of y values with NA and then compare the final year of data to our predictions

Make these changes at top of script and rerun

data$y[(length(y)-51):length(y)] = NA
jags.out   <- coda.samples (model = j.model,
                            variable.names = c("y","tau_proc","tau_obs"),
                            n.iter = 10000)

We can see from plotting the predictions that the forecast doesn’t look promising
Without the observed data to influence the estimates of x[t] the model predicts little change over the forecast year
We can directly compare this to the empirical data by adding it to the plot

lines(time, y)

So the point estimates don’t perform well
This raises the question of whether the model accurately predicts that it is uncertain when making forecasts
Plotting the prediction intervals suggests that it does
They very quickly expand towards zero and the upper limits of the data

Last updated on Nov 12, 2023

Edit this page