R Tutorial

Video Tutorials

Video tutorials are based on the old forecast package. Text tutorials have been updated to use fable and associated packages.

library(tsibble)
library(fable)
library(feasts)
library(dplyr)

# Read the monthly abundance data and convert it to a tsibble indexed by month
pp_data = read.csv("pp_abundance_by_month.csv") |>
  mutate(month = yearmonth(month)) |>
  as_tsibble(index = month)
pp_data

gg_tsdisplay(pp_data, abundance)

$$y_t = c + \beta_1 y_{t-1} + \epsilon_t$$
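To build intuition for this equation, here is a base-R sketch that simulates an AR1 series directly from it. The values of $c$, $\beta_1$ and the error standard deviation are arbitrary choices for illustration, not estimates from pp_data. When $|\beta_1| < 1$, the series should hover around $c / (1 - \beta_1)$, and its lag-1 autocorrelation should be close to $\beta_1$.

```r
# Simulate the AR1 model y_t = c + beta_1 * y_{t-1} + eps_t.
# c_const, beta_1, sigma and n are made-up illustration values.
set.seed(42)
c_const <- 2
beta_1  <- 0.5
sigma   <- 1
n       <- 5000

y <- numeric(n)
y[1] <- c_const / (1 - beta_1)   # start at the long-run mean
for (t in 2:n) {
  y[t] <- c_const + beta_1 * y[t - 1] + rnorm(1, mean = 0, sd = sigma)
}

# For |beta_1| < 1 the series fluctuates around c / (1 - beta_1) = 4,
# and its lag-1 autocorrelation is close to beta_1
long_run_mean <- c_const / (1 - beta_1)
lag1_acf <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
cat("sample mean:", round(mean(y), 2), " expected:", long_run_mean, "\n")
cat("lag-1 autocorrelation:", round(lag1_acf, 2), "\n")
```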

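# AR1 model: pdq(1,0,0) means one autoregressive term, no differencing, no moving-average terms; PDQ(0,0,0) excludes seasonal terms
ar1_model = model(pp_data, ARIMA(abundance ~ pdq(1,0,0) + PDQ(0,0,0)))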
report(ar1_model)
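This is a rough cross-check of what fitted AR1 coefficients mean. It uses base R's stats::arima() instead of fable, on simulated data with known $c = 2$ and $\beta_1 = 0.5$ (illustrative values). Note that stats::arima() reports the mean of the series, $\mu$, under the name intercept. The constant in the equation above is therefore $c = \mu(1 - \beta_1)$.

```r
# Fit an AR1 with base R's stats::arima() to data simulated with known
# parameters (c = 2, beta_1 = 0.5; illustration values only)
set.seed(1)
n <- 2000
y_sim <- arima.sim(model = list(ar = 0.5), n = n) + 4  # mean-zero AR1 shifted to mean 4

fit <- arima(y_sim, order = c(1, 0, 0))
beta_1_hat <- unname(coef(fit)["ar1"])
mu_hat     <- unname(coef(fit)["intercept"])  # stats::arima reports the process mean here

# Convert the mean mu to the constant c in y_t = c + beta_1 * y_{t-1} + eps_t
c_hat <- mu_hat * (1 - beta_1_hat)
print(round(c(beta_1 = beta_1_hat, mu = mu_hat, c = c_hat), 2))
```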

To make forecasts from a model, we ask what the model would predict at each future time step
For the AR1 model, this depends on $c$, $\beta_1$, $y_{t-1}$, and $\epsilon_t$
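Future errors have an expected value of zero. The point forecast therefore applies the equation recursively with $\epsilon_t$ set to 0, feeding each forecast back in as the next $y_{t-1}$. Here is a minimal sketch with made-up parameter values. The function name ar1_point_forecast is hypothetical.

```r
# Point forecasts from an AR1: set future errors to their expected value (0)
# and apply yhat_{T+h} = c + beta_1 * yhat_{T+h-1}, starting from y_T.
# c_const, beta_1 and y_last are made-up illustration values.
ar1_point_forecast <- function(c_const, beta_1, y_last, h) {
  yhat <- numeric(h)
  prev <- y_last
  for (i in seq_len(h)) {
    yhat[i] <- c_const + beta_1 * prev
    prev <- yhat[i]
  }
  yhat
}

yhat <- ar1_point_forecast(c_const = 2, beta_1 = 0.5, y_last = 10, h = 5)
print(yhat)  # values 7, 5.5, 4.75, 4.375, 4.1875
```

With these values, the forecasts decay from 10 toward the long-run mean $c/(1-\beta_1) = 4$.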

ar1_forecast = forecast(ar1_model)
ar1_forecast

The forecast object contains information on:
- The model used for forecasting
- The time step being forecast
- The expected value, or point forecast, in the .mean column
- The forecast distribution, which captures uncertainty from the error term, in the abundance column (type <dist>)
Change the number of time steps in the forecast using h (the default is two seasonal cycles, i.e. 24 months for monthly data)
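The forecast distribution gets wider as the horizon grows. Assume normal errors with variance $\sigma^2$. The $h$-step-ahead forecast variance of an AR1 model is then $\sigma^2(1 + \beta_1^2 + \dots + \beta_1^{2(h-1)})$. The base-R sketch below uses simulated data and compares this formula with the standard errors from predict() on a stats::arima() fit. Those standard errors, like the formula, ignore parameter uncertainty.

```r
# h-step forecast uncertainty for an AR1 with normal errors:
#   Var(y_{T+h} | y_T) = sigma^2 * (1 + beta_1^2 + ... + beta_1^(2(h-1)))
# The simulated data are illustrative only.
set.seed(7)
y_sim <- arima.sim(model = list(ar = 0.6), n = 500) + 10
fit <- arima(y_sim, order = c(1, 0, 0))

h <- 6
beta_1 <- unname(coef(fit)["ar1"])
sigma2 <- fit$sigma2
# cumsum gives the variance multiplier for horizons 1..h
se_formula <- sqrt(sigma2 * cumsum(beta_1^(2 * (0:(h - 1)))))

# predict() for stats::arima fits also ignores parameter uncertainty,
# so its standard errors should match the formula
se_predict <- as.numeric(predict(fit, n.ahead = h)$se)
print(round(cbind(h = 1:h, se_formula, se_predict), 3))
```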

ar1_forecast = forecast(ar1_model, h = 20)

autoplot(ar1_forecast)

autoplot(ar1_forecast, pp_data)

ar1_forecast = forecast(ar1_model, bootstrap = TRUE, times = 1)
autoplot(ar1_forecast, pp_data)

This runs a single forecast in which the value of $\epsilon_t$ at each time step is drawn at random from the model's residuals
Run it a few times to see how much the forecast changes from run to run
To quantify the variability of forecast outcomes, set times to 1000

ar1_forecast = forecast(ar1_model, bootstrap = TRUE, times = 1000)
autoplot(ar1_forecast, pp_data)
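You can sketch roughly what bootstrap = TRUE does by hand. Resample a residual for each future $\epsilon_t$, push it through the AR1 equation, and repeat for many paths. This is a simplified base-R illustration on simulated data, not fable's internal code.

```r
# Hand-rolled bootstrapped AR1 forecasts: each future error is resampled
# (with replacement) from the fitted model's residuals and pushed through
# y_{T+h} = c + beta_1 * y_{T+h-1} + eps_{T+h}.
# Data are simulated for illustration; this is not the pp_data fit.
set.seed(123)
y_sim <- arima.sim(model = list(ar = 0.5), n = 300) + 4
fit <- arima(y_sim, order = c(1, 0, 0))

beta_1 <- unname(coef(fit)["ar1"])
mu     <- unname(coef(fit)["intercept"])
c_hat  <- mu * (1 - beta_1)              # constant c in the AR1 equation
res    <- as.numeric(na.omit(residuals(fit)))
y_last <- as.numeric(tail(y_sim, 1))

h <- 12
times <- 1000
paths <- matrix(NA_real_, nrow = times, ncol = h)
for (s in seq_len(times)) {
  prev <- y_last
  for (i in seq_len(h)) {
    prev <- c_hat + beta_1 * prev + sample(res, 1)
    paths[s, i] <- prev
  }
}

# Spread of the simulated futures at each horizon, here an 80% interval
interval_80 <- apply(paths, 2, quantile, probs = c(0.1, 0.9))
print(round(interval_80, 2))
```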

ar1_forecast = forecast(ar1_model)
autoplot(ar1_forecast, pp_data, level = c(50, 80))


ar1_forecast |>
  hilo() |>
  print(width = 90)

ar1_forecast |>
  hilo(level = c(50, 80)) |>
  print(width = 90)
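When a forecast distribution is normal, each interval from hilo() should be the mean plus or minus a normal quantile times the standard deviation. Here is a minimal sketch with made-up values for the mean and standard deviation. normal_interval is a hypothetical helper.

```r
# An X% prediction interval for a normal forecast distribution N(mu, sigma^2)
# is mu +/- z * sigma, with z = qnorm(1 - (1 - X/100) / 2).
# fc_mean and fc_sd are made-up illustration values.
normal_interval <- function(mu, sigma, level) {
  z <- qnorm(1 - (1 - level / 100) / 2)
  c(lower = mu - z * sigma, upper = mu + z * sigma)
}

fc_mean <- 4.5
fc_sd   <- 1.2
for (lvl in c(50, 80, 95)) {
  print(round(normal_interval(fc_mean, fc_sd, lvl), 3))
}
```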

Does it look like 80% of the empirical points will fall within the lighter band?
How do we tell?
We’ll come back to this when we learn how to evaluate forecasts
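One common check is empirical coverage: the fraction of observed values that land inside their intervals. For well-calibrated 80% intervals, it should be near 0.8. Here is a minimal sketch with made-up numbers. interval_coverage is a hypothetical helper.

```r
# Empirical coverage: the share of observed values inside their prediction
# intervals. observed, lower and upper are made-up illustration values.
interval_coverage <- function(observed, lower, upper) {
  mean(observed >= lower & observed <= upper)
}

observed <- c(5, 7, 3, 9, 6)
lower    <- c(4, 5, 4, 6, 5)
upper    <- c(8, 9, 7, 8, 8)
interval_coverage(observed, lower, upper)  # 3 of 5 inside, so 0.6
```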

Instructor's note: only variation in $\epsilon_t$ is included, not uncertainty in the estimated parameters

arima_exog_model = model(pp_data, ARIMA(abundance ~ mintemp))
report(arima_exog_model)

$$y_t = c + \beta_1 x_{1,t} + \beta_2 y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$$

Instructor's note: the actual model is a regression with ARIMA errors:
$$y_t = c + \beta_1 x_{1,t} + \eta_t$$
$$\eta_t = \beta_2 \eta_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t$$
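The same structure can be fit in base R by passing the covariate to stats::arima() as xreg. This models the regression errors $\eta_t$ as ARMA. The sketch below uses simulated data with known coefficients, and the covariate x only stands in for mintemp.

```r
# Regression with ARMA(1,1) errors:
#   y_t = c + beta_1 * x_t + eta_t
#   eta_t = beta_2 * eta_{t-1} + theta_1 * eps_{t-1} + eps_t
# Simulated data and parameter values are illustrative only.
set.seed(2023)
n <- 1000
x <- rnorm(n, mean = 10, sd = 3)              # stand-in for a covariate such as mintemp
eta <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = n)
y <- 5 + 2 * x + eta                          # true c = 5, beta_1 = 2

# Coefficients: ar1 (beta_2), ma1 (theta_1), intercept (c), x (beta_1)
fit <- arima(y, order = c(1, 0, 1), xreg = x)
print(round(coef(fit), 2))
```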

So we need values of $x_{1,t}$ to make predictions for time step $t$
To forecast with covariates we need forecasts for those covariates
Since our time series ended in 2020, we’ll use the observed values for the next two years in place of covariate forecasts

climate_forecasts = read.csv("pp_future_climate.csv") |>
  mutate(month = yearmonth(month)) |>
  as_tsibble(index = month)
climate_forecasts

arima_exog_forecast = forecast(arima_exog_model, new_data = climate_forecasts)
autoplot(arima_exog_forecast, pp_data)
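Base R has the same requirement for future covariate values. predict() on a stats::arima() fit with xreg needs a newxreg value for every forecast step. The sketch below uses simulated data, and x_future is a made-up stand-in for climate_forecasts.

```r
# Forecasting a regression-with-ARMA-errors model needs future covariate
# values: one per forecast step, supplied via newxreg.
# Data and x_future are illustrative only.
set.seed(99)
n <- 300
x <- rnorm(n, mean = 10, sd = 3)
y <- 5 + 2 * x + arima.sim(model = list(ar = 0.6), n = n)
fit <- arima(y, order = c(1, 0, 0), xreg = x)

x_future <- c(8, 10, 12, 14)   # stand-in for covariate forecasts or observed future values
fc <- predict(fit, n.ahead = length(x_future), newxreg = x_future)
print(round(cbind(x_future, mean = as.numeric(fc$pred), se = as.numeric(fc$se)), 2))
```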

Last updated on Oct 5, 2023