R Tutorial

  • Load the packages
library(tsibble) # convert time-series data in a tsibble
library(fable) # main package for modeling and forecasting with time-series data
library(feasts) # time-series data visualization
library(dplyr) # data manipulation
  • Then load the data
data = read.csv("portal_timeseries.csv")
data_ts <- data |>
  mutate(month = yearmonth(date)) |>
  as_tsibble(index = month)

ARIMA models

  • Our simple autoregressive models were a good start
  • But both the variables we looked at showed strong seasonal components that they didn’t capture
  • To add seasonal components to our model we need to learn about a more complex version of AR models
  • ARIMA models
  • ARIMA stands for AutoRegressive Integrated Moving Average
  • We’ve already learned the AutoRegressive part, so let’s talk about the other two

Moving average models

  • Note that this is not using a moving average like you did when decomposing time-series
  • MA models are like AR models in that they use past observations to predict the next step
  • But instead of using the values themselves they use the past errors
  • So a 1st order MA model looks like this

y_t = c + theta_1 * e_t-1 + error_t

  • So if theta_1 > 0 then if the observation has greater than the mean at the previous time step it is like to be greater than the mean at the current time step

  • Positive MA components say that if the previous observation was an outlier then the next observation is expected to be an outlier in the same direction

  • This makes sense for NDVI

  • Greenness isn’t really driven by greenness in the same way a populations abundance is

  • But if it’s greener than normal this month it’s probably a good year so it’s likely to be greener than normal next month

  • Both AR and MA structures can be present in time-series

  • We can combine them in an ARMA model

y_t = c + b_1 * y_t-1 + theta_1 * error_t-1 + error_t

  • An assumption when fitting models of this form is that the time-series is “stationary”
  • Basically - there is no general trend in the data
  • This is handled using “differencing” which is similar to what we did when decomposing time-series
  • To take out the trend before model fitting the data is differenced
  • Typically by subtracting the previous value

y_t' = y_t - y_t-1

  • So, if we difference the data we are now modeling the change from time-step to time-step, not the actual values

  • Adding AR, MA, and differencing together gives us an ARIMA model

    • AR: Autoregressive - Uses past values to predict future values
    • I: Integrated - Uses differencing to handle trends
    • MA: Moving Average - Uses past errors to predict future errors

Seasonal models

  • One of the things ARIMA models can do easily in R is fit seasonal models
  • The ARIMA model structure is available in fable in the ARIMA() function
arima_model = model(data_ts, ARIMA(NDVI))
  • If we don’t provide it any details on model structure it will try to find the best fitting ARIMA model
  • In ARIMA notation we have 3 numbers, given in parentheses, labeled (p, d, q)
  • p is the AR order, d is the degree of differencing, and q is the MA order
  • So this model is a 3rd order MA model, with no differencing, and no autoregressive terms

y_t = c + theta_1 * e_t-1 + theta_2 * e_t-2 + theta_3 * e_t-3 + e_t

  • This is also shown in the Coefficients table by the ma1, ma2, and ma3 terms

  • So, if NDVI was above average over the last 3 months we expect it to be above average now

  • This lines up with our idea that MA might be better for NDVI since it indicates a good year

  • But what is the sar1 coefficient?

  • That’s our season signal

  • A seasonal autoregressive term of order 1

  • The seasonal component is also shown in PDQ notation in the second set of parentheses

  • We model season components with lags of the length of a full seasonal cycle

  • In this case a full cycle is 1 year

  • So the SAR1 part of the model as

y_t = c + b_12 * y_t-12

  • The value in the current month is related to the value observed 1 year ago
  • This makes a lot of sense for NDVI
  • Ecosystems are typically greener during the summer than in the winter
  • So a good piece of information is how green the ecosystem was at this time last year
  • Our full model would therefore be

y_t = c + theta_1 * e_t-1 + theta_2 * e_t-2 + theta_3 * e_t-3 + b_12 * y_t-12 + e_t

  • What does this look like?
arima_model_aug = augment(arima_model)
autoplot(arima_model_aug, NDVI) + autolayer(arima_model_aug, .fitted, color = "orange")
  • Looks good
  • Check the residuals
  • Seasonal autocorrelation is now gone

You do:

  • Fit an ARIMA model to the rain data
  • Plot your data with the model fit on top
  • Plot the residuals
  • What do you think about this result?