R Tutorial

Video Tutorials

Watch Time series modeling: starting with white noise

Import portal_timeseries.csv in R using read.csv. Convert the NDVI data column into a time series object using ts(). Name that time series object: NDVI.ts Convert the rain column into a time series object using ts(). Name that time series object: rain.ts
Watch Fitting a white noise model to data

Use meanf() to fit the whitenoise model to rain.ts Plot rain.ts Add the fitted model for rain to that plot using lines()
Watch Explaining the ARIMA model

Watch Fitting an Arima model in R

Generate the acf and pacf plots for rain.ts Examine those plots and decide what a good initial ARIMA model structure would be (AR vs MA?, How many orders?) Fit that model using the Arima() function. Examine the residuals of the model using checkresiduals()
Watch Modeling seasonal signals in ARIMA models

Watch Fitting a seasonal ARIMA in R

Examine the acf graph for your rain Arima model that you produced in step 7 above. Use Arima() to fit a seasonal model to rain.ts based on the information in your acf plot
Watch Using auto.arima() in R

Use auto.arima() to fit the rain.ts data using the default settings (i.e. just give it the data, do not change any max values) Examine the model fit using checkresiduals() Modify the max orders, if needed, and rerun the model.
Watch Fitting external predictors using auto.arima()

Text Tutorial

So far we’ve learned about time series objects, seasonal and long-term signals, and the influence of the past on current observations
Take all that information and turn it into models
Let’s first load the packages we’ll need for today

library(tsibble) # convert time-series data in a tsibble
library(fable) # main package for modeling and forecasting with time-series data
library(feasts) # time-series data visualization
library(dplyr) # data manipulation

Then load our data

data = read.csv("portal_timeseries.csv")
data_ts <- raw_data |>
  mutate(month = yearmonth(date)) |>
  as_tsibble(index = month)
data_ts

We’re going to be working with the NDVI data
Reminder ourselves that that looks like

gg_tsdisplay(data_ts, NDVI)

White noise model

We’ll start with the simplest time-series model possible - white noise
The data is normally distributed with a fixed mean and variance
It takes the form
y_t = c + e_t, where e_t ~ N(0, sigma)
So each time step in our model is a random draw from a normal distribution with a mean of c
We fit time-series models using the fable package
This model structure is provided by the MEAN() function

MEAN()

This output tells us that it is a model definition
To fit that general model structure to our data we use the model() function

avg_model = model(data_ts, MEAN(NDVI))

We can then look at the resulting model information using the report() function

report(avg_model)

This shows us that the model has a white noise structure (indicated by MEAN), a mean value of 0.1791, and a variance of 0.0031
To visualize the model with the data we have to first make the fitted values available using augment()

avg_model_aug <- augment(avg_model)
avg_model_aug

We can see that this produces a tsibble that includes month, NDVI, the fitted values from the model, & the model residuals
Use autoplot() to look at the data and model together
The predicted values from the model are stored in a special columns .fitted

autoplot(avg_model_aug, NDVI) + autolayer(avg_model_aug, .fitted, color = "red")

This simple model doesn’t work very well
There is clearly autocorrelation and seasonality in the time-series
We can look at this directly by plotting the residuals and looking at their autocorrelation
Our model assumes that the residuals are normally distributed and independent

gg_tsresiduals(avg_model)

We see the same autocorrelation structure as the original time-series, because we didn’t do anything to model it
We’ll address that next, but first

You do:
Fit a white noise model to the rain data
Plot your data with the model fit on top
Plot the residuals

AR models

Let’s build a model that takes the autocorrelation into account
Remember that we have lag 1 and lag 2 autocorrelation plus a season signal

gg_tsdisplay(data_ts, NDVI)

Let’s start with just the lag 1 and lag 2 autocorrelation
Use an “autoregressive” or AR model
Current value depends on past values
The simplest version of this type of model is an AR1 model

leave room to add y_t-2

y_t = c + b_1 * y_t-1 + e_t, where e_t ~ N(0, sigma)

c is a constant, like the intercept in regression
b_1 is a coefficient determining how y_t is related to y at a 1 time-step lag, i.e., the previous time step
e_t is normally distributed error
Does this model remind you of a biological model?
This model is basically a Gompertz population model if y is log(N)
The idea is that the current value influences the future values
Makes a lot of sense for things like population dynamics
Since we have also have lag 2 autocorrelation we add a term for two time steps back

y_t = c + b1 * y_t-1 + b2 * y_t-2 + e_t, where e_t ~ N(0, sigma)

Instructors note: Actually y_t = (1 - b1 - b2) * c + b1 * y_t-1 + b2 * y_t-2 + e_t due to non-zero mean

This type of model structure is available in fable’s AR() model
If we want to specify how many autoregressive terms to include we specify the model as an R formula

ar_model = model(data_ts, AR(NDVI ~ order(2)))

The order() function lets us specify how many lags to include
So an AR1 model would have order(1)
We’ve written an AR2 model
Let’s look at the model using report()

report(ar_model)

There is a large, positive, ar1 value (b_1)
So if NDVI was high at the previous time step it’s expected to be high at the current time step
There is a smaller, negative, ar2 value (b_2)
So if NDVI was high two time steps back, it’s expected to be lower at the current time step

ar_model_aug = augment(ar_model)
ar_model_aug

Note there aren’t predictions for the first two time-steps
Not possible because there are no y values before March 1992 for the model to use for prediction

autoplot(ar_model_aug, NDVI) + autolayer(ar_model_aug, .fitted, color = "orange")

This looks a lot better
Let’s take a look at the residuals

gg_tsresiduals(ar_model)

The residuals look better
We successfully removed the short time-scale autocorrelation
But the season signal is still present
We’ll work on that next time

You do:
Fit an AR1 model to the rain data
Plot your data with the model fit on top
Plot the residuals

How do the residuals look?
Why do you think there might be a stronger two year autoregressive component that the one year component?

Last updated on Sep 21, 2023

Edit this page