# R Tutorial

## Video Tutorials

- Watch Time series modeling: starting with white noise

Import portal_timeseries.csv in R using read.csv. Convert the NDVI data column into a time series object using ts(). Name that time series object: NDVI.ts Convert the rain column into a time series object using ts(). Name that time series object: rain.ts

Watch Fitting a white noise model to data

Use meanf() to fit the whitenoise model to rain.ts Plot rain.ts Add the fitted model for rain to that plot using lines()

Watch Explaining the ARIMA model

- Watch Fitting an Arima model in R

Generate the acf and pacf plots for rain.ts Examine those plots and decide what a good initial ARIMA model structure would be (AR vs MA?, How many orders?) Fit that model using the Arima() function. Examine the residuals of the model using checkresiduals()

Watch Modeling seasonal signals in ARIMA models

- Watch Fitting a seasonal ARIMA in R

Examine the acf graph for your rain Arima model that you produced in step 7 above. Use Arima() to fit a seasonal model to rain.ts based on the information in your acf plot

Watch Using auto.arima() in R

Use auto.arima() to fit the rain.ts data using the default settings (i.e. just give it the data, do not change any max values) Examine the model fit using checkresiduals() Modify the max orders, if needed, and rerun the model.

Watch Fitting external predictors using auto.arima()

## Text Tutorial

- So far we’ve learned about time series objects, seasonal and long-term signals, and the influence of the past on current observations
- Take all that information and turn it into models
- Let’s first load the packages we’ll need for today

```
library(tsibble) # convert time-series data in a tsibble
library(fable) # main package for modeling and forecasting with time-series data
library(feasts) # time-series data visualization
library(dplyr) # data manipulation
```

- Then load our data

```
data = read.csv("portal_timeseries.csv")
data_ts <- raw_data |>
mutate(month = yearmonth(date)) |>
as_tsibble(index = month)
data_ts
```

- We’re going to be working with the NDVI data
- Reminder ourselves that that looks like

```
gg_tsdisplay(data_ts, NDVI)
```

### White noise model

We’ll start with the simplest time-series model possible - white noise

The data is normally distributed with a fixed mean and variance

It takes the form

`y_t = c + e_t, where e_t ~ N(0, sigma)`

So each time step in our model is a random draw from a normal distribution with a mean of

`c`

We fit time-series models using the

`fable`

packageThis model structure is provided by the

`MEAN()`

function

```
MEAN()
```

- This output tells us that it is a model definition
- To fit that general model structure to our data we use the
`model()`

function

```
avg_model = model(data_ts, MEAN(NDVI))
```

- We can then look at the resulting model information using the
`report()`

function

```
report(avg_model)
```

- This shows us that the model has a white noise structure (indicated by
`MEAN`

), a mean value of 0.1791, and a variance of 0.0031 - To visualize the model with the data we have to first make the fitted values available using
`augment()`

```
avg_model_aug <- augment(avg_model)
avg_model_aug
```

- We can see that this produces a
`tsibble`

that includes month, NDVI, the fitted values from the model, & the model residuals - Use
`autoplot()`

to look at the data and model together - The predicted values from the model are stored in a special columns
`.fitted`

```
autoplot(avg_model_aug, NDVI) + autolayer(avg_model_aug, .fitted, color = "red")
```

- This simple model doesn’t work very well
- There is clearly autocorrelation and seasonality in the time-series
- We can look at this directly by plotting the residuals and looking at their autocorrelation
- Our model assumes that the residuals are normally distributed and independent

```
gg_tsresiduals(avg_model)
```

- We see the same autocorrelation structure as the original time-series, because we didn’t do anything to model it
- We’ll address that next, but first

You do:

- Fit a white noise model to the
`rain`

data- Plot your data with the model fit on top
- Plot the residuals

## AR models

- Let’s build a model that takes the autocorrelation into account
- Remember that we have lag 1 and lag 2 autocorrelation plus a season signal

```
gg_tsdisplay(data_ts, NDVI)
```

- Let’s start with just the lag 1 and lag 2 autocorrelation
- Use an “autoregressive” or AR model
- Current value depends on past values
- The simplest version of this type of model is an AR1 model

leave room to add y_t-2

`y_t = c + b_1 * y_t-1 + e_t, where e_t ~ N(0, sigma)`

c is a constant, like the intercept in regression

b_1 is a coefficient determining how y_t is related to y at a 1 time-step lag, i.e., the previous time step

e_t is normally distributed error

Does this model remind you of a biological model?

This model is basically a Gompertz population model if y is log(N)

The idea is that the current value influences the future values

Makes a lot of sense for things like population dynamics

Since we have also have lag 2 autocorrelation we add a term for two time steps back

`y_t = c + b1 * y_t-1 + b2 * y_t-2 + e_t, where e_t ~ N(0, sigma)`

Instructors note: Actually

`y_t = (1 - b1 - b2) * c + b1 * y_t-1 + b2 * y_t-2 + e_t`

due to non-zero mean

- This type of model structure is available in
`fable`

’s`AR()`

model - If we want to specify how many autoregressive terms to include we specify the model as an R formula

```
ar_model = model(data_ts, AR(NDVI ~ order(2)))
```

- The
`order()`

function lets us specify how many lags to include - So an AR1 model would have
`order(1)`

- We’ve written an AR2 model
- Let’s look at the model using
`report()`

```
report(ar_model)
```

- There is a large, positive, ar1 value (b_1)
- So if NDVI was high at the previous time step it’s expected to be high at the current time step
- There is a smaller, negative, ar2 value (b_2)
- So if NDVI was high two time steps back, it’s expected to be lower at the current time step

```
ar_model_aug = augment(ar_model)
ar_model_aug
```

- Note there aren’t predictions for the first two time-steps
- Not possible because there are no y values before March 1992 for the model to use for prediction

```
autoplot(ar_model_aug, NDVI) + autolayer(ar_model_aug, .fitted, color = "orange")
```

- This looks a lot better
- Let’s take a look at the residuals

```
gg_tsresiduals(ar_model)
```

- The residuals look better
- We successfully removed the short time-scale autocorrelation
- But the season signal is still present
- We’ll work on that next time

You do:

- Fit an AR1 model to the
`rain`

data- Plot your data with the model fit on top
- Plot the residuals

- How do the residuals look?
- Why do you think there might be a stronger two year autoregressive component that the one year component?