Time Series Analysis in R
Download the data from here.
Load your data
Read the downloaded file into a data frame named data; the code below assumes that name.
A time series (ts object) is a specific data structure in R.
We have to convert our data into this structure before we can apply time series functions such as decompose() or auto.arima().
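To see what a ts object holds, here is a minimal base-R sketch with twelve made-up monthly values. The data are stored together with a start time and a frequency (the number of observations per unit of time), and R derives the time index from those two attributes.

```r
# Twelve made-up monthly values starting in January 2020
values <- c(5, 7, 9, 8, 6, 4, 3, 4, 6, 8, 10, 11)
x <- ts(values, start = c(2020, 1), frequency = 12)

print(x)        # printed as a month-by-year table
tsp(x)          # start, end and frequency attributes
time(x)         # the time index R attaches to each value
frequency(x)    # 12 observations per unit of time (one year)
```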
Chile Data
library(dplyr)
data_chile <- data %>% filter(Country == "Chile")  # keep only the rows for Chile
chile <- data_chile[, c(2, 5)]                     # column 2 = date, column 5 = value to analyse
head(chile)
Convert into a time series data structure
- Option 1: use 1, 2, 3, 4, … as the time index (no NA values).
library(zoo)
z <- read.zoo(chile, format = "%Y-%m-%d")  # zoo series indexed by date (first column)
time(z) <- seq_along(time(z))              # replace the dates with 1, 2, 3, 4, ...
ts_chile <- as.ts(z)                       # convert to a ts object
head(ts_chile)
- Option 2: use the date as the time index and remove the resulting NA values.
library(zoo)
library(tseries)                           # provides na.remove() for ts objects
z <- read.zoo(chile, format = "%Y-%m-%d")  # zoo series indexed by date
ts_chile <- as.ts(z)                       # missing dates become NA
ts_chile <- na.remove(ts_chile)            # remove the NA values
head(ts_chile)
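Converting this way produces awkward time values: the time index is each date's internal number, i.e. the count of days since 1970-01-01 (for example 18322 for 2020-03-01), rather than a readable date. I prefer the 1, 2, 3, 4, … time index.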
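The base-R sketch below illustrates where the odd time values and the NAs come from, assuming the date-indexed conversion places the observations on a regular daily grid keyed by each date's day count. The dates and values are made up, and the grid is built by hand rather than by zoo.

```r
# Made-up daily observations with one missing day (2020-03-04)
dates  <- as.Date(c("2020-03-01", "2020-03-02", "2020-03-03", "2020-03-05"))
values <- c(1, 4, 9, 20)

# R stores a Date as the number of days since 1970-01-01
as.numeric(dates)   # 18322 18323 18324 18326

# Placing the values on a regular daily grid leaves an NA for the missing day
grid   <- seq(min(dates), max(dates), by = "day")
idx    <- match(as.numeric(grid), as.numeric(dates))   # NA where no observation exists
x_date <- ts(values[idx], start = as.numeric(min(dates)), frequency = 1)
time(x_date)        # day counts starting at 18322, not calendar dates
x_date              # contains one NA

# A plain 1, 2, 3, ... index avoids both issues
x_seq <- ts(values)
time(x_seq)
```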
Plot the time series
plot(ts_chile)
We'll discuss two parts of time series analysis:
- Decomposition (Trend + Seasonal + Error)
- Forecasting
Decomposition
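decompose() needs a seasonal period (frequency > 1) and at least two full periods. The series built above has frequency 1, so set a period first; frequency = 7 assumes daily data with a weekly cycle, so adjust it to your data.
ts_chile_7 <- ts(as.numeric(ts_chile), frequency = 7)  # weekly period for daily data (assumption)
plot(decompose(ts_chile_7))                            # trend, seasonal and random components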
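To see what decompose() does, here is a small sketch on a synthetic daily series built from a known trend, a made-up weekly pattern and random noise (frequency = 7 is an assumption). In the additive model the observed series equals trend + seasonal + random, which the last lines check; the moving-average trend is NA for the first and last three days.

```r
set.seed(42)
n        <- 8 * 7                            # eight weeks of daily data
trend    <- seq(10, 30, length.out = n)      # steady upward trend
weekly   <- c(0, 1, 2, 3, 2, 1, -9)          # made-up day-of-week effect
seasonal <- rep(weekly, length.out = n)
noise    <- rnorm(n, sd = 0.5)

x   <- ts(trend + seasonal + noise, frequency = 7)
dec <- decompose(x, type = "additive")

dec$figure          # estimated weekly pattern, one value per day of the week
plot(dec)           # observed, trend, seasonal and random panels

# Additive model: observed = trend + seasonal + random wherever the trend is defined
recon <- as.numeric(dec$trend + dec$seasonal + dec$random)
ok    <- !is.na(recon)
all.equal(recon[ok], as.numeric(x)[ok])
```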
Forecast
Fit an ARIMA model
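library(forecast)
model <- auto.arima(ts_chile)  # chooses the ARIMA orders automatically (AICc by default)
model                          # fitted orders, coefficients and information criteria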
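auto.arima() picks the model orders by minimising an information criterion over a stepwise search. The base-R sketch below shows only the core idea on simulated AR(1) data: fit a small grid of ARIMA(p, 0, q) candidates with stats::arima() and keep the one with the lowest AIC. It is a simplification for illustration, not a reimplementation of auto.arima().

```r
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 500)   # simulated AR(1) series

candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$aic <- NA_real_

for (i in seq_len(nrow(candidates))) {
  ord <- c(candidates$p[i], 0, candidates$q[i])
  fit <- tryCatch(arima(y, order = ord), error = function(e) NULL)
  if (!is.null(fit)) candidates$aic[i] <- fit$aic  # fits that fail keep NA
}

best <- candidates[which.min(candidates$aic), ]
best                               # candidate order with the lowest AIC

ar1 <- arima(y, order = c(1, 0, 0))
coef(ar1)                          # ar1 estimate should be close to the true 0.7
```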
ACF Plot
acf(model$residuals, main = "Correlogram")
PACF Plot
pacf(model$residuals, main = "Partial Correlogram")
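The dashed lines in both plots are approximate 95% bounds for the autocorrelations of white noise, qnorm(0.975)/sqrt(n) (about 1.96/sqrt(n)); residuals from a good model should stay mostly inside them. The sketch below computes the bound by hand on simulated white noise standing in for model residuals.

```r
set.seed(7)
res <- rnorm(300)                      # stand-in for white-noise residuals
n   <- length(res)

bound <- qnorm(0.975) / sqrt(n)        # 95% bound drawn by the acf/pacf plots

r   <- acf(res, plot = FALSE)$acf[-1]  # drop lag 0, which is always 1
phi <- pacf(res, plot = FALSE)$acf     # pacf starts at lag 1

c(bound       = bound,
  acf_inside  = mean(abs(r) < bound),    # share of lags inside the bounds
  pacf_inside = mean(abs(phi) < bound))
```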
Ljung Box Test
The Ljung-Box test checks whether the residuals are independent and identically distributed (white noise) or still contain structure, that is, whether any of their autocorrelations up to a chosen lag differ from zero. Its null hypothesis is that the residuals are independently distributed.
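# lag = 10 is a common choice; fitdf subtracts the number of estimated ARMA coefficients (p + q + P + Q)
Box.test(model$residuals, lag = 10, type = "Ljung-Box", fitdf = sum(model$arma[1:4]))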
We do not reject the null hypothesis (the p-value is above the usual 0.05 level), so the residuals behave like white noise and the model shows a good fit.
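The Ljung-Box statistic is Q = n(n + 2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k autocorrelation and h the number of lags; under the null it is approximately chi-squared with h - fitdf degrees of freedom. The sketch below computes Q by hand (ljung_box_q is a hypothetical helper, not a library function), checks it against Box.test(), and contrasts simulated white noise with a clearly autocorrelated AR(1) series.

```r
# Hypothetical helper: Ljung-Box Q statistic computed from the sample ACF
ljung_box_q <- function(x, h) {
  n <- length(x)
  r <- acf(x, lag.max = h, plot = FALSE)$acf[2:(h + 1)]   # lags 1..h
  n * (n + 2) * sum(r^2 / (n - seq_len(h)))
}

set.seed(2024)
white <- rnorm(300)                                   # white noise
ar1   <- arima.sim(model = list(ar = 0.7), n = 300)   # autocorrelated series

h <- 10
q_manual <- ljung_box_q(white, h)
bt_white <- Box.test(white, lag = h, type = "Ljung-Box")
bt_ar1   <- Box.test(ar1,   lag = h, type = "Ljung-Box")

c(manual = q_manual, box_test = unname(bt_white$statistic))
bt_white$p.value   # usually large: no evidence against white noise
bt_ar1$p.value     # tiny: autocorrelation remains, so the null is rejected
```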
Normality of Residuals
hist(model$residuals, freq = FALSE)
lines(density(model$residuals))
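A histogram gives a visual impression; a normal Q-Q plot and the Shapiro-Wilk test (shapiro.test(), which accepts between 3 and 5000 values) are two common complements. The sketch runs both on simulated normal data and on clearly skewed data standing in for residuals; replace them with model$residuals for the real check.

```r
set.seed(99)
normal_res <- rnorm(200)        # stand-in for well-behaved residuals
skewed_res <- rexp(200) - 1     # skewed, clearly non-normal stand-in

qqnorm(normal_res)              # points should lie close to the reference line
qqline(normal_res)

sw_normal <- shapiro.test(normal_res)
sw_skewed <- shapiro.test(skewed_res)

sw_normal$p.value   # usually above 0.05: no evidence against normality
sw_skewed$p.value   # very small: normality is rejected
```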
Generate the forecast
fc <- forecast(model, h = 4)  # h = number of future periods to forecast
library(ggplot2)
autoplot(fc)                  # history plus forecast with prediction intervals
accuracy(fc)                  # training-set (in-sample) error measures such as RMSE and MAPE
fc                            # point forecasts with 80% and 95% intervals
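Prediction intervals for an ARIMA forecast are built from the point forecast plus or minus a normal quantile times the forecast standard error. The base-R sketch below reproduces a 95% interval with stats::predict() on an arima() fit to simulated data and computes the in-sample RMSE from the residuals, one of the training-set measures accuracy() reports. It illustrates the idea only; the forecast package's implementation may differ in details.

```r
set.seed(11)
y   <- arima.sim(model = list(ar = 0.6), n = 200) + 50   # simulated series around 50
fit <- arima(y, order = c(1, 0, 0))

h  <- 4                                  # forecast horizon
pr <- predict(fit, n.ahead = h)          # point forecasts ($pred) and standard errors ($se)

z <- qnorm(0.975)                        # two-sided 95% normal quantile
intervals <- cbind(point = pr$pred,
                   lower = pr$pred - z * pr$se,
                   upper = pr$pred + z * pr$se)
intervals                                # intervals widen as the horizon grows

sqrt(mean(residuals(fit)^2))             # in-sample RMSE
```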
Exercise: Apply it to other countries as well.