Some of the methods for forecasting in Business and Economics are:

(1) Exponential Smoothing Technique
(2) Single Equation Regression Technique
(3) Simultaneous-equation Regression Method
(4) Autoregressive Integrated Moving Average (ARIMA) Models
(5) Vector Autoregression (VAR) Method

The lecture will demonstrate ARIMA, which is a purely univariate method of forecasting. The main philosophy here is: “Let the data speak for itself.”

The lecture will cover both the underlying theory and its execution in R. In this post, we will mainly discuss the theoretical foundations, and in the next few posts we will discuss the practical aspects of ARIMA.

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. So, it is necessary to know the underlying properties of the AutoRegressive (AR) process, the Moving Average (MA) process and the order of integration.

Autoregressive (AR) Process

We start with a time series $Y_t$ which is non-stationary in nature (for example, the GDP of India, stock market indices, etc.).

Let $Y_t$ be modelled as:

$$Y_t = \delta + \alpha_1 Y_{t-1} + u_t \qquad (1)$$

where $\delta$ = any constant and $u_t$ = white noise.

The value of $Y$ at time $t$ depends on its value in the previous time period and a random term. In other words, this model says that the forecast value of $Y$ at time $t$ is simply some proportion ($=\alpha_1$) of its value at time $(t-1)$ plus a random shock or disturbance at time $t$.

For the stationarity of the series, it is required that $|\alpha_1| < 1$.

Again if we write the following model:

$$Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + u_t \qquad (2) \qquad \text{AR(2) Process}$$

That is, the value of $Y$ at time $t$ depends on its values in the previous two time periods.

Following this fashion, we can write:

$$Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + u_t \qquad (3) \qquad \text{AR(p) Process}$$
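Ahead of the practical posts, here is a minimal R sketch (not part of the original lecture) of how such processes can be simulated with the base function arima.sim(); the coefficients are arbitrary illustrative choices:

```r
# Illustrative simulation of AR(1) and AR(2) processes (coefficients are
# arbitrary; arima.sim() requires them to satisfy the stationarity condition).
set.seed(123)

y_ar1 <- arima.sim(model = list(ar = 0.7), n = 500)         # AR(1) with alpha1 = 0.7
y_ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500) # AR(2)

plot.ts(y_ar1, main = "Simulated AR(1) series (alpha1 = 0.7)")
```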

Properties of AR(1) Process:

The mean is given as:

$$\begin{aligned}
E(Y_t) &= E(\delta + \alpha_1 Y_{t-1} + u_t) \\
&= E(\delta) + E(\alpha_1 Y_{t-1}) + E(u_t) \\
&= \delta + \alpha_1 E(Y_{t-1}) + 0
\end{aligned}$$

Assuming that the series is stationary, $E(Y_t) = E(Y_{t-1}) = \mu$ (common mean), so that:

$$\mu = \delta + \alpha_1 \mu \quad \Rightarrow \quad \mu = \frac{\delta}{1 - \alpha_1}$$

The variance is calculated as follows:

By independence of the error term and the lagged values of $Y_t$:

$$\begin{aligned}
\operatorname{Var}(Y_t) &= \operatorname{Var}(\delta) + \operatorname{Var}(\alpha_1 Y_{t-1}) + \operatorname{Var}(u_t) \\
&= \alpha_1^2 \operatorname{Var}(Y_{t-1}) + \sigma_u^2
\end{aligned}$$

By the stationarity assumption, $\operatorname{Var}(Y_t) = \operatorname{Var}(Y_{t-1})$, and substituting this you get:

$$\operatorname{Var}(Y_t)\,(1 - \alpha_1^2) = \sigma_u^2 \quad \Rightarrow \quad \operatorname{Var}(Y_t) = \frac{\sigma_u^2}{1 - \alpha_1^2}$$

Since $\operatorname{Var}(Y_t) > 0$, it follows that $(1 - \alpha_1^2) > 0$, i.e. $|\alpha_1| < 1$.
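A rough way to convince yourself of these formulas is to simulate a long AR(1) series and compare the sample mean and variance with $\delta/(1-\alpha_1)$ and $\sigma_u^2/(1-\alpha_1^2)$; the values of delta, alpha1 and sigma_u below are arbitrary:

```r
# Checking the AR(1) mean and variance formulas by simulation
# (delta, alpha1 and sigma_u are arbitrary illustrative values).
set.seed(42)
alpha1  <- 0.6
delta   <- 2
sigma_u <- 1

# arima.sim() generates a zero-mean AR(1); adding delta / (1 - alpha1)
# shifts it to the process Y_t = delta + alpha1 * Y_{t-1} + u_t.
y <- delta / (1 - alpha1) +
  arima.sim(model = list(ar = alpha1), n = 1e5, sd = sigma_u)

mean(y); delta / (1 - alpha1)          # both should be close to 5
var(y);  sigma_u^2 / (1 - alpha1^2)    # both should be close to 1.5625
```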

ACF, PACF and Correlogram:

The ACF at lag $k$ is defined as:

$$\rho_k = \frac{\text{covariance at lag } k}{\text{variance}} = \frac{\gamma_k}{\gamma_0}$$

Since both the covariance and the variance are measured in the same units, $\rho_k$ is a unit-free, or pure, number. It lies between −1 and +1, as any correlation coefficient does.
If we plot $\rho_k$ against $k$, the graph we obtain is known as the correlogram. It helps us identify whether the time series is stationary.

The PACF gives the partial autocorrelations, i.e. the correlation between $Y_t$ and $Y_{t-k}$ after removing the effect of the intermediate lags; these are likewise plotted against the lag number $k$.
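In R, the sample correlogram and partial correlogram are produced by acf() and pacf(); a minimal sketch on a simulated AR(1) series (illustrative only):

```r
# Sample correlogram (ACF) and partial correlogram (PACF) of a simulated
# AR(1) series; for an AR(1) the ACF tails off and the PACF cuts off at lag 1.
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 500)

acf(y,  main = "Sample ACF (correlogram)")
pacf(y, main = "Sample PACF")
```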

Moving Average (MA) Process

Let us write $Y_t$ as follows:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} \qquad (4) \qquad \text{MA(1) Process}$$

where $\mu$ = constant and $u_t$ = white noise term.

That is, $Y_t$ is equal to a constant ($\mu$) plus a moving average ($\beta_0 u_t + \beta_1 u_{t-1}$) of the current and past error terms.

So, the MA(2) process can be written as:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} \qquad (5) \qquad \text{MA(2) Process}$$

The MA(q) process can be written as:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_q u_{t-q} \qquad (6) \qquad \text{MA(q) Process}$$

Theoretical Properties of a Time Series with an MA(1) Model

$$\begin{aligned}
\text{Mean} &= E(Y_t) = \mu \\
\text{Variance} &= \operatorname{var}(Y_t) = \sigma_u^2\,(1 + \beta_1^2) \\
\text{ACF:} \quad \rho_1 &= \frac{\beta_1}{1 + \beta_1^2}, \qquad \rho_k = 0 \ \text{for all} \ k \geq 2
\end{aligned}$$

Note that the only nonzero value in the theoretical ACF is for lag 1. All other autocorrelations are 0. Thus a sample ACF with a significant autocorrelation only at lag 1 is an indicator of a possible $MA(1)$ model.

Theoretical Properties of a Time Series with an MA(2) Model

$$\begin{aligned}
\text{Mean} &= E(Y_t) = \mu \\
\text{Variance} &= \operatorname{var}(Y_t) = \sigma_u^2\,(1 + \beta_1^2 + \beta_2^2) \\
\text{ACF:} \quad \rho_1 &= \frac{\beta_1 + \beta_1\beta_2}{1 + \beta_1^2 + \beta_2^2}, \qquad
\rho_2 = \frac{\beta_2}{1 + \beta_1^2 + \beta_2^2}, \qquad
\rho_k = 0 \ \text{for all} \ k \geq 3
\end{aligned}$$

Note that the only nonzero values in the theoretical ACF are for lags 1 and 2. Autocorrelations for higher lags are 0. So, a sample ACF with significant autocorrelations at lags 1 and 2, but non-significant autocorrelations for higher lags, indicates a possible MA(2) model.
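A hedged illustration of this cut-off property: simulate an MA(2) series and inspect its sample ACF, which should be significant only at lags 1 and 2. The coefficients 0.6 and 0.4 are arbitrary, and arima.sim() fixes $\beta_0 = 1$:

```r
# Simulated MA(2) series: the sample ACF should be significant only at
# lags 1 and 2 (beta1 = 0.6 and beta2 = 0.4 are arbitrary; beta0 is 1).
set.seed(7)
y_ma2 <- arima.sim(model = list(ma = c(0.6, 0.4)), n = 1000)
acf(y_ma2, main = "Sample ACF of a simulated MA(2) series")

# Theoretical values implied by the formulas above:
b1 <- 0.6; b2 <- 0.4
(b1 + b1 * b2) / (1 + b1^2 + b2^2)   # rho_1
b2 / (1 + b1^2 + b2^2)               # rho_2
```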

Autoregressive and Moving Average (ARMA) Process

It is likely that $Y$ has characteristics of both AR and MA processes; in that case it is called ARMA. Thus, $Y_t$ follows an ARMA(1,1) process if it can be written as:

$$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1} \qquad (7) \qquad \text{ARMA(1,1)}$$

So, the ARMA(p,q) can be written as:

$$Y_t = \theta + \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p} + \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_q u_{t-q} \qquad (8) \qquad \text{ARMA(p,q)}$$
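As a small sketch (with purely illustrative coefficients), an ARMA(1,1) series can be simulated and its parameters recovered with stats::arima():

```r
# Simulate an ARMA(1,1) series and recover its parameters with stats::arima();
# the true coefficients (ar = 0.5, ma = 0.4) are purely illustrative.
set.seed(11)
y_arma <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 1000)

fit <- arima(y_arma, order = c(1, 0, 1))   # p = 1, d = 0, q = 1
fit                                        # ar1 and ma1 estimates should be near 0.5 and 0.4
```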

Autoregressive Integrated Moving Average (ARIMA) Process

The earlier models of time series are based on the assumption that the time series variable is stationary (at least in the weak sense).

But in practice, most time series variables are non-stationary in nature; they are integrated series.

This implies that you need to take the first or second difference of a non-stationary time series to convert it into a stationary one.

As such, they may be I(1) or I(2), and so on.

Therefore, if you have to difference a time series $d$ times to make it stationary and then apply the ARMA(p, q) model to it, the original time series is said to be ARIMA(p, d, q), that is, an autoregressive integrated moving average series, where

– p denotes the number of autoregressive terms,
– d denotes the number of times the series has to be differenced before it becomes stationary, and
– q denotes the number of moving average terms.
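A minimal sketch of checking the order of integration in R, assuming the forecast package is installed; the built-in AirPassengers series (a clearly trending series) merely stands in for your own data:

```r
# Estimating the order of integration d (ndiffs() is from the forecast package).
library(forecast)

y <- log(AirPassengers)    # log taken only to stabilise the variance
ndiffs(y)                  # estimated number of regular differences needed

dy <- diff(y)              # first difference
plot.ts(dy, main = "First difference of log(AirPassengers)")
```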

Box-Jenkins (BJ) Methodology

First of all, you have to determine whether the particular series is stationary and, if not, its order of integration, i.e. how many times it must be differenced to become stationary. That is, you have to identify the ARIMA(p, d, q) specification of the series.

The BJ methodology answers this question.

The steps in the BJ methodology are as follows:

Step 1: Examine the Data

As a starting point, it is always advisable to examine the data visually before going into detailed mathematical modelling. The examination of the data involves plotting the series and inspecting it for features such as trend, seasonality, outliers and structural breaks, as in the sketch below.
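A minimal sketch of this step, using the built-in AirPassengers series purely as a stand-in for the data of interest:

```r
# Step 1: look at the raw data before any modelling.
data("AirPassengers")
str(AirPassengers)                        # monthly series, 1949-1960
plot(AirPassengers,
     main = "Monthly international airline passengers",
     ylab = "Passengers (thousands)")     # inspect for trend, seasonality, outliers
```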

Step 2: Decompose your data

ARIMA can be fitted to both seasonal and non-seasonal data, but seasonal ARIMA requires a more complicated specification of the model structure, although the process of determining (p, d, q) is similar to that of choosing the non-seasonal order parameters.
Therefore, decomposing the data sometimes gives an additional benefit. In this step, we are essentially asking whether the series contains a trend and a seasonal component, and how pronounced they are; see the sketch below.
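A possible sketch of this step, again on the illustrative AirPassengers series, using classical decomposition and STL:

```r
# Step 2: decompose the series into trend, seasonal and remainder components.
plot(decompose(AirPassengers))                        # classical decomposition
plot(stl(log(AirPassengers), s.window = "periodic"))  # STL on the log scale
```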

Step 3: Identification

The identification step involves finding the appropriate values of p, d and q: d from the number of differences needed to make the series stationary, and p and q chiefly from the correlogram (ACF) and partial correlogram (PACF) of the stationary series.
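One possible way to carry out this step in R, assuming the forecast package is available; auto.arima() is used here only as a cross-check, not as a replacement for judgement:

```r
# Step 3: identify candidate (p, d, q) from the differenced series.
library(forecast)

y  <- log(AirPassengers)
dy <- diff(y)

acf(dy)           # suggests the MA order q
pacf(dy)          # suggests the AR order p
auto.arima(y)     # automatic order selection, used only as a benchmark
```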

Step 4: Estimation of the ARIMA Model
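A minimal sketch of the estimation step with stats::arima(). The (0, 1, 1)(0, 1, 1)[12] specification, the classical "airline model", is only an illustrative choice for the AirPassengers example, not a recommendation from the lecture:

```r
# Step 4: estimate a candidate model with stats::arima().
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit              # coefficient estimates, standard errors and AIC
```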

Step 5: Diagnostic checking
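A possible sketch of the diagnostic step: if the fitted model is adequate, its residuals should behave like white noise. The model from the previous sketch is re-estimated here so that the snippet stands alone:

```r
# Step 5: residual diagnostics; re-estimate the model so the snippet stands alone.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

tsdiag(fit)                               # standardised residuals, residual ACF, Ljung-Box p-values
Box.test(residuals(fit), lag = 24,
         type = "Ljung-Box", fitdf = 2)   # fitdf = number of estimated ARMA coefficients
```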

Step 6: Forecasting the Future Values
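A minimal sketch of the forecasting step, continuing the same illustrative model and transforming the forecasts back from the log scale:

```r
# Step 6: forecast 24 months ahead and transform back from the log scale.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc <- predict(fit, n.ahead = 24)
exp(fc$pred)                        # point forecasts on the original scale
exp(fc$pred + 1.96 * fc$se)         # approximate upper 95% limits
exp(fc$pred - 1.96 * fc$se)         # approximate lower 95% limits
```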