Some of the methods for forecasting in Business and Economics are:

(1) Exponential Smoothing Technique
(2) Single Equation Regression Technique
(3) Simultaneous-equation Regression Method
(4) Autoregressive Integrated Moving Average (ARIMA) Models
(5) Vector Autoregression (VAR) Method

The lecture will demonstrate ARIMA, which is a purely univariate method of forecasting. The main philosophy here is: “Let the data speak for itself.”

The lecture will cover both the underlying theory and its execution in R. In this post, we will mainly discuss the theoretical foundations, and in the next few posts we will discuss the practical aspects of ARIMA.

ARIMA is an acronym that stands for AutoRegressive Integrated Moving Average. So, it is necessary to know the underlying properties of the AutoRegressive (AR) process, the Moving Average (MA) process and the order of integration.

Autoregressive (AR) Process

We start with a time series $Y_t$ which is non-stationary in nature (for example, the GDP of India, stock market indices, etc.).

Let $Y_t$ be modelled as:

$$Y_t = \delta + \alpha_1 Y_{t-1} + u_t \qquad (1)$$

where $\delta$ = any constant and $u_t$ = white noise.

The value of $Y$ at time $t$ depends on its value in the previous time period and a random term. In other words, this model says that the forecast value of $Y$ at time $t$ is simply some proportion ($=\alpha_1$) of its value at time $(t-1)$ plus a random shock or disturbance at time $t$.

For the stationarity of the series, it is required that $|\alpha_1| < 1$.

Again if we write the following model:

$$Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + u_t \qquad (2) \qquad \text{AR(2) Process}$$

That is, the value of $Y$ at time $t$ depends on its values in the previous two time periods.

Following this fashion, we can write:

$$Y_t = \delta + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + u_t \qquad (3) \qquad \text{AR(p) Process}$$
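Ahead of the practical posts, here is a minimal R sketch (not part of the original lecture) of how such processes can be simulated with the base function arima.sim(); the coefficients are arbitrary illustrative choices:

```r
# Illustrative simulation of AR(1) and AR(2) processes (coefficients are
# arbitrary; arima.sim() requires them to satisfy the stationarity condition).
set.seed(123)

y_ar1 <- arima.sim(model = list(ar = 0.7), n = 500)         # AR(1) with alpha1 = 0.7
y_ar2 <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500) # AR(2)

plot.ts(y_ar1, main = "Simulated AR(1) series (alpha1 = 0.7)")
```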

Properties of AR(1) Process:

The mean is given as:

$$\begin{aligned}
E(Y_t) &= E(\delta + \alpha_1 Y_{t-1} + u_t) \\
&= E(\delta) + E(\alpha_1 Y_{t-1}) + E(u_t) \\
&= \delta + \alpha_1 E(Y_{t-1}) + 0
\end{aligned}$$

Assuming that the series is stationary, $E(Y_t) = E(Y_{t-1}) = \mu$ (common mean), so that:

$$\mu = \delta + \alpha_1 \mu \quad \Rightarrow \quad \mu = \frac{\delta}{1 - \alpha_1}$$

The variance is calculated as follows:

By independence of the error term and the lagged values of $Y_t$:

$$\begin{aligned}
\operatorname{Var}(Y_t) &= \operatorname{Var}(\delta) + \operatorname{Var}(\alpha_1 Y_{t-1}) + \operatorname{Var}(u_t) \\
&= \alpha_1^2 \operatorname{Var}(Y_{t-1}) + \sigma_u^2
\end{aligned}$$

By the stationarity assumption, $\operatorname{Var}(Y_t) = \operatorname{Var}(Y_{t-1})$, and substituting this you get:

$$\operatorname{Var}(Y_t)\,(1 - \alpha_1^2) = \sigma_u^2 \quad \Rightarrow \quad \operatorname{Var}(Y_t) = \frac{\sigma_u^2}{1 - \alpha_1^2}$$

Since $\operatorname{Var}(Y_t) > 0$, it follows that $(1 - \alpha_1^2) > 0$, i.e. $|\alpha_1| < 1$.
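A rough way to convince yourself of these formulas is to simulate a long AR(1) series and compare the sample mean and variance with $\delta/(1-\alpha_1)$ and $\sigma_u^2/(1-\alpha_1^2)$; the values of delta, alpha1 and sigma_u below are arbitrary:

```r
# Checking the AR(1) mean and variance formulas by simulation
# (delta, alpha1 and sigma_u are arbitrary illustrative values).
set.seed(42)
alpha1  <- 0.6
delta   <- 2
sigma_u <- 1

# arima.sim() generates a zero-mean AR(1); adding delta / (1 - alpha1)
# shifts it to the process Y_t = delta + alpha1 * Y_{t-1} + u_t.
y <- delta / (1 - alpha1) +
  arima.sim(model = list(ar = alpha1), n = 1e5, sd = sigma_u)

mean(y); delta / (1 - alpha1)          # both should be close to 5
var(y);  sigma_u^2 / (1 - alpha1^2)    # both should be close to 1.5625
```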

ACF, PACF and Correlogram:

The ACF at lag $k$ is defined as:

$$\rho_k = \frac{\text{covariance at lag } k}{\text{variance}} = \frac{\gamma_k}{\gamma_0}$$

Since both the covariance and the variance are measured in the same units, $\rho_k$ is a unit-free, or pure, number. It lies between −1 and +1, as any correlation coefficient does.
If we plot $\rho_k$ against $k$, the graph we obtain is known as the correlogram. It helps us identify whether the time series is stationary.

The PACF gives the partial autocorrelations, i.e. the correlation between $Y_t$ and $Y_{t-k}$ after removing the effect of the intermediate lags; these are likewise plotted against the lag number $k$.
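In R, the sample correlogram and partial correlogram are produced by acf() and pacf(); a minimal sketch on a simulated AR(1) series (illustrative only):

```r
# Sample correlogram (ACF) and partial correlogram (PACF) of a simulated
# AR(1) series; for an AR(1) the ACF tails off and the PACF cuts off at lag 1.
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 500)

acf(y,  main = "Sample ACF (correlogram)")
pacf(y, main = "Sample PACF")
```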

Moving Average (MA) Process

Let us write $Y_t$ as follows:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} \qquad (4) \qquad \text{MA(1) Process}$$

where $\mu$ = constant and $u_t$ = white noise term.

That is, $Y_t$ is equal to a constant ($\mu$) plus a moving average ($\beta_0 u_t + \beta_1 u_{t-1}$) of the current and past error terms.

So, the MA(2) process can be written as:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \beta_2 u_{t-2} \qquad (5) \qquad \text{MA(2) Process}$$

The MA(q) process can be written as:

$$Y_t = \mu + \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_q u_{t-q} \qquad (6) \qquad \text{MA(q) Process}$$

Theoretical Properties of a Time Series with an MA(1) Model

$$\begin{aligned}
\text{Mean} &= E(Y_t) = \mu \\
\text{Variance} &= \operatorname{var}(Y_t) = \sigma_u^2\,(1 + \beta_1^2) \\
\text{ACF:} \quad \rho_1 &= \frac{\beta_1}{1 + \beta_1^2}, \qquad \rho_k = 0 \ \text{for all} \ k \geq 2
\end{aligned}$$

Note that the only nonzero value in the theoretical ACF is for lag 1. All other autocorrelations are 0. Thus a sample ACF with a significant autocorrelation only at lag 1 is an indicator of a possible $MA(1)$ model.

Theoretical Properties of a Time Series with an MA(2) Model

$$\begin{aligned}
\text{Mean} &= E(Y_t) = \mu \\
\text{Variance} &= \operatorname{var}(Y_t) = \sigma_u^2\,(1 + \beta_1^2 + \beta_2^2) \\
\text{ACF:} \quad \rho_1 &= \frac{\beta_1 + \beta_1\beta_2}{1 + \beta_1^2 + \beta_2^2}, \qquad
\rho_2 = \frac{\beta_2}{1 + \beta_1^2 + \beta_2^2}, \qquad
\rho_k = 0 \ \text{for all} \ k \geq 3
\end{aligned}$$

Note that the only nonzero values in the theoretical ACF are for lags 1 and 2. Autocorrelations for higher lags are 0. So, a sample ACF with significant autocorrelations at lags 1 and 2, but non-significant autocorrelations for higher lags, indicates a possible MA(2) model.
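A hedged illustration of this cut-off property: simulate an MA(2) series and inspect its sample ACF, which should be significant only at lags 1 and 2. The coefficients 0.6 and 0.4 are arbitrary, and arima.sim() fixes $\beta_0 = 1$:

```r
# Simulated MA(2) series: the sample ACF should be significant only at
# lags 1 and 2 (beta1 = 0.6 and beta2 = 0.4 are arbitrary; beta0 is 1).
set.seed(7)
y_ma2 <- arima.sim(model = list(ma = c(0.6, 0.4)), n = 1000)
acf(y_ma2, main = "Sample ACF of a simulated MA(2) series")

# Theoretical values implied by the formulas above:
b1 <- 0.6; b2 <- 0.4
(b1 + b1 * b2) / (1 + b1^2 + b2^2)   # rho_1
b2 / (1 + b1^2 + b2^2)               # rho_2
```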

Autoregressive and Moving Average (ARMA) Process

It is likely that $Y$ has characteristics of both AR and MA processes; in that case it is called ARMA. Thus, $Y_t$ follows an ARMA(1,1) process if it can be written as:

$$Y_t = \theta + \alpha_1 Y_{t-1} + \beta_0 u_t + \beta_1 u_{t-1} \qquad (7) \qquad \text{ARMA(1,1)}$$

So, the ARMA(p,q) can be written as:

$$Y_t = \theta + \alpha_1 Y_{t-1} + \cdots + \alpha_p Y_{t-p} + \beta_0 u_t + \beta_1 u_{t-1} + \cdots + \beta_q u_{t-q} \qquad (8) \qquad \text{ARMA(p,q)}$$
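As a small sketch (with purely illustrative coefficients), an ARMA(1,1) series can be simulated and its parameters recovered with stats::arima():

```r
# Simulate an ARMA(1,1) series and recover its parameters with stats::arima();
# the true coefficients (ar = 0.5, ma = 0.4) are purely illustrative.
set.seed(11)
y_arma <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 1000)

fit <- arima(y_arma, order = c(1, 0, 1))   # p = 1, d = 0, q = 1
fit                                        # ar1 and ma1 estimates should be near 0.5 and 0.4
```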

Autoregressive Integrated Moving Average (ARIMA) Process

The earlier models of time series are based on the assumption that the time series variable is stationary (at least in the weak sense).

But in practice, most time series variables are non-stationary in nature; they are integrated series.

This implies that you need to take the first or second difference of a non-stationary time series to convert it into a stationary one.

As such, they may be I(1) or I(2), and so on.

Therefore, if you have to difference a time series $d$ times to make it stationary and then apply the ARMA(p, q) model to it, the original time series is said to be ARIMA(p, d, q), that is, an autoregressive integrated moving average series, where

– p denotes the number of autoregressive terms,
– d denotes the number of times the series has to be differenced before it becomes stationary, and
– q denotes the number of moving average terms.
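A minimal sketch of checking the order of integration in R, assuming the forecast package is installed; the built-in AirPassengers series (a clearly trending series) merely stands in for your own data:

```r
# Estimating the order of integration d (ndiffs() is from the forecast package).
library(forecast)

y <- log(AirPassengers)    # log taken only to stabilise the variance
ndiffs(y)                  # estimated number of regular differences needed

dy <- diff(y)              # first difference
plot.ts(dy, main = "First difference of log(AirPassengers)")
```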

Box-Jenkins (BJ) Methodology

First of all, you have to determine whether the particular series is stationary and, if not, its order of integration, i.e. how many times it must be differenced to become stationary. That is, you have to identify the ARIMA(p, d, q) specification of the series.

The BJ methodology answers this question.

The steps in the BJ methodology are as follows:

Step 1: Examine the Data

As a starting point, it is always advisable to examine the data visually before going into detailed mathematical modelling. The examination of the data involves plotting the series and inspecting it for features such as trend, seasonality, outliers and structural breaks, as in the sketch below.
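A minimal sketch of this step, using the built-in AirPassengers series purely as a stand-in for the data of interest:

```r
# Step 1: look at the raw data before any modelling.
data("AirPassengers")
str(AirPassengers)                        # monthly series, 1949-1960
plot(AirPassengers,
     main = "Monthly international airline passengers",
     ylab = "Passengers (thousands)")     # inspect for trend, seasonality, outliers
```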

Step 2: Decompose your data

ARIMA can be fitted to both seasonal and non-seasonal data, but seasonal ARIMA requires a more complicated specification of the model structure, although the process of determining (p, d, q) is similar to that of choosing the non-seasonal order parameters.
Therefore, decomposing the data sometimes gives an additional benefit. In this step, we are essentially asking whether the series contains a trend and a seasonal component, and how pronounced they are; see the sketch below.
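A possible sketch of this step, again on the illustrative AirPassengers series, using classical decomposition and STL:

```r
# Step 2: decompose the series into trend, seasonal and remainder components.
plot(decompose(AirPassengers))                        # classical decomposition
plot(stl(log(AirPassengers), s.window = "periodic"))  # STL on the log scale
```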

Step 3: Identification

The identification step involves finding the appropriate values of p, d and q: d from the number of differences needed to make the series stationary, and p and q chiefly from the correlogram (ACF) and partial correlogram (PACF) of the stationary series.
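One possible way to carry out this step in R, assuming the forecast package is available; auto.arima() is used here only as a cross-check, not as a replacement for judgement:

```r
# Step 3: identify candidate (p, d, q) from the differenced series.
library(forecast)

y  <- log(AirPassengers)
dy <- diff(y)

acf(dy)           # suggests the MA order q
pacf(dy)          # suggests the AR order p
auto.arima(y)     # automatic order selection, used only as a benchmark
```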

Step 4: Estimation of the ARIMA Model
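A minimal sketch of the estimation step with stats::arima(). The (0, 1, 1)(0, 1, 1)[12] specification, the classical "airline model", is only an illustrative choice for the AirPassengers example, not a recommendation from the lecture:

```r
# Step 4: estimate a candidate model with stats::arima().
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fit              # coefficient estimates, standard errors and AIC
```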

Step 5: Diagnostic checking
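A possible sketch of the diagnostic step: if the fitted model is adequate, its residuals should behave like white noise. The model from the previous sketch is re-estimated here so that the snippet stands alone:

```r
# Step 5: residual diagnostics; re-estimate the model so the snippet stands alone.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

tsdiag(fit)                               # standardised residuals, residual ACF, Ljung-Box p-values
Box.test(residuals(fit), lag = 24,
         type = "Ljung-Box", fitdf = 2)   # fitdf = number of estimated ARMA coefficients
```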

Step 6: Forecasting the Future Values
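A minimal sketch of the forecasting step, continuing the same illustrative model and transforming the forecasts back from the log scale:

```r
# Step 6: forecast 24 months ahead and transform back from the log scale.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc <- predict(fit, n.ahead = 24)
exp(fc$pred)                        # point forecasts on the original scale
exp(fc$pred + 1.96 * fc$se)         # approximate upper 95% limits
exp(fc$pred - 1.96 * fc$se)         # approximate lower 95% limits
```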