AR and MA Time Series
This time I will give a short introduction to time series models, since I find myself looking up the same things over and over, because I forget them in the meantime. I will cover the AR(p) model and quickly define the MA(q) model. These are the basic models on which the more sophisticated ones are built.
In the following $X_t$ will always be a time series, i.e. a series of random variables.
We define the mean of the time series to be simply the expectation $\mu_t = E[X_t]$.
We say that a time series is stationary in the mean if the mean is a constant, i.e., $\mu_t = \mu$ for all $t$.
We define the variance of a time series to be $\sigma_t^2 = E[(X_t - \mu_t)^2]$. If the time series is stationary in the mean this is just $E[(X_t - \mu)^2]$.
We say that a time series is stationary in the variance if, you guessed it, the variance is a constant, i.e., $\sigma_t^2 = \sigma^2$ for all $t$.
If a time series is stationary in the mean and stationary in the variance we say that it is second order stationary. Then the correlation of shifted observations is only a function of the shift.
There is also the notion of strong stationarity, which basically means that if you shift the time series, the cumulative probability distribution does not change. If you are interested in that, you may want to look it up. I am not going to split hairs over this.
The problem with all these definitions is that you normally do not know any of these quantities, since you only have one realization of the time series. Thus you cannot know the mean, since it is a function of time. You also cannot compute the variance, since it depends on time as well.
Normally you assume some properties, like stationarity in the mean, if that makes sense for your observations, and then go on to compute what you are allowed to compute.
If a series $X_t$ is second order stationary then we can define the autocovariance of lag $k$ as $\gamma_k = E[(X_t - \mu)(X_{t+k} - \mu)]$. Note that $\gamma_k$ exists since $X_t$ is second order stationary.
The autocorrelation of lag $k$ of a second order stationary time series is the covariance of $X_t$ and $X_{t+k}$ divided by its variance, or the autocovariance of lag $k$ of $X_t$ divided by its variance. I.e. $\rho_k = \gamma_k / \sigma^2$. Alternatively it can be defined as the correlation between $X_t$ and $X_{t+k}$.
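As a sketch in Python (the helper name `acf_lag` is my own, not a standard function), this definition translates directly into code:

```python
import numpy as np

def acf_lag(x, k):
    """Sample autocorrelation of lag k: autocovariance of lag k divided by the variance."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    gamma_k = np.mean((x[:len(x) - k] - mu) * (x[k:] - mu))  # autocovariance of lag k
    gamma_0 = np.mean((x - mu) ** 2)                          # variance
    return gamma_k / gamma_0

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)  # white noise: autocorrelations at lags > 0 should be near 0
print(acf_lag(z, 0), acf_lag(z, 1))
```

By construction the lag-0 autocorrelation is exactly 1, and for white noise the lag-1 value should be close to zero.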
In the following $Z_t$ will always denote white noise. That means that the $Z_t$ are independent and identically distributed.
We denote the backward shift operator by $B$. The backward shift operator takes a time series element and returns the previous element: $BX_t = X_{t-1}$. In other references it is sometimes denoted by $L$ and called the lag operator.
There is also the difference operator $\nabla$, which is defined as $\nabla = 1 - B$. This is an aesthetically pleasing way to write $\nabla X_t = X_t - X_{t-1}$.
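In code, applying the difference operator is just first differencing, which numpy's `diff` computes:

```python
import numpy as np

x = np.array([3.0, 5.0, 4.0, 7.0])
nabla_x = np.diff(x)  # X_t - X_{t-1} for t = 2, ..., n; one element shorter than x
print(nabla_x)        # [ 2. -1.  3.]
```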
A time series model is an autoregressive model of order p, AR(p), if

$$X_t = c + \varphi_1 X_{t-1} + \dots + \varphi_p X_{t-p} + Z_t,$$

where $\varphi_1, \dots, \varphi_p$ are the parameters of the model, $c$ is a constant and $Z_t$ is white noise. We can rewrite this as $\varphi(B) X_t = c + Z_t$, where $\varphi(B)$ is the polynomial $\varphi(x) = 1 - \varphi_1 x - \dots - \varphi_p x^p$ evaluated at the backward shift operator $B$.
That means that the next value is always a linear combination of the old values and some shock $Z_t$. These models are not always stationary.
Let us consider a random walk. With our new notation, we can define this as an AR(1) process with $c = 0$ and $\varphi_1 = 1$, i.e., $X_t = X_{t-1} + Z_t$. This is not a stationary process.
In fact, if you want to check whether your process is stationary, you have to compute the roots of the polynomial $\varphi(x)$. If all of them are bigger than one in absolute value, the process is stationary. If there are some unit roots, i.e., roots with absolute value 1, as in the case above with the random walk where $\varphi(x) = 1 - x$, then your process is not stationary. You can check this by computing it for an AR(1) process. The proof of the general statement is just a generalization of the AR(1) case.
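A quick numpy sketch of this root check (the AR(2) coefficients here are arbitrary choices of mine):

```python
import numpy as np

# AR(2) model X_t = 0.5 X_{t-1} + 0.2 X_{t-2} + Z_t has phi(x) = 1 - 0.5 x - 0.2 x^2
roots = np.roots([-0.2, -0.5, 1.0])  # np.roots takes coefficients from highest degree down
print(np.abs(roots))                 # both roots are bigger than 1 in absolute value
print(np.all(np.abs(roots) > 1))     # True: the process is stationary

rw_roots = np.roots([-1.0, 1.0])     # random walk: phi(x) = 1 - x
print(rw_roots)                      # a unit root at 1, so the random walk is not stationary
```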
If you are looking for a statistical test for the presence of a unit root, there is the Augmented Dickey-Fuller test.
In the following example we will take a random walk and fit an AR process to it. Then we will check if it is stationary.
The result of running the code is as follows:
These are the estimated parameters with their 95% confidence intervals.
The second row says that our $\varphi_1$, which should be one, is estimated to be between 0.98979846 and 1.00183547. Quite close.
I am not quite sure what the first pair is good for. At first I thought it was an estimate for the constant term $c$, but it did not change when I estimated the parameters for randomwalk+100, so I simply do not know.
Let us now check if our random walk is stationary. For this, we employ the Augmented Dickey-Fuller (ADF) test. There are also other tests for stationarity, but for our purposes, ADF suffices.
ADF does not account for conditional heteroskedasticity, which is a mathematical way of saying that the volatilities are clustered. So if you have a heteroskedastic time series and use ADF, you are doing it wrong. ADF internally estimates the parameters assuming we have an AR(p) process and then checks for a unit root.
Our code continues from above.
If we do not provide the 1, which specifies the order of our model, it estimates the order. The output is as follows:
The first value is the test statistic. Next is the p-value. The critical values at the 1, 5, and 10% levels, taken from the literature [MacKinnon (1994)], are in the dictionary.
The null hypothesis of the ADF test is that there is a unit root. Since our p-value is big, we cannot reject the null hypothesis. Alternatively, we can also see that our test statistic is bigger than MacKinnon's critical values.
This is in accord with our intuition, since we provided a nonstationary process as input, i.e., a process with a unit root.
A time series model is a moving average model of order q, MA(q), if

$$X_t = \mu + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q},$$

where $\theta_1, \dots, \theta_q$ are parameters, $\mu$ is a constant, namely the expectation of $X_t$, and $Z_t$ is white noise. We can rewrite this using a polynomial as $X_t = \mu + \theta(B) Z_t$, where $\theta(x) = 1 + \theta_1 x + \dots + \theta_q x^q$.
In MA-models, the next value of the series is a linear combination of the previous shocks. MA-models are always stationary.
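As a quick numpy sketch (the parameter value and the series length are my choices), we can simulate an MA(1) process and check a standard fact about it: the lag-1 autocorrelation of an MA(1) is $\theta_1 / (1 + \theta_1^2)$, which for $\theta_1 = 0.5$ is $0.4$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5
z = rng.standard_normal(100_001)  # white noise shocks
x = z[1:] + theta * z[:-1]        # MA(1): X_t = Z_t + 0.5 Z_{t-1}, with mu = 0

# Sample lag-1 autocorrelation; should be close to 0.5 / (1 + 0.25) = 0.4
mu = x.mean()
rho1 = np.mean((x[:-1] - mu) * (x[1:] - mu)) / np.var(x)
print(rho1)
```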
We have seen the basic definitions of time series and how to test for stationarity of an AR(p) time series. I will probably reference this from future articles.