Regressing AR(p) Models

I recently finished a course with the statistics department on Time Series Analysis, and a large component of it was regressing models onto data. Today, I am looking at regressing autoregressive models. The simplest model of this type is the AR(1) model, which takes the following form, where $y_t$ is the value of the series at time $t$ and $w_t$ is a random noise component. This models how the current value $y_t$ depends on the previous value $y_{t-1}$, i.e., one time lag.

\[y_{t} = \phi y_{t-1} + w_t\]
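
As a quick illustration before the general case, the recursion can be simulated directly; this is a minimal sketch with $\phi = 0.9$ chosen arbitrarily (the full experiment script appears at the bottom).

import numpy

phi = 0.9  # example coefficient, chosen arbitrarily for the sketch
y = [0.0]  # start the recursion at zero
for t in range(10):
    y.append(phi*y[-1] + numpy.random.randn())
print(y)  # one realization of the first few values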

The more general AR(p) model includes p time lags, each with its own coefficient.

\[y_{t} = \sum_{i=1}^{p}\phi_i y_{t-i} + w_t\]
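
For example, the AR(2) model regresses on the two most recent values:

\[y_{t} = \phi_1 y_{t-1} + \phi_2 y_{t-2} + w_t\]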

Looking at the AR(1) model, we can derive a least-squares type approach for a series with $N$ observations. Dropping the noise term, each consecutive pair of observations gives one approximate equation:

\[\begin{align*} y_2 &\approx \phi y_1\\ y_3 &\approx \phi y_2\\ &\vdots\\ y_N &\approx \phi y_{N-1} \end{align*}\]

So we have the following matrices, and we can get a nice closed-form solution for the AR(1) case.

\[\begin{align*} A &= [y_1, y_2, \dots, y_{N-1}]^T\\ b &= [y_2, y_3, \dots, y_{N}]^T\\ A^TA\phi = A^Tb &\rightarrow \phi = \frac{\sum_{i=1}^{N-1}y_{i+1}y_{i}}{\sum_{i=1}^{N-1}y_i^2} \end{align*}\]
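
In code, that closed-form estimate is just a pair of dot products. A minimal sketch (the series here is simulated on the spot so the snippet runs end to end; in practice y would hold your observations):

import numpy

# simulate a short AR(1) series so the sketch is self-contained
y = numpy.zeros(1000)
for t in range(1, 1000):
    y[t] = 0.9*y[t-1] + numpy.random.randn()

# closed-form least-squares estimate of phi for the AR(1) model
phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(phi_hat)  # should land near 0.9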

For the higher-order models, we can derive the same least-squares expressions by the method above, where each entry of the normal-equations matrix is $(A^TA)_{ij} = \sum_{k=0}^{N-p-1} y_{i+k}\, y_{j+k}$. However, if stationarity can be assumed for the series, we can make the reasonable approximation $(A^TA)_{ij} \approx c_{\mid i-j \mid}$, since each entry then depends only on the lag between columns. This makes $A^TA$ a Toeplitz matrix, which can be solved much faster and bypasses the need to form the $A$ matrix and carry out the $A^TA$ matrix multiplication, saving memory as well.
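
To make that structure concrete, here is a small sketch (the series length, order, and names are illustrative, not from the script below) that builds the Toeplitz approximation with scipy.linalg.toeplitz and compares it to the exact $A^TA$:

import numpy
from scipy.linalg import toeplitz

# short synthetic AR(1) series and a small order, just to inspect the structure
y = numpy.zeros(10000)
for t in range(1, 10000):
    y[t] = 0.9*y[t-1] + numpy.random.randn()

p = 3
A = numpy.column_stack([y[j:j-p] for j in range(p)])
c = [A[:, k] @ A[:, 0] for k in range(p)]  # c_0, c_1, c_2

print(A.T @ A)      # exact normal-equations matrix
print(toeplitz(c))  # Toeplitz approximation with entries c_{|i-j|}

The two matrices agree closely because adjacent columns of $A$ are the same series shifted by one sample.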

As an example, we can generate a time series with 1 million data points and only one real parameter, $\phi = 0.9$, with white noise variance $\sigma^2 = 1$. Pretending we haven't looked at the ACF or PACF, we don't know the true order, so we regress an AR(100) model onto this data. Code at the bottom.

The least-squares method with full matrices uses 770 MB of memory to store all of the matrices and takes 433 ms to solve for the AR(100) coefficients, resulting in $\phi_1 = 0.89752$.

The Toeplitz method only takes 3.2 KB of memory to store all of the Toeplitz coefficients and 65.9 µs to solve for the AR(100) coefficients, resulting in $\phi_1 = 0.89749$.

The MSE between the two solutions for the entire $\phi$ vector is $7.73 \times 10^{-11}$.
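
For reference, that figure is just the mean squared difference between the two coefficient vectors; assuming phi_ls and phi_toep hold the two solutions (names mine, not from the script below), it is a one-liner:

mse = numpy.mean((phi_ls - phi_toep)**2)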


from scipy.linalg import solve_toeplitz
import numpy

# simulate an AR(1) series with phi = 0.9 and unit-variance white noise
y = [0]
for i in range(1000000):
    y.append(0.9*y[-1] + numpy.random.randn())

y = numpy.array(y)
num_params = 100

# standard method with matrices: column i of A holds the series at lag num_params - i
A = numpy.column_stack([y[i:i-num_params] for i in range(num_params)])
b = y[num_params:]

# toeplitz method: never form A, only one lagged column at a time
def gen_row(index):
    return y[index:index-num_params]

# first column of the Toeplitz approximation to A^T A
toeplitz_coeffs = numpy.zeros(num_params)
for i in range(num_params):
    toeplitz_coeffs[i] = gen_row(i) @ gen_row(0)

# right-hand side A^T b, computed one column at a time
new_b = numpy.zeros(num_params)
for i in range(num_params):
    new_b[i] = gen_row(i) @ b

# time the solves (IPython); flip so that phi[0] is the lag-1 coefficient
%timeit phi = numpy.flip(numpy.linalg.solve(A.T@A, A.T@b))
%timeit phi = numpy.flip(solve_toeplitz(toeplitz_coeffs, new_b))