Dynamic Regression Models
2025-07-23
The ARIMA approach is fundamentally inductive. The workflow involves the use of empirical values of ACFs and PACFs to engage in model selection. Dynamic models engage theory/structure to impose more stringent assumptions for producing estimates.
First, a result: Aitken's Theorem
In a now-classic paper, Aitken generalized the Gauss-Markov theorem to the class of Generalized Least Squares estimators. It is important to note that these are GLS and not FGLS estimators. What is the difference? GLS treats \Omega as known; FGLS replaces it with an estimate. The two GLS estimators considered by Stimson are not, strictly speaking, GLS.
Definition: \hat{\beta}_{GLS} = (\mathbf{X}^{\prime}\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\Omega^{-1}\mathbf{y}
Properties:
(1) GLS is unbiased.
(2) Consistent.
(3) Asymptotically normal.
(4) MV(L)UE
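A brief sketch of why the result holds may help here. It assumes the error covariance is \sigma^{2}\Omega with \Omega known and positive definite; the matrix \mathbf{P} is notation introduced only for this illustration.

```latex
% Assumption: Var(e) = \sigma^2 \Omega, with \Omega known and positive definite.
% Choose P such that P'P = \Omega^{-1} (for instance via a Cholesky factorization).
\begin{aligned}
\mathbf{P}\mathbf{y} &= \mathbf{P}\mathbf{X}\beta + \mathbf{P}\mathbf{e},
\qquad \mathrm{Var}(\mathbf{P}\mathbf{e}) = \sigma^{2}\mathbf{P}\Omega\mathbf{P}^{\prime} = \sigma^{2}\mathbf{I} \\
\hat{\beta}_{GLS} &= \left[(\mathbf{P}\mathbf{X})^{\prime}(\mathbf{P}\mathbf{X})\right]^{-1}(\mathbf{P}\mathbf{X})^{\prime}\mathbf{P}\mathbf{y}
 = (\mathbf{X}^{\prime}\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\Omega^{-1}\mathbf{y}
\end{aligned}
```

The transformed model has spherical errors, so Gauss-Markov applies to OLS on the transformed data, which is exactly GLS on the original data.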
The variance/covariance matrix of the errors for a first-order autoregressive process is useful to derive.
The matrix has a banded, constant-diagonal structure: observations one period apart are correlated \rho, observations two periods apart \rho^2, and the corners are \rho^{T-1}. The diagonal is one.
What I have actually described is the correlation; the relevant autocovariances are actually defined by \frac{\sigma^{2}_{\nu}\rho^{s}}{1 - \rho^2} where s denotes the time period separation and \sigma^{2}_{\nu} is the variance of the innovation \nu_{t}.
It is also straightforward to prove (tediously through induction) that this is invertible; it is square and the determinant is non-zero having assumed that |\rho| < 1.
\Phi = \sigma^{2}_{e}\Psi = \sigma^{2}_{e} \left(\begin{array}{ccccc}1 & \rho^{1} & \rho^{2} & \ldots & \rho^{T-1} \\ \rho^1 & 1 & \rho^1 & \ldots & \rho^{T-2} \\ \rho^{2} & \rho^1 & 1 & \ldots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \ldots & 1 \end{array}\right)
given that e_{t} = \rho e_{t-1} + \nu_{t}. A Toeplitz form….
If the variance is stationary, we can rewrite, \sigma^{2}_{e} = \frac{\sigma^{2}_{\nu}}{1 - \rho^{2}}
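The structure above can be checked numerically. The following is a minimal base-R sketch, assuming illustrative values for \rho, the innovation variance, and the sample size, and treating \rho as known so that the estimator is GLS proper; the variable names are my own.

```r
# AR(1) error covariance and GLS with rho treated as known (illustrative values)
set.seed(1)
T_obs <- 200
rho <- 0.6
sigma2_nu <- 1

# Correlation matrix Psi: entry (i, j) is rho^|i - j| (Toeplitz form)
Psi <- toeplitz(rho^(0:(T_obs - 1)))

# Stationary error variance and the full covariance matrix Phi
sigma2_e <- sigma2_nu / (1 - rho^2)
Phi <- sigma2_e * Psi

# Psi is invertible; its inverse is tridiagonal, so entries more than one
# step away from the main diagonal are numerically zero
Psi_inv <- solve(Psi)
far_from_diag <- abs(row(Psi_inv) - col(Psi_inv)) > 1
max_far <- max(abs(Psi_inv[far_from_diag]))

# Simulate y = 1 + 2 x + e with AR(1) errors
x <- rnorm(T_obs)
e <- as.numeric(arima.sim(list(ar = rho), n = T_obs, sd = sqrt(sigma2_nu)))
y <- 1 + 2 * x + e
X <- cbind(1, x)

# GLS: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y, here with Omega = Phi
Phi_inv <- solve(Phi)
beta_gls <- solve(t(X) %*% Phi_inv %*% X, t(X) %*% Phi_inv %*% y)

# OLS for comparison
beta_ols <- solve(t(X) %*% X, t(X) %*% y)
```

Because the scalar \sigma^{2}_{e} cancels in the GLS formula, using Psi in place of Phi gives the same coefficients; only the correlation structure matters for the point estimates.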
A comment on characteristic roots….
We have the two key elements to implement this except that we do not know \rho; we will have to estimate it and estimates have uncertainty. But it is important to note this imposes exactly an AR(1). If the process is incorrectly specified, then the optimal properties do not follow. Indeed, the optimal properties also depend on an additional important feature.
We need to estimate things to replace unknown covariance structures and coverage will depend on properties of the estimators of these covariances.
Consistent estimators will work but there is, to put it euphemistically, considerable variation in the class of consistent estimators.
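One common way to make this feasible is a two-step procedure: estimate \rho from OLS residuals, then apply the AR(1) transformation. The sketch below uses a Prais-Winsten style transformation on simulated data; the true values, sample size, and variable names are assumptions for illustration, and this is only one of several consistent estimators of \rho.

```r
# Feasible GLS for AR(1) errors: a two-step sketch on simulated data
set.seed(2)
T_obs <- 2000
rho <- 0.6
x <- rnorm(T_obs)
e <- as.numeric(arima.sim(list(ar = rho), n = T_obs))
y <- 1 + 2 * x + e

# Step 1: OLS, then estimate rho by regressing residuals on their own lag
ols <- lm(y ~ x)
u <- resid(ols)
rho_hat <- sum(u[-1] * u[-T_obs]) / sum(u[-T_obs]^2)

# Step 2: Prais-Winsten transformation (rescales, rather than drops, the first observation)
w <- sqrt(1 - rho_hat^2)
y_star <- c(w * y[1], y[-1] - rho_hat * y[-T_obs])
x_star <- c(w * x[1], x[-1] - rho_hat * x[-T_obs])
const_star <- c(w, rep(1 - rho_hat, T_obs - 1))

# Step 3: OLS on the transformed data is the FGLS estimator
fgls <- lm(y_star ~ 0 + const_star + x_star)
coef(fgls)
```

The standard errors reported by the final regression condition on rho_hat as if it were known, which is the uncertainty the text warns about.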
The contrast between the Beck and Katz/White approach and the GLS approach reflects a valid difference in philosophies. One takes advantage of OLS and Basu's Theorem; the other goes full Aitken.
y_{t} = a_{1} y_{t-1} + \epsilon_{t}
is the simplest dynamic model but it cannot be estimated consistently, in general terms, in the presence of serial correlation. Why?
The key condition for consistency is violated because \mathbb{E}(y_{t-1}\epsilon_{t}) \neq 0: y_{t-1} contains \epsilon_{t-1}, which is correlated with \epsilon_{t} under serial correlation. OLS will not generally work.
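A small simulation makes the problem concrete. This sketch assumes a_{1} = 0.5 and AR(1) errors with coefficient 0.5 (both illustrative choices), then fits the lagged-dependent-variable model by OLS on a long series.

```r
# OLS with a lagged dependent variable and serially correlated errors
set.seed(3)
T_obs <- 5000
a1 <- 0.5    # true coefficient on y_{t-1}
rho <- 0.5   # AR(1) coefficient of the error process (assumed)

eps <- as.numeric(arima.sim(list(ar = rho), n = T_obs))
y <- numeric(T_obs)
y[1] <- eps[1]
for (i in 2:T_obs) y[i] <- a1 * y[i - 1] + eps[i]

# OLS of y_t on y_{t-1} without an intercept, matching the model in the text
fit <- lm(y[-1] ~ 0 + y[-T_obs])
a1_hat <- unname(coef(fit)[1])

# Under these assumptions y follows an AR(2), and OLS converges to its
# first-order autocorrelation rather than to a1
plim_ols <- (a1 + rho) / (1 + a1 * rho)
c(true = a1, ols = a1_hat, plim = plim_ols)
```

With these values the large-sample limit is 0.8, not 0.5, so adding more data does not repair the estimate.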
A note on dynamic interpretation.
y_{t} = a_{1} y_{t-1} + \beta X_t + \epsilon_{t}
The problem is fitting and the key issue is white noise residuals post-estimation. But we have to assume a structure and implement it.
y_{t} = \alpha + \beta_{0} X_t + \beta_{1}X_{t-1} + \ldots + \epsilon_{t}
The impact of x occurs over multiple periods. It relies on theory, or perhaps analysis using information criteria/F [owing to quasi-nesting and missing data]. OLS is a fine solution to this problem but the search space of models is often large.
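The lag-length search can be sketched in base R with embed(), which builds the matrix of current and lagged values. The data below are simulated with effects at lags 0 through 2 (an assumption for illustration), and every candidate model is fit on the same rows so that the information criteria are comparable despite the observations lost to lagging.

```r
# Finite distributed lag models of increasing length, compared by AIC
set.seed(4)
T_obs <- 500
max_lag <- 4
x <- as.numeric(arima.sim(list(ar = 0.5), n = T_obs))

# Columns of L: x_t, x_{t-1}, ..., x_{t-max_lag}; the first max_lag rows are lost
L <- embed(x, max_lag + 1)
colnames(L) <- paste0("x_lag", 0:max_lag)
n <- nrow(L)

# Simulated outcome with effects at lags 0, 1, and 2 only
y <- 1 + 0.5 * L[, 1] + 0.3 * L[, 2] + 0.2 * L[, 3] + rnorm(n)

# Fit DL(q) for q = 0, ..., max_lag on the common sample
fits <- lapply(0:max_lag, function(q) lm(y ~ L[, 1:(q + 1), drop = FALSE]))
aic <- sapply(fits, AIC)
names(aic) <- paste0("q=", 0:max_lag)
best_q <- which.min(aic) - 1

aic
best_q
```

Nested pairs of these models can also be compared with anova() for the F test mentioned above.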
In response to this problem, we have structured distributed lag models; there are many such schemes.
Koyck/Geometric decay:
Short-run and long-run effects are parametrically identified: y_t = \alpha + \beta(1-\lambda)\sum_{j=0}^{\infty}\lambda^{j}X_{t-j} + \epsilon_{t}
Almon (more arbitrary decay): y_{t} = \sum_{t_{A}=0}^{T_{F}} \rho_{t_{A}}x_{t - t_{A}} + \epsilon_{t} with coefficients that are ordinates of some general polynomial of degree q, where T_{F} >> q. The \rho_{t_{A}} = \sum_{k=0}^{q} \gamma_{k}t_{A}^{k}.
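The two schemes are easiest to compare through the lag weights they imply. The sketch below assumes illustrative values for \beta, \lambda, and the polynomial coefficients \gamma_{k}, and reads q as the polynomial degree and T_{F} as the lag length.

```r
# Lag weights implied by the Koyck and Almon schemes (all values hypothetical)

# Koyck: the weight on X_{t-j} is beta * (1 - lambda) * lambda^j
beta <- 2
lambda <- 0.7
j <- 0:200
koyck_w <- beta * (1 - lambda) * lambda^j
short_run <- koyck_w[1]     # immediate effect: beta * (1 - lambda)
long_run <- sum(koyck_w)    # cumulative effect: approaches beta

# Almon: the weights lie on a polynomial of degree q in the lag index
T_F <- 8                      # lag length
q <- 2                        # polynomial degree, much smaller than T_F
gamma <- c(0.1, 0.4, -0.05)   # gamma_0, ..., gamma_q
lags <- 0:T_F
powers <- outer(lags, 0:q, `^`)          # (T_F + 1) x (q + 1) matrix of lag^k
almon_w <- as.numeric(powers %*% gamma)  # weight at each lag

round(cbind(lag = lags, koyck = koyck_w[lags + 1], almon = almon_w), 3)
```

The Koyck weights decline monotonically, while this particular Almon polynomial rises and then falls. In estimation, the Almon restriction reduces T_{F} + 1 lag coefficients to q + 1 parameters by regressing on the lag matrix post-multiplied by `powers`.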
y_{t} = \alpha + \gamma_{1}y_{t-1} + \beta_{0} X_t + \beta_{1}X_{t-1} + \beta_{2}X_{t-2} + \ldots + \epsilon_{t}
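For interpretation, the quantity of interest in this model is often the long-run multiplier, the sum of the \beta coefficients divided by 1 - \gamma_{1}. The sketch below simulates an ADL(1,1) with white noise errors and hypothetical parameter values, estimates it by OLS, and computes that ratio.

```r
# ADL(1,1) by OLS and the implied long-run multiplier (simulated data)
set.seed(5)
T_obs <- 1000
gamma1 <- 0.6
beta0 <- 0.5
beta1 <- 0.3

x <- as.numeric(arima.sim(list(ar = 0.5), n = T_obs))
y <- numeric(T_obs)
for (i in 2:T_obs) {
  y[i] <- 1 + gamma1 * y[i - 1] + beta0 * x[i] + beta1 * x[i - 1] + rnorm(1)
}

# Align current and lagged values; the first observation is lost
d <- data.frame(y = y[-1], y_lag = y[-T_obs], x = x[-1], x_lag = x[-T_obs])
adl <- lm(y ~ y_lag + x + x_lag, data = d)
b <- coef(adl)

# Long-run multiplier: total effect of a permanent one-unit change in x
lrm_hat <- unname((b["x"] + b["x_lag"]) / (1 - b["y_lag"]))
lrm_true <- (beta0 + beta1) / (1 - gamma1)
c(estimate = lrm_hat, truth = lrm_true)
```

Because the long-run multiplier is a nonlinear function of the coefficients, its standard error needs the delta method or a simulation-based approach, neither of which is shown here.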
As recently as April of 2025, a paper appeared in the Journal of Politics advocating the use of ADL(2,2). The paper, by Kagalwala and Whitten, is titled The Answer was There All Along: Worry about the dynamics! A previous argument was made for the ADL(1,1).
Data analysis can yield model comparisons among competing dynamic structures. The key issue is that the analyst needs to divine the process: what is the relevant error process, and what is the structure and timing of effects, alongside the potential question of incremental adjustment? We need good theory for that.
Given such theory, we can take an equations-as-analysis approach: measure the variables, derive reduced forms, and then recover parameter estimates by deploying simultaneous equations methods. Very large such systems were a core part of early empirical macroeconomics. The failures of such systems led to the proposal of alternatives.
Chris Sims suggested a more flexible approach: the VAR.
The key insight is that this VAR is the reduced form of some more complicated, as yet unspecified, structural form.
But if the goal is to specify how variables relate to one another, and to use data to discover Granger causality and responses to impulses injected into the system, the reduced form suffices.
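Granger causality in a two-variable system amounts to an F test on the lags of one variable in the other variable's equation. The sketch below uses simulated data in which x helps predict y but not the reverse; the data-generating values and the single-lag specification are assumptions for illustration.

```r
# Granger causality as a nested-model F test (simulated bivariate system)
set.seed(6)
T_obs <- 400
x <- numeric(T_obs)
y <- numeric(T_obs)
for (i in 2:T_obs) {
  x[i] <- 0.5 * x[i - 1] + rnorm(1)
  y[i] <- 0.4 * y[i - 1] + 0.5 * x[i - 1] + rnorm(1)
}
d <- data.frame(y = y[-1], x = x[-1], y_lag = y[-T_obs], x_lag = x[-T_obs])

# Does x Granger-cause y? Compare the y equation with and without lagged x
test_x_to_y <- anova(lm(y ~ y_lag, data = d), lm(y ~ y_lag + x_lag, data = d))
p_x_to_y <- test_x_to_y$`Pr(>F)`[2]

# Does y Granger-cause x? Compare the x equation with and without lagged y
test_y_to_x <- anova(lm(x ~ x_lag, data = d), lm(x ~ x_lag + y_lag, data = d))
p_y_to_x <- test_y_to_x$`Pr(>F)`[2]

c(x_to_y = p_x_to_y, y_to_x = p_y_to_x)
```

With more lags, the same comparison applies: the restricted model drops all lags of the candidate cause at once.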
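library(forecast)
# mdeaths and fdeaths are monthly UK deaths from lung diseases, 1974-1979,
# for males and females respectively; both ship with R's datasets package
mdeaths
fdeaths
# Save both series for later use
save(mdeaths, fdeaths, file = "./img/LungDeaths.RData")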
Series: mdeaths, fdeaths
Model: VAR(3) w/ mean
Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.6675          0.8074          0.3677         -1.4540
s.e.          0.3550          0.8347          0.3525          0.8088
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.2606         -1.1214  538.7817
s.e.          0.3424          0.8143  137.1047
Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)
              0.2138          0.4563          0.0937         -0.3984
s.e.          0.1460          0.3434          0.1450          0.3328
      lag(mdeaths,3)  lag(fdeaths,3)  constant
              0.0250          -0.315  202.0027
s.e.          0.1409           0.335   56.4065
Residual covariance matrix:
          mdeaths   fdeaths
mdeaths  58985.95  22747.94
fdeaths  22747.94   9983.95
log likelihood = -812.35
AIC = 1660.69   AICc = 1674.37   BIC = 1700.9
Series: mdeaths, fdeaths
Model: VAR(2) w/ mean
Coefficients for mdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.9610          0.3340          0.1149         -1.3379  443.8492
s.e.          0.3409          0.8252          0.3410          0.7922  124.4608
Coefficients for fdeaths:
      lag(mdeaths,1)  lag(fdeaths,1)  lag(mdeaths,2)  lag(fdeaths,2)  constant
              0.3391          0.2617         -0.0601         -0.2691  145.0546
s.e.          0.1450          0.3510          0.1450          0.3369   52.9324
Residual covariance matrix:
          mdeaths   fdeaths
mdeaths  62599.51  24942.79
fdeaths  24942.79  11322.70
log likelihood = -833.17
AIC = 1694.35   AICc = 1701.98   BIC = 1725.83
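The code that produced the output above is not shown. As a rough stand-in, a VAR can be fit in base R by regressing each series on the lags of both. This is a sketch that assumes equation-by-equation least squares on the two built-in series; it is not necessarily the routine used above, and the variable names are my own.

```r
# VAR(p) by least squares in base R, using the built-in lung death series
Y <- cbind(mdeaths = as.numeric(mdeaths), fdeaths = as.numeric(fdeaths))
p <- 2           # lag order
k <- ncol(Y)     # number of series

# Each row of E holds (Y_t, Y_{t-1}, ..., Y_{t-p}); the first p rows are lost
E <- embed(Y, p + 1)
lhs <- E[, 1:k]
rhs <- E[, -(1:k)]
colnames(lhs) <- colnames(Y)
colnames(rhs) <- paste0("lag(", rep(colnames(Y), p), ",", rep(1:p, each = k), ")")

# A matrix response gives one regression per column: one equation per series
var_fit <- lm(lhs ~ rhs)
B <- coef(var_fit)   # rows: intercept then lagged terms; columns: equations

# Residual covariance matrix with a degrees-of-freedom correction
U <- resid(var_fit)
Sigma <- crossprod(U) / (nrow(lhs) - ncol(rhs) - 1)

round(B, 4)
round(Sigma, 2)
```

Changing p to 3 gives the larger model; comparing information criteria across lag orders requires holding the estimation sample fixed.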
What happens if I shock one of the series; how does it work through the system?
The idea behind an impulse-response is core to counterfactual analysis with time series. What does our future world look like and what predictions arise from it and the model we have deployed?
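One way to trace a shock through the system is to stack the VAR into companion form and iterate it forward. The sketch below refits a VAR(2) to the two built-in series by least squares and computes responses to a one-unit, non-orthogonalized shock; the horizon and the choice not to orthogonalize are assumptions made for illustration.

```r
# Impulse responses from a VAR(2) via the companion matrix (base R sketch)
Y <- cbind(mdeaths = as.numeric(mdeaths), fdeaths = as.numeric(fdeaths))
p <- 2
k <- ncol(Y)
E <- embed(Y, p + 1)              # rows hold (Y_t, Y_{t-1}, Y_{t-2})
B <- coef(lm(E[, 1:k] ~ E[, -(1:k)]))

# A[[l]][i, j]: effect of series j at lag l on series i
# (row 1 of B is the intercept, then k rows per lag)
A <- lapply(1:p, function(l) t(B[1 + (l - 1) * k + 1:k, ]))

# Companion form: the top block holds [A_1 A_2], the bottom block shifts the lags
companion <- rbind(
  do.call(cbind, A),
  cbind(diag(k * (p - 1)), matrix(0, k * (p - 1), k))
)

# irf[i, j, h + 1]: response of series i at horizon h to a unit shock in series j
H <- 12
irf <- array(NA_real_, dim = c(k, k, H + 1),
             dimnames = list(response = colnames(Y), shock = colnames(Y),
                             horizon = as.character(0:H)))
Fh <- diag(k * p)
for (h in 0:H) {
  irf[, , h + 1] <- Fh[1:k, 1:k]
  Fh <- companion %*% Fh
}

# Path of fdeaths after a one-unit shock to mdeaths
round(irf["fdeaths", "mdeaths", ], 3)
```

Because the residuals of the two equations are correlated, a shock to one series alone is itself a counterfactual; orthogonalized responses based on a Cholesky factor of the residual covariance are a common alternative, and they depend on the ordering of the variables.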
Whether VARs or dynamic linear models or ADL models, these are key to interpreting a model in the real world.
ESSSSDA25-2W: Heterogeneity and Dynamics