Day 9: GMM and DPD

Robert W. Walker

2025-07-31

Outline for Day 9:

Hurwicz/Nickell bias
GMM
Panels and GMM
Systems and single equations
Implementation

Brief Comment on Hurwicz/Nickell Bias

Bias is of stochastic order \frac{1}{T}.
Less bad as more T

GMM

Generalized method of moments estimators are a class of estimators created by analogs of the population moment conditions for sample moments. For example, linear regression is a GMM estimator and the moment restriction that must hold for OLS is that \mathbb{E}[\mathbf{X^{\prime}}\epsilon] = 0. With endogenous x \in \mathbf{X}, we instrument using z. If there is one z for each endogenous x, we have a standard IV. Without exact identification, we need iteration and GMM estimators will typically involve testing these overidentifying restrictions using a Sargan test, as we will see.

GMM for Panels

The trick here is that the panel structure gives us numerous instruments for free. Comes in two forms. Single-equation and systems estimators. With systems estimators, assumptions give us leverage on moment conditions in both level and difference forms, we use these jointly to estimate the parameters of interest.

Introducing DPD

We are interested in estimating the parameters of models of the form y_{it} = y_{i,t-1}\gamma + X_{it}\beta + \alpha_{i} + \epsilon_{it} for i = \{1,\ldots,N\} and t = \{1,\ldots, T \} using datasets with large N and fixed T
By construction, y_{i,t-1} is correlated with the unobserved individual-level effect \alpha_{i}.
Removing \alpha_{i} by the within transform produces an inconsistent estimator with T fixed.
First difference both sides and look for instrumental-variables (IV) and generalized method-of-moments (GMM) estimators

Arrelano-Bond

First differencing the model equation yields \Delta y_{it} = \Delta y_{i,t-1}\gamma + \Delta x_{it}\beta + \Delta \epsilon_{it}
The \alpha_{i} are gone, but the y_{i,t-1} in \Delta y_{i,t-1} is a function of the \epsilon_{i,t-1} which is also in \Delta \epsilon_{it}.
\Delta y_{i,t-1} is correlated with \Delta \epsilon_{it} by construction
Anderson and Hsiao (1981) give a 2SLS estimator based on (further) lags of \Delta y_{it} as instruments for \Delta y_{i,t-1}. E.g. if \epsilon_{it} is IID over i and t, \Delta y_{i,t-2} is valid for \Delta y_{i,t-1}.
Anderson and Hsiao (1981) also suggest a 2SLS estimator based on lagged levels of y_{it} as instruments for \Delta y_{i,t-1}. E.g. if \epsilon_{it} is IID over i and t, y_{i,t-2} can instrument for \Delta y_{i,t-1}.
Holtz-Eakin, and co-authors (1988) and Arellano and Bond (1991) showed how to construct estimators based on moment equations constructed from further lagged levels of y_{it} and the first-differenced errors.
We are creating moment conditions using lagged levels of the dependent variable with first differences, \Delta \epsilon_{it}. First-differences of strictly exogenous covariates also create moment conditions.
Assume that \epsilon_{it} are IID over i and t (no serial correlation)
GMM is needed because there are more instruments than parameters.

Strict Exogeneity vs. Predetermined

If regressors are strictly exogenous: \mathbb{E}[x_{it}\epsilon_{is}]=0\; \forall s,t.
If predetermined, \mathbb{E}[x_{it}\epsilon_{is}]\neq0\; \textrm{if} s < t but \mathbb{E}[x_{it}\epsilon_{is}]=0\; \forall s \geq t
Dynamic panel data models allow predetermined regressors. [backward feedback, no forward feedback]

A bit more on this and GMM

The moment conditions formed by assuming that particular lagged levels of the dependent variable are orthogonal to the differenced disturbances are known as GMM-type moment conditions
Sometimes they are called sequential moment conditions
The moment conditions formed using the strictly exogenous covariates are just standard IV moment conditions, so they are called standard moment conditions
The dynamic panel-data estimators in Stata report which transforms of which variables were used as instruments
In GMM estimators, we weight the vector of sample-average moment conditions by the inverse of a positive definite matrix
When that matrix is the covariance matrix of the moment conditions, we have an efficient GMM estimator
In the case of nonidentically distributed disturbances, we can use a two-step GMM estimator that estimates the covariance matrix of the moment conditions using the first-step residuals
Although the large-sample robust variance-covariance matrix of the two-step estimator does not depend on the fact that estimated residuals were used, simulation studies have found that that Windmejier’s bias-corrected estimator performs much better
Specifying vce(robust) produces an estimated VCE that is robust to heteroskedasticity

cont’d

There is a result in the large-sample theory for GMM which states that the VCE of the two-step estimator does not depend on the fact that it uses the residuals from the first step. Windmeijer 2005 bias-corrects the VCE of the two-step GMM.
No robust Sargan test but Arrelano-Bond test exists.
When the variables are predetermined, it means that we cannot include the whole vector of differences of observed xit into the instrument matrix
We just include the levels of x_{it} for those time periods that are assumed to be unrelated to \Delta \epsilon_{it}.
The Arellano-Bond estimator formed moment conditions using lagged-levels of the dependent variable and the predetermined variables with first-differences of the disturbances
Arellano and Bover(1995) and Blundell and Bond (1998) found that if the autoregressive process is too persistent, then the lagged-levels are weak instruments
These authors proposed using additional moment conditions in which lagged differences of the dependent variable are orthogonal to levels of the disturbances
To get these additional moment conditions, they assumed that panel-level effect is unrelated to the first observable first-difference of the dependent variable
\texttt{xtdpdsys} is syntactically similar to \texttt{xtabond}

Wawro

The range of panel data models employed by political scientists is a small subset of those available.
Anderson-Hsiao: (IV) Difference regression \Delta y_{i,t} = \gamma \Delta y_{i,t-1} + \beta \Delta x_{i,t-1} + \Delta u_{i,t} and instrument for terms with lagged difference or lagged levels. NB: Beck says the estimator is bad. Econometricians tend to agree.
Arrelano and Bond: Recast A-H as GMM estimator facilitating more instruments.
Systems GMM estimators that combine a levels and a difference equation.

Previous Student Question

Wawro takes the T=3 case and defines the true model to be: y_{i,t} = \gamma y_{i,t-1} + \alpha_{i} + u_{i,t}

What does this allow us to write?

\Delta_{3,2} y_{i} = \underbrace{\gamma [ \overbrace{\gamma y_{i,1} + \alpha_{i} + u_{i,2}}^{y_{i,2}} ] + \alpha_{i} + u_{i,3}}_{y_{i,3}} - \underbrace{\gamma y_{i,1} + \alpha_{i} + u_{i,2}}_{y_{i,2}} (\gamma^{2} - \gamma)y_{i,1} + \underbrace{\gamma \alpha_{i}}_{\delta_{i}} + \underbrace{u_{i,3} + (\gamma - 1) u_{i,2}}_{u_{\Delta(3,2)}^{*}}

In other words, this is an estimating equation without regressors.

So how does he get \Delta y_{i,2} = (\gamma - 1) y_{i,1} + \alpha_{i} + u_{i,2}? It relies on how we treat the initial condition.

\Delta_{2,1} y_{i} = \overbrace{\gamma y_{i,1} + \alpha_{i} + u_{i,2}}^{y_{i,2}} - y_{i,1} where the last term is \gamma y_{i,0} + \alpha_{i} + u_{i,1}. If, as is commonly the case, we treat the initial condition as fixed, we get what he gets. If we treat y_{i,1} as the fixed effect plus random noise, then we would get, after some manipulation, \Delta_{2,1} y_{i} = (\gamma - 1)y_{i,1} + (u_{i,2} - u_{i,1}) Because of the assumption of no serial correlation, the latter, under normality, is also normal (as a difference in independent normals) because the unit effects cancel under differencing.

The Data for Implementation

Contains data from abdata.dta
  obs:         1,031                          Layard & Nickell, Unemployment
                                                in Britain, Economica 53, 1986
 ---------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ind             int    %8.0g                  industry
year            int    %8.0g
emp             float  %9.0g                  employment
wage            float  %9.0g                  real wage
cap             float  %9.0g                  gross capital stock
indoutpt        float  %9.0g                  industry output
n               float  %9.0g                  log(employment)
w               float  %9.0g                  log(real wage)
k               float  %9.0g                  log(gross capital stock)
ys              float  %9.0g                  log(industry output)
yr1980          float  %9.0g
yr1981          float  %9.0g
yr1982          float  %9.0g
yr1983          float  %9.0g
yr1984          float  %9.0g
id              float  %9.0g                  firm ID
-------------------------------------------------------------------------------
Sorted by:  id  year

Implementation

\texttt{xtregar}: , \texttt{re} and \texttt{fe} options
- Fit a first order autoregressive structure to TSCS data.
- Defaults to an iterative estimator but \texttt{twostep} is available.
- \texttt{lbi} gives a test of the hypothesis that \rho is zero. (not a default)
\texttt{xtabond}
- \texttt{estat abond} gives a test for autocorrelation
- \texttt{estat sargan} gives the overidentifying restrictions test
\texttt{xtlsdvc y x, initial(ah or ab or bb) vcov(1000 bs iter)} will handle unbalanced \ Bias-corrected least-squares dummy variable (LSDV) estimators for the standard autoregressive panel-data model using the bias approximations in Bruno (2005a) for unbalanced panels
\texttt{xtivreg}
\texttt{xtdpd} fits Arellano-Bond and Arellano-Bover/Blundell-Bond
- \texttt{estat abond} gives a test for autocorrelation
- \texttt{estat sargan} gives the overidentifying restrictions test (Rejection implies failure of assumptions)

More on DPD

David Roodman’s excellent and well documented extends the Stata command and incorporates orthogonal deviations transformation that assist in gapped panels. I personally think it is the best software for this.
Systems DPD is complicated but perhaps very useful.
As an aside, I laughed pretty hard at a post on econ job rumours where someone claimed that no one actually understands these models! [Not true, I am positive that Hansen does…..]

                     firm      year    sector         emp       wage    capital
Grand mean        73.2037 1979.6508    5.1232      7.8917    23.9188     2.5074
S.D.              41.2333    2.2161    2.6781     15.9349     5.6484     6.2487
TSS          1751193.2260 5058.2968 7387.3560 261539.3894 32861.7647 40217.7902
Between S.D.      40.5586    0.6001    2.6774     16.1689     5.1840     6.1048
BSS          1751193.2260  368.2968 7387.3560 256508.7790 28458.3312 39065.0725
Within S.D.        0.0000    2.1339    0.0000      2.2100     2.0677     1.0579
WSS                0.0000 4690.0000    0.0000   5030.6104  4403.4335  1152.7176
% Within           0.0000    0.9272    0.0000      0.0192     0.1340     0.0287
                  output
Grand mean      103.8012
S.D.              9.9380
TSS          101726.9240
Between S.D.      4.3649
BSS           19218.0111
Within S.D.       8.9502
WSS           82508.9129
% Within          0.8111

# To make it match the Stata data.
EmplUK$n <- log(EmplUK$emp)
EmplUK$w <- log(EmplUK$wage)
EmplUK$k <- log(EmplUK$capital)
EmplUK$ys <- log(EmplUK$output)

# Can just use log syntax to solve it.
# Arellano and Bond (1991), table 4(a1) 
Table4.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep")
summary(Table4.a1)

Twoways effects One-step model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "onestep")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6006508 -0.0299498  0.0000000 -0.0001193  0.0311461  0.5693264 

Coefficients:
                         Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1:2)1      0.686226   0.144594  4.7459 2.076e-06 ***
lag(log(emp), 1:2)2     -0.085358   0.056016 -1.5238 0.1275510    
lag(log(wage), 0:1)0    -0.607821   0.178205 -3.4108 0.0006478 ***
lag(log(wage), 0:1)1     0.392623   0.167993  2.3371 0.0194319 *  
lag(log(capital), 0:2)0  0.356846   0.059020  6.0462 1.483e-09 ***
lag(log(capital), 0:2)1 -0.058001   0.073180 -0.7926 0.4280206    
lag(log(capital), 0:2)2 -0.019948   0.032713 -0.6098 0.5420065    
lag(log(output), 0:2)0   0.608506   0.172531  3.5269 0.0004204 ***
lag(log(output), 0:2)1  -0.711164   0.231716 -3.0691 0.0021469 ** 
lag(log(output), 0:2)2   0.105798   0.141202  0.7493 0.4536974    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 48.74983 (p-value = 0.0030295)
Autocorrelation test (1): normal = -3.599593 (p-value = 0.00031872)
Autocorrelation test (2): normal = -0.5160282 (p-value = 0.60583)
Wald test for coefficients: chisq(10) = 408.2859 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 11.57904 (p-value = 0.072046)

## Arellano and Bond (1991), table 4b 
Table4.b <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
           + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99),
            data = EmplUK, effect = "twoways", model = "twosteps")
# To make it match Stata
summary(Table4.b, robust=FALSE)

Twoways effects Two-steps model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "twosteps")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6190677 -0.0255683  0.0000000 -0.0001339  0.0332013  0.6410272 

Coefficients:
                        Estimate Std. Error  z-value  Pr(>|z|)    
lag(log(emp), 1:2)1     0.474151   0.085303   5.5584 2.722e-08 ***
lag(log(emp), 1:2)2    -0.052967   0.027284  -1.9413 0.0522200 .  
lag(log(wage), 0:1)0   -0.513205   0.049345 -10.4003 < 2.2e-16 ***
lag(log(wage), 0:1)1    0.224640   0.080063   2.8058 0.0050192 ** 
log(capital)            0.292723   0.039463   7.4177 1.191e-13 ***
lag(log(output), 0:1)0  0.609775   0.108524   5.6188 1.923e-08 ***
lag(log(output), 0:1)1 -0.446373   0.124815  -3.5763 0.0003485 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
Autocorrelation test (1): normal = -2.427829 (p-value = 0.01519)
Autocorrelation test (2): normal = -0.3325401 (p-value = 0.73948)
Wald test for coefficients: chisq(7) = 371.9877 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 26.9045 (p-value = 0.0001509)

# Or with Robust [Notice it is default]
summary(Table4.b)

Twoways effects Two-steps model Difference GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 
    0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 
    2:99), data = EmplUK, effect = "twoways", model = "twosteps")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 611
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.6190677 -0.0255683  0.0000000 -0.0001339  0.0332013  0.6410272 

Coefficients:
                        Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1:2)1     0.474151   0.185398  2.5575 0.0105437 *  
lag(log(emp), 1:2)2    -0.052967   0.051749 -1.0235 0.3060506    
lag(log(wage), 0:1)0   -0.513205   0.145565 -3.5256 0.0004225 ***
lag(log(wage), 0:1)1    0.224640   0.141950  1.5825 0.1135279    
log(capital)            0.292723   0.062627  4.6741 2.953e-06 ***
lag(log(output), 0:1)0  0.609775   0.156263  3.9022 9.530e-05 ***
lag(log(output), 0:1)1 -0.446373   0.217302 -2.0542 0.0399605 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(25) = 30.11247 (p-value = 0.22011)
Autocorrelation test (1): normal = -1.53845 (p-value = 0.12394)
Autocorrelation test (2): normal = -0.2796829 (p-value = 0.77972)
Wald test for coefficients: chisq(7) = 142.0353 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 16.97046 (p-value = 0.0093924)

## Blundell and Bond (1998) table 4
Table4.BB <- pgmm(log(emp) ~ lag(log(emp), 1)+ lag(log(wage), 0:1) +
           lag(log(capital), 0:1) | lag(log(emp), 2:99) +
           lag(log(wage), 2:99) + lag(log(capital), 2:99),        
           data = EmplUK, effect = "twoways", model = "onestep", 
           transformation = "ld") 
summary(Table4.BB, robust = TRUE)

Twoways effects One-step model System GMM 

Call:
pgmm(formula = log(emp) ~ lag(log(emp), 1) + lag(log(wage), 0:1) + 
    lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage), 
    2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways", 
    model = "onestep", transformation = "ld")

Unbalanced Panel: n = 140, T = 7-9, N = 1031

Number of Observations Used: 1642
Residuals:
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.7530341 -0.0369030  0.0000000  0.0002882  0.0466069  0.6001503 

Coefficients:
                         Estimate Std. Error z-value  Pr(>|z|)    
lag(log(emp), 1)         0.935605   0.026295 35.5810 < 2.2e-16 ***
lag(log(wage), 0:1)0    -0.630976   0.118054 -5.3448 9.050e-08 ***
lag(log(wage), 0:1)1     0.482620   0.136887  3.5257 0.0004224 ***
lag(log(capital), 0:1)0  0.483930   0.053867  8.9838 < 2.2e-16 ***
lag(log(capital), 0:1)1 -0.424393   0.058479 -7.2572 3.952e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sargan test: chisq(100) = 118.763 (p-value = 0.097096)
Autocorrelation test (1): normal = -4.808434 (p-value = 1.5212e-06)
Autocorrelation test (2): normal = -0.2800133 (p-value = 0.77947)
Wald test for coefficients: chisq(5) = 11174.82 (p-value = < 2.22e-16)
Wald test for time dummies: chisq(7) = 14.71138 (p-value = 0.039882)

References

The manual for R package \texttt{plm} was published in the Journal of Statistical Software. It is nice and extensive excepting the application of dpd models; only some are available. There is also panelvar for panel VAR models that estimates even more dpd estimators. Kit Baum has a very nice discussion of this in Stata in a set of course slides on the web at Boston College [search google for Baum Dynamic Panel Data Estimators].

Some Summary Remarks on Philosophy

Poolers begin with the assumption that the data can be pooled and the composite/population averaged inference is general.
Time series scholars begin with the belief that things should only be pooled with evidence in support of pooling.
These are beliefs and proclivities more than insights from general rules.
Substance and theory should drive model choices; the reverse is ridiculous given that we assume the right model.