class: center, middle, inverse, title-slide

.title[
# Day 6: Panel Models
]
.subtitle[
## Fixed and Random and Some of Both
]
.author[
### Robert W. Walker
]
.date[
### August 15, 2022
]

---

## Day 6: Models for Heterogeneity

- With models, we need a model for comparison.
- Some Useful Notation
- Fixed and Random Effects
- Comparing FE and RE with Hausman
- The Multilevel generalization (Bell and Jones and others)
- Backfilling details

---

## The Dimensions of TSCS Summary

- The presence of a time dimension gives us a natural ordering.
- Space is not irrelevant under the same circumstances as time -- nominal indices are irrelevant on some level. Defining space is hard. Ex. targeting of Foreign Direct Investment and defining proximity.
- ANOVA is informative in this two-dimensional setting.
- A part of any good data analysis is summary and characterization. The same is true here; let's look at some examples of summary in panel data settings.

---

## Basic `xt` commands

In Stata's language, `\(\texttt{xt}\)` is the way that one naturally refers to CSTS/TSCS data. Consider `\(NT\)` observations on some random variable `\(y_{it}\)` where `\(i \in N\)` and `\(t \in T\)`. The TSCS/CSTS commands almost always have this prefix.

- `\(\texttt{xtset}\)`: Declaring `\(\texttt{xt}\)` data
- `\(\texttt{xtdes}\)`: Describing `\(\texttt{xt}\)` data structure
- `\(\texttt{xtsum}\)`: Summarizing `\(\texttt{xt}\)` data
- `\(\texttt{xttab}\)`: Summarizing categorical `\(\texttt{xt}\)` data
- `\(\texttt{xttrans}\)`: Transition matrix for `\(\texttt{xt}\)` data
- `\(\texttt{xtline}\)`: Line graphs for `\(\texttt{xt}\)` data

---

## A Primitive Question

Given two-dimensional data, how should we break it down? The most common method is unit averages; we break each unit's time series on each element into deviations from the unit's own mean. This is called the within transform. The between portion represents deviations of the unit means from the overall mean. Stationarity considerations are generically implicit.

---

## Some Useful Variances and Notation

- W(ithin) for unit `\(i\)` [NB: the total within variance is then a summary over all `\(i \in N\)`]:
`$$W_{i} = \sum_{t=1}^{T} (x_{it} - \overline{x}_{i})^{2}$$`
- B(etween):
`$$B_{T} = \sum_{i=1}^{N} (\overline{x}_{i} - \overline{x})^{2}$$`
- T(otal):
`$$T = \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \overline{x})^{2}$$`

---

## Some Useful Notation

- The Kronecker Product `\(\otimes\)`: a simple way of condensing the notation for sets of matrices. It is important to note that conformity is not required. For a general matrix `\(A_{kl}\)` and `\(B_{mn}\)` (written out here for `\(k = l = 3\)`), we can write
`$$A \otimes B =\left(\begin{array}{ccc}a_{11}B & a_{12}B & a_{13}B \\a_{21}B & a_{22}B & a_{23}B \\a_{31}B & a_{32}B & a_{33}B\end{array}\right)$$`
with a result `\(C\)` of dimension `\(km \times ln\)`.
- The inverse of a Kronecker product is well defined [under invertibility conditions]:
`$$[A \otimes B]^{-1} = [A^{-1} \otimes B^{-1}]$$`
- As are products of Kronecker products:
`$$(A\otimes B)(C\otimes D) = AC\otimes BD$$`

---

## Why is the Notation Useful?

Let `\(A\)` be a variance/covariance matrix across panels and `\(B\)` be the same matrix for a given panel. This is a fairly general way to conceive of a panel data problem.

- Heteroscedasticity?
- Temporal Autocorrelation?
- Spatial Autocorrelation?
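---

## Kronecker Products in Practice

To make the notation concrete, here is a minimal Mata sketch (Mata's `#` operator is the Kronecker product) that builds the block-diagonal unit-heteroscedastic covariance described on the next slide. The variance values are invented for illustration.

```
mata:
    Sigma = diag((1, 4))    // hypothetical unit variances for N = 2 units
    Omega = Sigma # I(3)    // with T = 3: the 6 x 6 block-diagonal covariance
    Omega                   // first 3 diagonal entries are 1, last 3 are 4
end
```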
---

## Heteroscedasticity

The homoscedastic case is described by `\(\sigma^2 I\)`. The [unit] heteroscedastic case is described, generally, by `\(\mathrm{diag}(\sigma^{2}_{1}, \ldots, \sigma^{2}_{N}) \otimes I_{T}\)`:

`$$\left(\begin{array}{cccc}\sigma^{2}_{1}I_{T} & 0 & 0 & 0 \\ 0 & \sigma^{2}_{2}I_{T} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \sigma^{2}_{N}I_{T} \end{array}\right)$$`

The ultimate result will be of dimension `\(NT\times NT\)`. The first `\(T\)` diagonal entries will be `\(\sigma^{2}_{1}\)`, entries `\(T+1\)` to `\(2T\)` will be `\(\sigma^{2}_{2}, \ldots\)`, and entries `\((N-1)T + 1\)` to `\(NT\)` will be `\(\sigma^{2}_{N}\)`. If we believed that the heteroscedasticity arose from time points rather than units, replace `\(N\)` with `\(T\)` and vice versa; `\(i\)` becomes `\(t\)`.

---

## The Manageable Autocorrelation Structure

`$$\Phi = \sigma^{2}\Psi = \sigma^{2}_{e} \left(\begin{array}{ccccc}1 & \rho_{1} & \rho_{2} & \ldots & \rho_{T-1} \\ \rho_1 & 1 & \rho_1 & \ldots & \rho_{T-2} \\ \rho_{2} & \rho_1 & 1 & \ldots & \rho_{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \ldots & 1 \end{array}\right)$$`

given that `\(e_{t} = \rho e_{t-1} + \nu_{t}\)`. A Toeplitz form.... This allows us to calculate the variance of `\(e\)` using results from basic statistics, i.e. `\(Var(e_{t}) = \rho^{2}Var(e_{t-1}) + Var(\nu)\)`. If the variance is stationary, we can rewrite:

`$$\sigma^{2}_{e} = \frac{\sigma^{2}_{\nu}}{1 - \rho^{2}}$$`

---

## Autocorrelation

When discussing heteroscedasticity, we noticed that the off-diagonal elements are all zeroes. This is the assumption of no correlation among [somehow] adjacent elements. The somehow takes two forms: (1) spatial and (2) temporal. Just as time-induced heteroscedasticity simply involved interchanging `\(N\)` and `\(T\)` and `\(i\)` and `\(t\)`, the same idea prevails here.

---

## Aitken's Theorem?

In a now-classic paper, Aitken generalized the Gauss-Markov theorem to the class of Generalized Least Squares estimators. It is important to note that these are GLS and not FGLS estimators. What is the difference? The two GLS estimators considered by Stimson are not, strictly speaking, GLS.

Definition:

`$$\hat{\beta}_{GLS} = (\mathbf{X}^{\prime}\Omega^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\Omega^{-1}\mathbf{y}$$`

Properties:

1. GLS is unbiased.
1. Consistent.
1. Asymptotically normal.
1. MV(L)UE

---

## What does the feasible do?

We need to estimate things to replace unknown covariance structures, and coverage will depend on the properties of the estimators of those covariances. Consistent estimators will work, but there is euphemistically **considerable variation** in the class of consistent estimators. Contrasting the Beck and Katz/White approach with the GLS approach is a valid difference in philosophies.

NB: We will return to this when we look at Hausman because this is the essential issue.

---

## The Beck and Katz solution

Beck and Katz take a different tack suited to the general data types in common use (long `\(T\)`). The basic idea is to generate estimates using OLS because GLS can be quite bad.

.red[What do we need to be able to do this?]

- Locate a specification to purge serial correlation (in `\(t\)`).
- [p. 638] Construct the panel corrected standard error. Construct `\(\Sigma\)` (`\(N \times N\)`) using
`$$\hat{\Sigma}_{ij} = \frac{\sum_{t=1}^{T} e_{it} e_{jt}}{T}.$$`
Estimate the cross-sectional correlation matrix. Kronecker product this with `\(\mathbf{I}_{T}\)`, remembering how we got `\(\mathbf{I}\)`.
- Inference with OLS and PCSE in the spirit of White, really Huber (1967), but the key is separable moments.

Brief diversion here about separability; it turns out the result yesterday is what gives rise to the appropriate intuition.
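---

## PCSE and FGLS in Stata

As a concrete sketch (using the growth data that appear in the examples later in these slides; the specification is illustrative, not a recommendation), the Beck and Katz estimator and its FGLS cousin differ only in command and options.

```
* OLS with panel-corrected standard errors and a common AR(1) correction
xtpcse growth lagg1 opengdp, correlation(ar1)

* the FGLS analogue: heteroscedastic, contemporaneously correlated panels
xtgls growth lagg1 opengdp, panels(correlated) corr(ar1)
```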
---

## Thinking about `\(\texttt{robust}\)` and `\(\texttt{cluster}\)`

Every `\(\texttt{Stata}\)` user is familiar with these, it seems. Though not developed by Stata (the implementation owes to Hardin, a student of Huber), the two are nearly synonymous with Stata practice. What would these look like in an application?

- just `\(\texttt{robust}\)` is unstructured heteroscedasticity
- `\(\texttt{cluster}\)` utilizes the multidimensional axes

---

## `\(\texttt{xtgls}\)` and `\(\texttt{xtpcse}\)`

Two significant options of note:

1. `\(\texttt{panels(iid,heteroscedastic,correlated)}\)`
1. `\(\texttt{correlation(ar1,psar1,independent)}\)`

---

### panels

- `\(\texttt{iid}\)`
`$$\epsilon\epsilon^{\prime} = \sigma^{2}\mathbf{I}_{N\times N}$$`
gives us homoscedasticity and no spatial correlation; `\(\sigma^{2}\)` is a scalar.
- `\(\texttt{heteroscedastic}\)`
`$$\epsilon\epsilon^{\prime} = \sigma_{i}^{2}\mathbf{I}_{N\times N}$$`
gives us heteroscedasticity and no spatial correlation; `\(\sigma^{2}_{i}\)` is an `\(N\)`-vector.
- `\(\texttt{correlated}\)`
`$$\epsilon\epsilon^{\prime} = \left(\begin{array}{ccccc}\sigma^{2}_{1} & \sigma_{12} & \sigma_{13} & \ldots & \sigma_{1N} \\ \sigma_{21} & \sigma^{2}_{2} & \sigma_{23} & \ldots & \sigma_{2N} \\ \sigma_{31} & \sigma_{32} & \sigma^{2}_{3} & \ldots & \sigma_{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} & \sigma_{N2} & \sigma_{N3} & \ldots & \sigma^{2}_{N} \end{array}\right)$$`
gives us heteroscedastic and (contemporaneously) spatially correlated errors.

---

### correlation

- `\(\texttt{independent}\)` gives us no autoregression:
`$$\epsilon\epsilon^{\prime} = \mathbf{I}_{T\times T}$$`
- `\(\texttt{ar1}\)` gives us a global autoregressive parameter for the errors. In simple terms, all cross-sections share the same **level** of serial correlation.
`$$\epsilon\epsilon^{\prime} = \left(\begin{array}{ccccc}1 & \rho & \rho^{2} & \ldots & \rho^{T-1} \\ \rho & 1 & \rho & \ldots & \rho^{T-2} \\ \rho^{2} & \rho & 1 & \ldots & \rho^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \ldots & 1 \end{array}\right)$$`
- `\(\texttt{psar1}\)` gives us an autoregressive parameter for the errors that is unique to each cross-section. Each cross-section has a distinct **level** of serial correlation.
`$$\epsilon\epsilon^{\prime} = \left(\begin{array}{ccccc}1 & \rho_{i} & \rho_{i}^{2} & \ldots & \rho_{i}^{T-1} \\ \rho_{i} & 1 & \rho_{i} & \ldots & \rho_{i}^{T-2} \\ \rho_{i}^{2} & \rho_{i} & 1 & \ldots & \rho_{i}^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{i}^{T-1} & \rho_{i}^{T-2} & \rho_{i}^{T-3} & \ldots & 1 \end{array}\right)$$`

---

## Unit Heterogeneity

Most discussions of panel data estimators draw on a **fixed** versus **random** effects distinction. The subtle distinction is important but perhaps overstated.

---

## Definitions

Let's construct a general model:

`$$y_{it} = \alpha_{it} + X_{it}\beta_{it} + \epsilon_{it}$$`

- Pooled Model: `\(y_{it} = \alpha + X_{it}\beta + \epsilon_{it}\)`
- Year Dummies Model: `\(y_{it} = \alpha_{t} + X_{it}\beta + \epsilon_{it}\)`
- (Two-way) LSDV: `\(y_{it} = \alpha_{i} + \alpha_{t} + X_{it}\beta + \epsilon_{it}\)`
- Unit Dummies Model: `\(y_{it} = \alpha_{i} + X_{it}\beta + \epsilon_{it}\)`

1. Fixed effects: `\(y_{it} - \bar{y}_{i} = \Delta_{i}X_{it}\beta + \Delta_{i}\epsilon_{it}\)`
1. Random effects `\(\alpha_{i} \bot X_{it}\)`: `\(\alpha_{i} \sim [ \alpha , \sigma^{2}_{\alpha} ]\)` and `\(\epsilon_{it} \sim [0 , \sigma^{2}_{\epsilon}]\)`

---

## Why does heterogeneity matter?

- If `\(\alpha_{i} \neq \alpha\)` for some `\(i\)`, then serial correlation is induced in the errors. At a minimum, this implies incorrect standard errors for inference and inefficiency.
- If `\(\mathbb{E}[X_{it}\alpha_{i}] \neq 0\)`, then `\((\alpha_{i} - \alpha)\)` is an omitted variable with a consequent bias induced.

We can draw a picture of this. A brief simulation (sketched a few slides on).

---

## Some ANCOVA

- Pooled Slope and Intercepts
- Pooled Intercepts
- Pooled Slopes

---

## Constructing Estimators

- Pooled Estimator
`$$\hat{\beta}_{T} = T_{xx}^{-1}T_{xy} = (X^{\prime}X)^{-1}X^{\prime}y$$`
- Within Estimator
`$$\hat{\beta}_{W} = W_{xx}^{-1}W_{xy}$$`
- Between Estimator
`$$\hat{\beta}_{B} = B_{\overline{x}\overline{x}}^{-1}B_{\overline{x}\overline{y}}$$`

---

## A Variation Identity

- `\(T_{xx} = W_{xx} + B_{\overline{x}\overline{x}}\)`
- In different notation, `\(T = W + B\)` or `\(S^{t} = S^{w} + S^{b}\)`.

Adding and subtracting the group-mean squares:

`$$\sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \overline{x})^{2} = \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it}^{2} - NT\overline{x}^{2}$$`
`$$= \sum_{i=1}^{N} \sum_{t=1}^{T} x_{it}^{2} - \sum_{i=1}^{N} T_{i}\overline{x}_{i}^{2} + \sum_{i=1}^{N} T_{i}\overline{x}_{i}^{2} - NT\overline{x}^{2}$$`
`$$= \sum_{i=1}^{N} \underbrace{\sum_{t=1}^{T} (x_{it} - \overline{x}_{i})^{2}}_{W_{i}} + \underbrace{\sum_{i=1}^{N} T_{i}(\overline{x}_{i} - \overline{x})^{2}}_{B_{T}}$$`

---

## Back to ANCOVA

1. RSS from `\(W_{i}\)` with DF = `\(NT - NK - N\)`
1. RSS from `\(W\)` with DF = `\(NT - N - K\)`
1. RSS from `\(T\)` with DF = `\(NT - K - 1\)`

For total pooling, we can F-test `\(\frac{3 - 1}{1}\)`; this compares the most and least restricted models. If we cannot reject, pooling is (perhaps) justified. Now let's construct some others. Suppose we reject total pooling. Is it intercepts, slopes, or both? Imposing a slope restriction gives us 2; the `\(F\)` we want is `\(\frac{2-1}{1}\)`. What do we get from `\(\frac{3-2}{2}\)`? NB: It's conditional. We can also do this with time. This is a good starting point, but it is not as clean as we might like.

---
class: left, inverse

## OLS as Weighted Average

`$$\hat{\beta}_{OLS} = [S^{t}_{xx}]^{-1} S^{t}_{xy}$$`
`$$\hat{\beta}_{OLS} = [S^{w}_{xx} + S^{b}_{xx}]^{-1} (S^{w}_{xy} + S^{b}_{xy})$$`
`$$\hat{\beta}_{OLS} = [S^{w}_{xx} + S^{b}_{xx}]^{-1}S^{w}_{xy} + [S^{w}_{xx} + S^{b}_{xx}]^{-1}S^{b}_{xy}$$`

Let `\(F^{w} = [S^{w}_{xx} + S^{b}_{xx}]^{-1}S^{w}_{xx} \rightarrow F^{b} = I - F^{w} = [S^{w}_{xx} + S^{b}_{xx}]^{-1}S^{b}_{xx}\)`. My claim is that `\(\hat{\beta}_{OLS} = F^{w}\beta^{w} + F^{b}\beta^{b}\)`.

`$$\hat{\beta}_{OLS} = [S^{w}_{xx} + S^{b}_{xx}]^{-1}\underbrace{S^{w}_{xx}[S_{xx}^{w}]^{-1}}_{I}S_{xy}^{w} + [S^{w}_{xx} + S^{b}_{xx}]^{-1}\underbrace{S^{b}_{xx}[S_{xx}^{b}]^{-1}}_{I}S_{xy}^{b}$$`
`$$\hat{\beta}_{OLS} = [S^{w}_{xx} + S^{b}_{xx}]^{-1}S_{xy}^{w} + [S^{w}_{xx} + S^{b}_{xx}]^{-1}S_{xy}^{b}$$`

---

## A Random Effects Estimator

- Assume that the unit means have some distribution rather than being some fixed constant.
- This allows us (under normality) to partition the global error into components.
- The method is the same; the difference is the weighting by a covariance matrix with a known structure.
- As we noted, there is a simple problem with the application of the OLS estimator if the error is correlated with the regressors.
- How might we think about remedying this?
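---

## A Brief Simulation

A minimal sketch of the picture described above: the unit effect is built to correlate with the regressor, so pooled OLS absorbs `\((\alpha_{i} - \alpha)\)` into the error and the slope is biased, while the within estimator is not. All names and parameter values are invented for illustration.

```
clear
set seed 8675309
set obs 16                           // N = 16 units
gen country = _n
gen alpha = rnormal(0, 2)            // unit effect
expand 15                            // T = 15 periods per unit
bysort country: gen year = _n
gen x = 0.5*alpha + rnormal()        // regressor correlated with alpha
gen y = 1 + 2*x + alpha + rnormal()  // true slope is 2
regress y x                          // pooled OLS: slope biased away from 2
xtset country year
xtreg y x, fe                        // within estimator: slope near 2
```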
---

## Comparing Fixed and Random Effects

- The Hausman test: a smart and broadly applicable idea. Wish it worked better... See V. E. Troeger.
- Mundlak's argument merits consideration.
- Pluemper and Troeger's idea is clever.

---

## Hausman's Idea

The basic idea is that the fixed effects estimator is consistent but potentially inefficient. The random effects estimator is only consistent under the null. We can leverage this to form a test in the Hausman family using the result proved in the paper. This is implemented in Stata using model storage capabilities (a worked example follows the Stata implementation slide below).

- Estimate a consistent model
- Store the result as XXX.
- Estimate an efficient model
- Store the result as YYY.
- `\(\texttt{hausman}\)` XXX YYY

---

## Mundlak

The basic idea behind Mundlak's paper is that the fixed versus random effects debate is ill conceived. Moreover, there is a **right model**. Why and how?

- Conditional versus unconditional inference.
- The FE problem is inefficiency.
- The RE problem can be bias.
- Maybe we want an MSE criterion?
- As usual, `\(N\)` and `\(T\)` matter in size. Plug-in estimators in general.

---

## Bell, Fairbrother, and Jones

Estimate a variant of the Mundlak model that accommodates all the concerns.

`$$y_{it} = \beta_{0} + \beta_{1W}(x_{it} - \overline{x}_{i}) + \beta_{2B}\overline{x}_{i} + \beta_{3}z_{i} + ( \nu_{i} + \epsilon_{it})$$`

---

## First-Differences

Define `\(\Delta\)` to be a difference operator so that we can define

`$$\Delta X = X_{it} - X_{i,t-1}$$`
`$$\Delta y = y_{it} - y_{i,t-1}$$`

Observation: N(T-1) observations if `\(T_{i} \geq 2\;\;\; \forall i\)`. The equality case is interesting. The first-difference estimator is then:

`$$\Delta y_{it} = \Delta X_{it}\beta + \Delta \epsilon_{it}$$`

And an OLS estimator would simply look like:

`$$\hat{\beta} = (\Delta X^{\prime}\Delta X)^{-1} (\Delta X^{\prime} \Delta y)$$`

NB: For `\(T=2\)`, show that FE is FD.

---

## First Differences/Fixed Effects

Either transformation removes heterogeneity. The difference is that the two estimators operate at different orders of integration. The difference is not purely convenience; there is substance to this and theory can help. At the same time, the statistics matter.

---

## Specification Testing and Interpretation in the Fixed Effects Model

- F-test of the dummy variables. **What does this mean?**
- The above can be done in one- and two-way frameworks.
- The substance depends on the first-order question. Under what conditions are first-order effects unbiased (we know this)? The RE/GLS approach works when the orthogonality is maintained.
- Example from Arellano, p. 40

---

## Conditional versus unconditional prediction?

The fixed effect model is entirely conditional on the sample. If we do not know a unit's fixed effect, the predictions are undefined. The random effects model can sample from the distribution of random effects.

---

## Stata Implementation

- `\(\texttt{xtreg}\)`: contains five estimators. For now, we will skip `\(\texttt{pa}\)`.
- `\(\texttt{be}\)`: the between effects estimator.
`$$\overline{y}_{i} = \overline{x}_{i}\beta + \epsilon_{i}$$`
- `\(\texttt{fe}\)`: the fixed effects or within estimator.
`$$y_{it} - \overline{y}_{i} = (\mathbf{X}_{it} - \overline{\mathbf{X}}_{i})\beta + (\epsilon_{it} - \overline{\epsilon}_{i})$$`
- `\(\texttt{re}\)`: the standard GLS random effects estimator.
- `\(\texttt{mle}\)`: the maximum likelihood random effects estimator.
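---

## Hausman in Practice

A minimal sketch of the store-and-compare recipe from the Hausman slide, again using the growth example data; the specification is illustrative.

```
xtreg growth lagg1 opengdp, fe    // consistent under the alternative
estimates store fixed
xtreg growth lagg1 opengdp, re    // efficient under the null
estimates store random
hausman fixed random
```

A large statistic (small p-value) rejects the orthogonality of `\(\alpha_{i}\)` and `\(X_{it}\)` that random effects requires.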
---

## Random Effects in Estimation

- The between estimator ignores all within variation (`\(\psi=0\)`).
- OLS is a weighted average of between and within (`\(\psi=1\)`).
- GLS is a compromise determined by the orthogonality assumption (`\(0 \leq \psi \leq 1\)`). That weight is not in any substantive sense optimally determined; it is a function of the relative ratio of the two quantities (all variance counts the same).

As Hsiao (p. 37) points out, the random effects estimator is often known as a quasi-demeaning estimator. This is because it is a partial within transformation.

---

## Details on Random Effects GLS (FGLS)

We will start with the model we defined as random effects before. We defined random effects `\(\alpha_{i} \bot X_{it}\)`: `\(\alpha_{i} \sim [\alpha , \sigma^{2}_{\alpha}] \; \; \epsilon_{it} \sim [0 , \sigma^{2}_{\epsilon}]\)`. Consider `\(\nu_{it} = \alpha_{i} + \epsilon_{it}\)`. For a single cross-section (remembering the Kronecker product will help us here),

`$$\mathbb{E}(\nu_{i}\nu_{i}^{\prime}) = \sigma^{2}_{\epsilon}\mathbf{I}_{T} + \sigma_{\alpha}^{2}\mathbf{1}_{T}\mathbf{1}_{T}^{\prime} = \Omega$$`

The inverse is given by

`$$\Omega^{-1} = \frac{1}{\sigma^{2}_{\epsilon}}\left[\mathbf{I}_{T} - \frac{\sigma^{2}_{\alpha}}{\sigma_{\epsilon}^{2} + T\sigma^{2}_{\alpha}}\mathbf{1}_{T}\mathbf{1}_{T}^{\prime} \right]$$`

---

We can also estimate this by using ordinary least squares applied to transformed data. The quasi-demeaning can be done in a first stage, with OLS estimates on the quasi-demeaned data. Recall the pooled regression uses no transformation. The within estimator uses complete demeaning. The random effects estimator is somewhere in between.

---

## Random Effects Variance

Breusch and Pagan (modified by Baltagi and Li) have developed a Lagrange multiplier test of whether or not the random effects have a variance. The test statistic is defined as:

`$$LM = \frac{NT}{2(T-1)}\left[\frac{\sum_{N} \left( \sum_{T} \epsilon_{it} \right)^{2} } {\sum_{N} \sum_{T} \epsilon_{it}^{2} } - 1 \right]^{2} \sim \chi^{2}_{1}$$`

---

```
. xtreg growth lagg1 opengdp openex openimp leftc central inter, re

Random-effects GLS regression                   Number of obs      =       240
Group variable (i): country                     Number of groups   =        16

-----------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       lagg1 |    .151848   .0865508     1.75   0.079    -.0177884    .3214843
     opengdp |   .0082889   .0010012     8.28   0.000     .0063267    .0102511
      openex |   .0019834   .0005903     3.36   0.001     .0008263    .0031404
     openimp |  -.0047988   .0010474    -4.58   0.000    -.0068518   -.0027459
       leftc |  -.0268801   .0108211    -2.48   0.013     -.048089   -.0056711
     central |  -.7428119   .2547157    -2.92   0.004    -1.242045   -.2435784
       inter |   .0138935   .0041671     3.33   0.001     .0057261    .0220609
       _cons |   3.607517    .571187     6.32   0.000     2.488011    4.727023
-------------+---------------------------------------------------------------
     sigma_u |  .36517121
     sigma_e |  2.0094449
         rho |  .03196908   (fraction of variance due to u_i)
-----------------------------------------------------------------------------
```
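A quick arithmetic check of the reported `\(\rho\)`, the fraction of total error variance attributed to the unit effects:

`$$\rho = \frac{\sigma^{2}_{u}}{\sigma^{2}_{u} + \sigma^{2}_{e}} = \frac{(.36517121)^{2}}{(.36517121)^{2} + (2.0094449)^{2}} \approx .032$$`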
---

## R-squareds

```
. xtreg growth lagg1 opengdp, fe

Fixed-effects (within) regression               Number of obs      =       240
Group variable (i): country                     Number of groups   =        16

R-sq:  within  = 0.2562                         Obs per group: min =        15
       between = 0.0031                                        avg =      15.0
       overall = 0.1563                                        max =        15

                                                F(2,222)           =     38.23
corr(u_i, Xb)  = -0.3888                        Prob > F           =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .2647972   .0851979     3.11   0.002     .0968971    .4326972
     opengdp |   .0094949   .0011229     8.46   0.000      .007282    .0117078
       _cons |   .5289261   .3719065     1.42   0.156    -.2039929    1.261845
-------------+----------------------------------------------------------------
     sigma_u |   1.142546
     sigma_e |  2.0889953
         rho |  .23025918   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(15, 222) =     3.55             Prob > F = 0.0000
```

---

```
. reg Cgrowth Clagg1 Copengdp

      Source |       SS       df       MS              Number of obs =     240
-------------+------------------------------           F(  2,   237) =   40.81
       Model |  333.650655     2  166.825327           Prob > F      =  0.0000
    Residual |  968.786108   237   4.0877051           R-squared     =  0.2562
-------------+------------------------------           Adj R-squared =  0.2499
       Total |  1302.43676   239   5.4495262           Root MSE      =  2.0218

-----------------------------------------------------------------------------
     Cgrowth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
      Clagg1 |   .2647972   .0824577     3.21   0.002     .1023536    .4272408
    Copengdp |   .0094949   .0010868     8.74   0.000     .0073539    .0116359
       _cons |   1.30e-08   .1305071     0.00   1.000    -.2571021    .2571021
------------------------------------------------------------------------------
```

---

## Betweens

```
. by country: egen gmean = mean(growth)
. by country: egen glmean = mean(lagg1)
. by country: egen opengdpmean = mean(opengdp)
. gen yhatb = _b[_cons] + _b[lagg1]*glmean + _b[opengdp]*opengdpmean
. reg gmean yhatb

      Source |       SS       df       MS              Number of obs =     240
-------------+------------------------------           F(  1,   238) =    0.75
       Model |  .445360906     1  .445360906           Prob > F      =  0.3868
    Residual |  140.975583   238  .592334381           R-squared     =  0.0031
-------------+------------------------------           Adj R-squared = -0.0010
       Total |  141.420943   239  .591719429           Root MSE      =  .76963

------------------------------------------------------------------------------
       gmean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yhatb |  -.0570801   .0658282    -0.87   0.387    -.1867605    .0726003
       _cons |   3.185291   .2044862    15.58   0.000     2.782457    3.588125
------------------------------------------------------------------------------
```

---

## Total

```
. gen yhatT = _b[_cons] + _b[lagg1]*lagg1 + _b[opengdp]*opengdp
. fit growth yhatT

      Source |       SS       df       MS              Number of obs =     240
-------------+------------------------------           F(  1,   238) =   44.11
       Model |  225.744206     1  225.744206           Prob > F      =  0.0000
    Residual |  1218.11349   238  5.11812392           R-squared     =  0.1563
-------------+------------------------------           Adj R-squared =  0.1528
       Total |   1443.8577   239   6.0412456           Root MSE      =  2.2623

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       yhatT |   .6927893   .1043154     6.64   0.000       .48729    .8982887
       _cons |   .9257153   .3465985     2.67   0.008     .2429227    1.608508
------------------------------------------------------------------------------
```

Extending this basic logic will hold for all `\(\texttt{xtreg}\)` estimators. Basically, think about them as projecting any given model result onto the centered data, onto group means, and onto all data.
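---

## Where the Centered Data Came From

The centered variables in the `reg Cgrowth Clagg1 Copengdp` comparison above are not constructed in the transcript; presumably they are group-mean deviations built along these lines (variable names assumed to match):

```
by country, sort: egen mgrowth = mean(growth)
gen Cgrowth = growth - mgrowth        // growth demeaned by country
by country: egen mlagg1 = mean(lagg1)
gen Clagg1 = lagg1 - mlagg1
by country: egen mopengdp = mean(opengdp)
gen Copengdp = opengdp - mopengdp
```

Running OLS on these demeaned series reproduces the within slopes, as the matching coefficients above confirm.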
---

## Random Coefficients

We saw fixed and random effects. The basic idea generalizes to regression coefficients on variables that are not unit-specific factors/indicators.

- Random Coefficients Specifications (Swamy 1970)

`$$y_{it} = \alpha + (\overline{\beta} + \mu_{i})X_{it} + \epsilon_{it}$$`
`$$\mathbb{E}[\mu_{i}] = 0; \mathbb{E}[\mu_{i} X_{it}]=0$$`
`$$\mathbb{E}[\mu_{i}\mu_{j}] = \begin{cases} \Delta & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$`

Hsiao and Pesaran (2004, IZA DP 136) show that the GLS estimator is a matrix weighted average of the OLS estimator applied to each unit separately, with weights inversely proportional to the covariance matrix for the unit.

---

## `\(\texttt{xtrc}\)`: Implementing Random Coefficients

`\(\texttt{xtrc}\)` estimates the Swamy random coefficients model and provides us with a test statistic of parameter constancy (an example call appears a couple of slides on). If the statistic is significantly different from zero, parameter constancy is rejected. Option `\(\texttt{betas}\)` gives us the unit-specifics. We have `\(\texttt{vce}\)` options here also. Note, as with many `\(\texttt{xt}\)` commands, the jackknife is unit-based.

---

## `\(\texttt{xtmixed}\)`

Stata has a mixed effects module that we can use for some things we have already seen and for extensions. I should say in passing that this also works for dimensions with nesting properties, though we are looking at two-dimensional data structures.

```
. sum

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
        year |       240        1977    4.329523        1970       1984
     country |       240         8.5    4.619406           1         16
      growth |       240    3.013292    2.457895        -3.6        9.8
       lagg1 |       240    3.119855    1.652682    -2.40641   6.683519
     opengdp |       240    174.6452    146.2456       -32.1     736.02
-------------+----------------------------------------------------------
      openex |       240    489.7662    420.4374       30.94     2879.2
     openimp |       240    482.8254    267.6722       64.96     1415.2
       leftc |       240    34.79583    39.56008           0        100
     central |       240     2.02421    .9593759    .4054115   3.618419
       inter |       240    91.33376    117.5622           0   361.8419
```

---

```
. xtreg growth lagg1 opengdp openimp openex leftc, re

Random-effects GLS regression                   Number of obs      =       240
Group variable (i): country                     Number of groups   =        16

R-sq:  within  = 0.2960                         Obs per group: min =        15
       between = 0.2038                                        avg =      15.0
       overall = 0.2811                                        max =        15

Random effects u_i ~ Gaussian                   Wald chi2(5)       =     92.41
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .2194248   .0875581     2.51   0.012     .0478142    .3910355
     opengdp |   .0077965   .0009824     7.94   0.000      .005871    .0097219
     openimp |  -.0053695   .0009868    -5.44   0.000    -.0073035   -.0034355
      openex |   .0019647   .0006047     3.25   0.001     .0007796    .0031498
       leftc |   .0030365   .0036142     0.84   0.401    -.0040472    .0101202
       _cons |   2.491734   .4633904     5.38   0.000     1.583505    3.399962
-------------+----------------------------------------------------------------
     sigma_u |  .21759529
     sigma_e |  2.0364407
         rho |  .01128821   (fraction of variance due to u_i)
------------------------------------------------------------------------------
```
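---

## `\(\texttt{xtrc}\)` in Practice

A minimal sketch of the Swamy random-coefficients call promised earlier; `\(\texttt{betas}\)` reports the unit-specific coefficient vectors, and the output header includes the `\(\chi^{2}\)` test of parameter constancy discussed above. The specification is illustrative.

```
xtrc growth lagg1 opengdp, betas
```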
---

## An MLE

```
. xtreg growth lagg1 opengdp openimp openex leftc, mle

Random-effects ML regression                    Number of obs      =       240
Group variable (i): country                     Number of groups   =        16

Random effects u_i ~ Gaussian                   Obs per group: min =        15

                                                LR chi2(5)         =     81.33
Log likelihood  = -514.4714                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .1873509   .0881362     2.13   0.034      .014607    .3600947
     opengdp |   .0077706   .0009913     7.84   0.000     .0058276    .0097136
     openimp |  -.0055243   .0010506    -5.26   0.000    -.0075835   -.0034651
      openex |   .0020447   .0005936     3.44   0.001     .0008812    .0032082
       leftc |   .0044378   .0039745     1.12   0.264    -.0033521    .0122277
       _cons |   2.583146   .5204807     4.96   0.000     1.563022    3.603269
-------------+----------------------------------------------------------------
    /sigma_u |   .5100119   .1962033                      .2399497    1.084028
    /sigma_e |   2.018389   .0957214                      1.839233    2.214995
         rho |   .0600166   .0445522                      .0110832    .2056057
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=    3.56 Prob>=chibar2 = 0.030
```

---

```
. xtmixed growth lagg1 opengdp openimp openex leftc || R.country, mle

Mixed-effects ML regression                     Number of obs      =       240
Group variable: _all                            Number of groups   =         1

                                                Wald chi2(5)       =     97.44
Log likelihood = -514.4714                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .1873501   .0859494     2.18   0.029     .0188925    .3558078
     opengdp |   .0077706   .0009911     7.84   0.000     .0058281     .009713
     openimp |  -.0055243   .0010452    -5.29   0.000    -.0075729   -.0034757
      openex |   .0020447   .0005915     3.46   0.001     .0008854    .0032039
       leftc |   .0044378   .0038479     1.15   0.249     -.003104    .0119796
       _cons |   2.583148   .5173579     4.99   0.000     1.569145    3.597151
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               sd(R.country) |   .5100191   .1962046      .2399545    1.084037
-----------------------------+------------------------------------------------
                sd(Residual) |   2.018388   .0957229       1.83923    2.214997
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =     3.56 Prob >= chibar2 = 0.0296
```

---

## General Stata things, `\(\texttt{, vce()}\)`

For virtually all Stata commands, we can acquire multiple variance/covariance matrices of the parameters.

- `\(\texttt{, robust}\)` sometimes
- `\(\texttt{, cluster()}\)` sometimes
- `\(\texttt{, vce(boot)}\)`
- `\(\texttt{, vce(jack)}\)`

---

## `\(\texttt{xtmixed}\)`

`\(\texttt{xtmixed}\)` will allow us to do tons of things. In particular, we can play with the residual correlation matrix using the option `\(\texttt{residuals}\)`. One can recreate virtually everything that we have seen so far this way. The remaining task for you in the lab is to figure out what all you can make it do.

- exchangeable
- ar
- ma
- unstructured
- banded
- toeplitz
- exponential

---

## Mixed Effects Models in Stata with `\(\texttt{xtmixed}\)`

Mixed effects models will allow us to estimate many interesting models for `\(\texttt{xt}\)` data.

- Simple random effects
- Crossed random effects
- Random Coefficients
- Determined random coefficients

---

## Examples

For the simple random effects estimator, there are two ways to do it via ML.

- `\(\texttt{xtreg depvar indvars, mle}\)`
- `\(\texttt{xtmixed depvar indvars || \_all: R.UnitID, mle}\)`
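---

## Playing with `\(\texttt{residuals()}\)`

A hedged sketch of the residual-structure option listed above: an AR(1) within-country residual process attached to a country random intercept. The specification is illustrative, not from the transcript.

```
xtmixed growth lagg1 opengdp || country:, residuals(ar 1, t(year)) mle
```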
---

```
. xtreg growth lagg1 opengdp openimp openex leftc, mle

                                                LR chi2(5)         =     81.33
Log likelihood  = -514.4714                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .1873509   .0881362     2.13   0.034      .014607    .3600947
     opengdp |   .0077706   .0009913     7.84   0.000     .0058276    .0097136
     openimp |  -.0055243   .0010506    -5.26   0.000    -.0075835   -.0034651
      openex |   .0020447   .0005936     3.44   0.001     .0008812    .0032082
       leftc |   .0044378   .0039745     1.12   0.264    -.0033521    .0122277
       _cons |   2.583146   .5204807     4.96   0.000     1.563022    3.603269
-------------+----------------------------------------------------------------
    /sigma_u |   .5100119   .1962033                      .2399497    1.084028
    /sigma_e |   2.018389   .0957214                      1.839233    2.214995
         rho |   .0600166   .0445522                      .0110832    .2056057
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=    3.56 Prob>=chibar2 = 0.030

. xtmixed growth lagg1 opengdp openimp openex leftc || _all: R.country, mle

                                                Wald chi2(5)       =     97.44
Log likelihood = -514.4714                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .1873501   .0859494     2.18   0.029     .0188925    .3558078
     opengdp |   .0077706   .0009911     7.84   0.000     .0058281     .009713
     openimp |  -.0055243   .0010452    -5.29   0.000    -.0075729   -.0034757
      openex |   .0020447   .0005915     3.46   0.001     .0008854    .0032039
       leftc |   .0044378   .0038479     1.15   0.249     -.003104    .0119796
       _cons |   2.583148   .5173579     4.99   0.000     1.569145    3.597151
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               sd(R.country) |   .5100191   .1962046      .2399545    1.084037
-----------------------------+------------------------------------------------
                sd(Residual) |   2.018388   .0957229       1.83923    2.214997
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =     3.56 Prob >= chibar2 = 0.0296
```
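---

### Crossed Random Effects: the Call

The output on the next slide adds a year random effect crossed with the country effect. The command itself is not shown in the transcript; a sketch of the presumed call:

```
xtmixed growth lagg1 opengdp openimp openex leftc ///
    || _all: R.country || _all: R.year, mle
```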
---

### Crossed Random Effects

```
Mixed-effects ML regression                     Number of obs      =       240
Group variable: _all                            Number of groups   =         1

                                                Obs per group: min =       240
                                                               avg =     240.0
                                                               max =       240

                                                Wald chi2(5)       =      7.18
Log likelihood = -503.45468                     Prob > chi2        =    0.2076

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |   .0059048   .1296512     0.05   0.964    -.2482069    .2600164
     opengdp |   .0001904   .0016087     0.12   0.906    -.0029626    .0033433
     openimp |  -.0030722   .0015617    -1.97   0.049     -.006133   -.0000114
      openex |    .002307   .0010185     2.27   0.024     .0003108    .0043032
       leftc |   .0048234   .0036133     1.33   0.182    -.0022585    .0119053
       _cons |   3.147245   .7630121     4.12   0.000     1.651768    4.642721
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               sd(R.country) |   .6667379   .1900389      .3813634    1.165658
-----------------------------+------------------------------------------------
_all: Identity               |
                  sd(R.year) |   1.554459   .4033566      .9347738     2.58495
-----------------------------+------------------------------------------------
                sd(Residual) |   1.752177   .0885389      1.586961    1.934595
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =    25.59   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference

. estimates store MLEtwowayRE
```

---

```
. lrtest MLEtwowayRE MLEunitRE

Likelihood-ratio test                                 LR chibar2(01) =   22.03
(Assumption: MLEunitRE nested in MLEtwowayRE)         Prob > chibar2 = 0.0000

. qui xtmixed growth lagg1 opengdp openimp openex leftc || _all: R.year, mle
. lrtest MLEtwowayRE .

Likelihood-ratio test                                 LR chibar2(01) =   10.04
(Assumption: . nested in MLEtwowayRE)                 Prob > chibar2 = 0.0008
```

---

```
. xtmixed growth lagg1 opengdp openimp openex leftc || country: leftc, covariance(unstructured)

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -540.17955
Iteration 1:   log restricted-likelihood = -540.15493
Iteration 2:   log restricted-likelihood = -540.15472
Iteration 3:   log restricted-likelihood = -540.15472

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       240
Group variable: country                         Number of groups   =        16

                                                Obs per group: min =        15
                                                               avg =      15.0
                                                               max =        15

                                                Wald chi2(5)       =     95.70
Log restricted-likelihood = -540.15472          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lagg1 |    .170562   .0869219     1.96   0.050     .0001982    .3409259
     opengdp |   .0078608   .0010053     7.82   0.000     .0058905    .0098312
     openimp |  -.0055371   .0010763    -5.14   0.000    -.0076465   -.0034277
      openex |   .0020745   .0005967     3.48   0.001     .0009051    .0032439
       leftc |   .0039332   .0046265     0.85   0.395    -.0051346     .013001
       _cons |   2.570449   .5444497     4.72   0.000     1.503347    3.637551
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                   sd(leftc) |   .0089451   .0078813      .0015908    .0502989
                   sd(_cons) |   .6566839   .2658791      .2969756    1.452085
           corr(leftc,_cons) |  -.6168731   .5300418     -.9835763    .7429732
-----------------------------+------------------------------------------------
                sd(Residual) |   2.022226    .098202       1.83863    2.224156
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =     5.40   Prob > chi2 = 0.1445

Note: LR test is conservative and provided only for reference

. * The coefficient is insignificant as is the randomness
```

---

```
. estat recovariance

Random-effects covariance matrix for level country

             |     leftc      _cons
-------------+----------------------
       leftc |    .00008
       _cons | -.0036236   .4312338

. capture drop u1 u2
. predict u*, reffects
```
---

```
. by country, sort: sum u*

-> country = AUL
    Variable |       Obs        Mean    Std. Dev.        Min         Max
-------------+-----------------------------------------------------------
          u1 |        15   -.0006591           0   -.0006591   -.0006591
          u2 |        15    .1237475           0    .1237475    .1237475

-> country = AUS
          u1 |        15    .0005591           0    .0005591    .0005591
          u2 |        15    .0125652           0    .0125652    .0125652

-> country = BEL
          u1 |        15   -.0000316           0   -.0000316   -.0000316
          u2 |        15   -.0002924           0   -.0002924   -.0002924

-> country = CAN
          u1 |        15   -.0035756           0   -.0035756   -.0035756
          u2 |        15    .4255248           0    .4255248    .4255248

-> country = DEN
          u1 |        15    .0019625           0    .0019625    .0019625
          u2 |        15    -.462575           0    -.462575    -.462575

-> country = FIN
          u1 |        15     .003543           0     .003543     .003543
          u2 |        15    .1606634           0    .1606634    .1606634

-> country = FRA
          u1 |        15   -.0083416           0   -.0083416   -.0083416
          u2 |        15    .3128709           0    .3128709    .3128709

-> country = GER
          u1 |        15    .0011514           0    .0011514    .0011514
          u2 |        15   -.3119804           0   -.3119804   -.3119804

-> country = IRE
          u1 |        15   -.0021854           0   -.0021854   -.0021854
          u2 |        15    .3908045           0    .3908045    .3908045

-> country = ITA
          u1 |        15    .0002358           0    .0002358    .0002358
          u2 |        15   -.1705837           0   -.1705837   -.1705837

-> country = JAP
          u1 |        15   -.0090248           0   -.0090248   -.0090248
          u2 |        15    1.074025           0    1.074025    1.074025

-> country = NET
          u1 |        15    .0031352           0    .0031352    .0031352
          u2 |        15   -.2520462           0   -.2520462   -.2520462

-> country = NOR
          u1 |        15    .0088704           0    .0088704    .0088704
          u2 |        15    .0223926           0    .0223926    .0223926

-> country = SWE
          u1 |        15     .002398           0     .002398     .002398
          u2 |        15   -.5351107           0   -.5351107   -.5351107

-> country = UK
          u1 |        15     .000085           0     .000085     .000085
          u2 |        15   -.5665398           0   -.5665398   -.5665398

-> country = USA
          u1 |        15    .0018777           0    .0018777    .0018777
          u2 |        15   -.2234658           0   -.2234658   -.2234658
```

---

![A Plot](img/reffect-plot-blup.png)

---

## Wilson and Butler

- Survey of papers using TSCS data and methods(?)
- Vast majority do nothing about space or time.
- Does it matter?
- Table 3
- Table 4
- What do we do? Raise the bar for positive findings and look at multiple models, trying to tease out the role of particular assumptions as necessary and/or sufficient for results.

---

## More on xtpcse

---

## Holding on to data

- `\(\texttt{preserve}\)`
- `\(\texttt{restore}\)`

---

## Testing the Null Hypothesis of No Random Effects

```
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects:

        growth[country,t] = Xb + u[country] + e[country,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  growth |   6.041246       2.457895
                       e |   4.147091       2.036441
                       u |   .0473477       .2175953

        Test:   Var(u) = 0
                          chi2(1) =     4.39
                      Prob > chi2 =   0.0361
```

---

## xttest

```
. xttest1

Tests for the error component model:

        growth[country,t] = Xb + u[country] + v[country,t]
        v[country,t] = rho v[country,(t-1)] + e[country,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  growth |   6.041246       2.457895
                       e |   4.037869      2.0094449
                       u |     .13335      .36517121

        Tests:
           Random Effects, Two Sided:
                LM(Var(u)=0)       =  1.00     Pr>chi2(1) =  0.3174
                ALM(Var(u)=0)      =  0.54     Pr>chi2(1) =  0.4610
           Random Effects, One Sided:
                LM(Var(u)=0)       =  1.00     Pr>N(0,1)  =  0.1587
                ALM(Var(u)=0)      =  0.74     Pr>N(0,1)  =  0.2305
           Serial Correlation:
                LM(rho=0)          =  0.74     Pr>chi2(1) =  0.3906
                ALM(rho=0)         =  0.28     Pr>chi2(1) =  0.5961
           Joint Test:
                LM(Var(u)=0,rho=0) =  1.28     Pr>chi2(2) =  0.5271

* We cannot reject the null hypothesis of no variation in the random
* effects. Also no evidence of serial correlation. Remember, with the
* lagged endogenous variable on the right hand side, the random effects
* are included if they are there.
```
---

## `\(\texttt{xttest1}\)`

- LM test for random effects, assuming no serial correlation
- Adjusted LM test for random effects, which works even under serial correlation
- One-sided version of the LM test for random effects
- One-sided version of the adjusted LM test for random effects
- LM joint test for random effects and serial correlation
- LM test for first-order serial correlation, assuming no random effects
- Adjusted test for first-order serial correlation, which works even under random effects

---

## `\(\texttt{xtgls}\)`

- `\(\texttt{corr}\)`: the `\(t\)` structure ([ar] or [ps]ar): is `\(\rho\)` common or not?
- `\(\texttt{panels}\)`: the `\(i\)` structure (iid, [h]eteroscedastic, [c]orrelated (and [h]))
- `\(\texttt{rhotype}\)`: regress (regression using lags), dw (Durbin-Watson), freg (forward regression uses leads), nagar, theil, tscorr
- `\(\texttt{igls}\)` (iterate or two-step)
- `\(\texttt{force}\)` for unbalanced data.

---

## `\(\texttt{xttest2}\)` and `\(\texttt{xttest3}\)`

After `\(\texttt{fe}\)` or `\(\texttt{xtgls}\)`, we have two tests pre-programmed.

- We have a test of independence (within) in `\(\texttt{xttest2}\)`
- We have a test of homoscedasticity (within) in `\(\texttt{xttest3}\)`

---

## `\(\texttt{xtserial}\)`

Wooldridge presents a test for serial correlation.

---

## `\(\texttt{xtcsd}\)`

How do we test for cross-sectional dependence?

- Generally used for small `\(T\)` and large `\(N\)` settings.
- Three methods: `\(\texttt{xtcsd, pesaran friedman frees}\)`
- This is the panel correction in PCSE

---

## `\(\texttt{xtscc}\)`

Driscoll and Kraay (1998) describe a robust covariance matrix estimator for pooled and fixed effects regression models that contain a large time dimension. The approach is robust to heteroscedasticity, autocorrelation, and spatial correlation. (A short example of these tests in use appears after the next slide.)

---

## We're Here for Fancy Estimators, Why is Everything OLS?

There are limitations imposed by what people have programmed in terms of regression diagnostics. However, if we can fit the same model by OLS, we can use standard regression diagnostics post-estimation to avoid calculating the diagnostics by hand. Many diagnostics are pre-programmed.
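---

## Panel Diagnostics in Practice

A short sketch of the tests just described, applied to the running example; `\(\texttt{xtserial}\)` and `\(\texttt{xtscc}\)` are user-written add-ons (installable via `ssc install`), and the model is illustrative.

```
xtserial growth lagg1 opengdp          // Wooldridge serial-correlation test
quietly xtreg growth lagg1 opengdp, fe
xtcsd, pesaran                         // Pesaran cross-sectional dependence
xtscc growth lagg1 opengdp, fe         // Driscoll-Kraay standard errors
```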
---

## OLS Diagnostics

- We could also use other standard diagnostics in the OLS framework. If you are going to intensively use Stata, books like Statistics with Stata are quite useful.
- `\(\texttt{estat ovtest, [rhs]}\)` will give us Ramsey's RESET test. The option gives us RHS variables; otherwise we just use fitted values. The default is a Wald test applied to the regression
`$$y_{it} = X_{it}\beta + \hat{y}^{2}\gamma_{1} + \hat{y}^{3}\gamma_{2} + \hat{y}^{4}\gamma_{3} + \epsilon_{it}$$`
and with option `\(\texttt{rhs}\)` the powers are applied to the right-hand side variables.
- `\(\texttt{predict ... , dfits}\)` and `\(\texttt{dfbeta}\)`: We also have the various `\(\texttt{dffits}\)` and `\(\texttt{dfbeta}\)` statistics for use in diagnosing leverage. The dfit is the studentized residual multiplied by the square root of `\(h_{j}\)` over `\((1 - h_{j})\)`; basically a scaled measure of the difference between in-sample and out-of-sample predictions. The `\(\texttt{dfit}\)` is obtained as a post-regression prediction using predict. Define `\(\texttt{dfbeta}\)` as:
`$$DFBETA_{j} = \frac{r_{j}v_{j}}{\sqrt{v^{2}(1-h_{j})}}$$`
where `\(h_{j}\)` is the `\(j^{th}\)` diagonal element of the hat matrix `\(\mathbf{P}\)`, `\(r_{j}\)` is the studentized residual, `\(v_{j}\)` are the residuals from a regression not containing the regressor in question, and `\(v^{2}\)` is their sum of squares. Suggested cutoffs are `\(2\sqrt{\frac{k}{N}}\)` for dfit and `\(\frac{2}{\sqrt{N}}\)` for dfbeta. There are also Cook's distance (`\(\texttt{cooksd}\)`) and Welsch distance (`\(\texttt{welsch}\)`).
- `\(\texttt{estat hettest [varlist] [, rhs [normal | iid | fstat] mtest[(spec)]]}\)` gives us a variety of tests for heteroscedasticity. The `\(\texttt{rhs}\)` option gives structure from covariates. `\(\texttt{mtest}\)` is important because we are doing multiple testing (often).

---

## continued

- `\(\texttt{estat vif}\)` gives us some collinearity diagnostics. The statistic is essentially `\(\frac{1}{1-R^{2}_{(-k)}}\)`.
- `\(\texttt{estat imtest [, preserve white]}\)`, where the default is Cameron-Trivedi; we can request White's version, and preserve maintains the original data (saves time often). As a general misspecification test, the Information Matrix test is shown by Hall (1987) to decompose into heteroscedasticity, skewness, and kurtosis of residuals and has some suboptimal properties.

---

## Plots

- avplot: added-variable plot
- avplots: all added-variable plots in one image
- cprplot: component-plus-residual plot
- lvr2plot: leverage-versus-squared-residual plot
- rvfplot: residual-versus-fitted plot
- rvpplot: residual-versus-predictor plot

---

## Panel Unit Root Testing in Stata

- Levin-Lin-Chu (`\(\texttt{xtunitroot llc}\)`): trend nocons (unit specific) demean (within transform) lags. Under (crucial) cross-sectional independence, the test is an advancement on the generic Dickey-Fuller theory that allows the lag lengths to vary by cross-section. The test relies on specifying a kernel (beyond our purposes) and a lag length (upper bound). The test statistic has a standard normal basis with asymptotics in `\(\frac{\sqrt{N_{T}}}{T}\)` (`\(T\)` grows faster than `\(N\)`). The test is of either all series containing unit roots (`\(H_{0}\)`) or all stationary; this is a limitation. It is recommended for moderate to large `\(T\)` and `\(N\)`.
- Perform separate ADF regressions:
`$$\Delta y_{it} = \rho_{i} y_{i,t-1} + \sum_{L=1}^{p_i} \theta_{iL} \Delta y_{i,t-L} + \alpha_{mi}d_{mt} + \epsilon_{it}$$`
with `\(d_{mt}\)` as the vector of deterministic variables (none, drift, drift and trend). Select a max `\(L\)` and use the `\(t\)` statistic on `\(\hat{\theta}_{iL}\)` to attempt to simplify. Then regress `\(\Delta y_{it}\)` on `\(\Delta y_{i,t-L}\)` and `\(d_{mt}\)` for residuals.

---

- Harris-Tzavalis (`\(\texttt{xtunitroot ht}\)`): trend nocons (unit specific) demean (within transform) altt (small sample adjust). Similar to the previous; they show that `\(T \rightarrow \infty\)` faster than `\(N\)` (rather than `\(T\)` fixed) leads to size distortions.
- Breitung (`\(\texttt{xtunitroot breitung}\)`): trend nocons (unit specific) demean (within transform) robust (CSD) lags. Similar to LLC with a common statistic across all `\(i\)`.
- Im, Pesaran, Shin (`\(\texttt{xtunitroot ips}\)`): trend demean (within transform) lags. They free `\(\rho\)` to be `\(\rho_{i}\)` and average individual unit root statistics. The null is that all contain unit roots, while the alternative specifies at least some to be stationary. The test relies on sequential asymptotics (first T, then N). Better in small samples than LLC, but note the differences in the alternatives.
- Fisher type tests (`\(\texttt{xtunitroot fisher}\)`): dfuller pperron demean lags.
- Hadri (LM) (`\(\texttt{xtunitroot hadri}\)`): trend demean robust

All but the last are null hypothesis unit-root tests. Most assume balance, but the Fisher and IPS versions can work for unbalanced panels.
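---

## `\(\texttt{xtunitroot}\)` in Practice

A minimal sketch of two of the tests above on the running example; the lag choice is illustrative.

```
xtset country year
xtunitroot llc growth, lags(1)          // H0: all panels contain unit roots
xtunitroot ips growth, lags(1) demean   // Ha: at least some panels stationary
```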
---

## ADL/Canonical models

We can consider some very basic time series models.

- Koyck/Geometric decay: short-run and long-run effects are parametrically identified (given `\(\mathcal{M}\)`).
- Almon (more arbitrary decay):
`$$y_{t} = \sum_{t_{A}=0}^{T_{F}} \rho_{t_{A}}x_{t - t_{A}} + \epsilon_{t}$$`
with coefficients that are ordinates of some general polynomial of degree `\(q\)`, `\(T_{F} \gg q\)`: the `\(\rho_{t_{A}} = \sum_{k=0}^{q} \gamma_{k}t_{A}^{k}\)`.
- Prais-Winsten, etc. are basically FGLS implementations of AR(1).

---

## Prais-Winsten/Cochrane-Orcutt

`$$y_{it} = X_{it}\beta + \epsilon_{it}$$`

where

`$$\epsilon_{it} = \rho \epsilon_{i,t-1} + \nu_{it}$$`

and `\(\nu_{it} \sim N(0,\sigma^{2}_{\nu})\)` with stationarity forcing `\(|\rho| < 1\)`. We will use iterated FGLS.

1. First, estimate the regression, recalling our unbiasedness condition.
1. Then regress `\(\hat{\epsilon}_{it}\)` on `\(\hat{\epsilon}_{i,t-1}\)`.
1. Rinse and repeat until `\(\rho\)` doesn't change.

The transformation applied to the first observation is distinct; you can look this up.... In general, the transformed regression is:

`$$y_{it} - \rho y_{i,t-1} = \alpha ( 1 - \rho ) + \beta (X_{it} - \rho X_{i,t-1}) + \nu_{it}$$`

with `\(\nu\)` white noise.

---

## Beck

- Static model: Instantaneous impact.
`$$y_{i,t} = X_{i,t}\beta + \nu_{i,t}$$`
- Finite distributed lag: lags of `\(x\)`, finite-horizon impact (defined by lags).
`$$y_{i,t} = X_{i,t}\beta + \sum_{k=1}^{K} X_{i,t-k}\beta_{k} + \nu_{i,t}$$`
- AR(1): Errors decay geometrically, `\(X\)` instantaneous. (Suppose unmeasured `\(x\)` and think this through.)
`$$y_{i,t} = X_{i,t}\beta + \nu_{i,t} + \theta\epsilon_{i,t-1}$$`
- Lagged dependent variable: lags of `\(y\)` [common geometric decay]
`$$y_{i,t} = X_{i,t}\beta + \phi y_{i,t-1} + \nu_{i,t}$$`
- ADL: current and lagged `\(x\)` and lagged `\(y\)`.
`$$y_{i,t} = X_{i,t}\beta + X_{i,t-1}\gamma + \phi y_{i,t-1} + \epsilon_{i,t}$$`
- Panel versions of transfer function models from Box and Jenkins time series. (Each `\(x\)` has an impact and decay function.)

---

## Brief Comment on Hurwicz/Nickell Bias

- Bias is of stochastic order `\(\frac{1}{T}\)`.
- Less bad as `\(T\)` grows.

---

## Interpretation of dynamic models

- Do it.
- Whitten and Williams' `\(\texttt{dynsim}\)` uses `\(\texttt{Clarify}\)` **NB: If you do not know what Clarify is, please ask**: estimate, set, simulate to do this.
- Their paper is *But Wait, There's More! Maximizing Substantive Inferences from TSCS Models*. Easy to find on the web and on the website.

---

## Details

`$$y_{it} = \alpha + \gamma y_{i, t-1} + X_{it}\beta + \epsilon_{it}$$`
`$$y_{it} = \alpha + \gamma [\alpha + \gamma y_{i, t-2} + X_{i,t-1}\beta + \epsilon_{i,t-1}] + X_{it}\beta + \epsilon_{it}$$`
`$$y_{it} = \alpha + \gamma [\alpha + \gamma (\alpha + \gamma y_{i, t-3} + X_{i,t-2}\beta + \epsilon_{i,t-2}) + X_{i,t-1}\beta + \epsilon_{i,t-1}] + X_{it}\beta + \epsilon_{it}$$`

We can continue substituting through to conclude that we have a geometrically decaying impact, so that the long-run effect of a one-unit change in `\(X\)` is

`$$\frac{\beta}{1-\gamma}$$`

But `\(\gamma\)` has uncertainty; it is an estimate. To show the realistic long-run impact, we need to incorporate that uncertainty.
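---

## A Long-Run Effect with Uncertainty

One way to carry the uncertainty in `\(\gamma\)` through to `\(\frac{\beta}{1-\gamma}\)` is the delta method; a minimal sketch with `\(\texttt{nlcom}\)` after a pooled LDV regression (the specification is illustrative, and simulation as in `\(\texttt{dynsim}\)`/`\(\texttt{Clarify}\)` is the richer alternative because it traces the whole dynamic path).

```
xtset country year
regress growth L.growth opengdp
nlcom _b[opengdp] / (1 - _b[L.growth])   // delta-method SE for beta/(1-gamma)
```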