bayespecon.SEMPanelRE¶

class bayespecon.SEMPanelRE(mundlak=False, **kwargs)[source]¶

Bayesian spatial error panel model with unit random effects.

\[y_{it} = X_{it}\beta + \alpha_i + u_{it}, \quad u_{it} = \lambda (Wu)_{it} + \varepsilon_{it}\]

Equivalently the spatially-filtered residual is i.i.d.:

\[\varepsilon_{it} = (I - \lambda W)(y - X\beta - \alpha)_{it} \sim N(0, \sigma^2)\]

Parameters:¶

formula : str, optional

Wilkinson-style formula, e.g. "y ~ x1 + x2". Requires data, unit_col, and time_col.

data : pandas.DataFrame, optional

Long-format panel data when using formula mode.

y : array-like, optional

Stacked response of shape (N*T,). Required in matrix mode.

X : array-like or pandas.DataFrame, optional

Stacked design matrix. Required in matrix mode.

W : libpysal.graph.Graph or scipy.sparse matrix

Spatial weights of shape (N, N). Should be row-standardised.

unit_col : str, optional

Column in data identifying the cross-sectional unit. Required in formula mode.

time_col : str, optional

Column in data identifying the time period. Required in formula mode.

N : int, optional

Number of cross-sectional units. Required in matrix mode.

T : int, optional

Number of time periods. Required in matrix mode.

priors : dict, optional

Override default priors. Supported keys:

lam_lower (float, default -1.0): Lower bound of Uniform prior on \(\lambda\).
lam_upper (float, default 1.0): Upper bound of Uniform prior on \(\lambda\).
beta_mu (float, default 0.0): Normal prior mean for \(\beta\).
beta_sigma (float, default 1e6): Normal prior std for \(\beta\).
sigma_sigma (float, default 10.0): HalfNormal prior std for \(\sigma\).
sigma_alpha_sigma (float, default 10.0): HalfNormal prior std for \(\sigma_\alpha\).
nu_lam (float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used when robust=True).

logdet_method : str, optional

How to compute \(\log|I - \lambda W|\); auto-selected when None (default).

robust : bool, default False

If True, replace the Normal innovation with Student-t. See Robust regression below.

mundlak : bool, default False¶

If True, use the Mundlak (1978) correlated random effects specification. This augments the design matrix with unit-level time-averages of the regressors, modelling the correlation between \(\alpha_i\) and \(X\) explicitly. See Mundlak / Correlated Random Effects below for details.

Notes

The base-class model argument is not exposed; pooled mean structure (model=0) is used because unit heterogeneity is captured by the random effect rather than by within-unit demeaning.

Identification of λ in SEM-RE models

The spatial error parameter \(\lambda\) is weakly identified when random effects \(\alpha_i\) are present. The random effects absorb spatial correlation across units, making it difficult for the data to distinguish between \(\lambda\) (spatial error dependence) and \(\sigma_\alpha^2\) (between-unit variance). Both Gibbs and NUTS samplers will tend to estimate \(\lambda\) near zero even when the true value is moderate, because the posterior genuinely concentrates there. This is a model identification issue, not a sampler bug.

Possible remedies include:

Use fixed effects (SEMPanelFE) instead of random effects
Use a Spatial Durbin model (SDMPanelRE) that includes WX terms
Use longer panels (\(T \to \infty\)) which provide more information to separate \(\lambda\) from \(\alpha\)
Use the Mundlak specification (mundlak=True) to test for RE-regressor correlation, though note this does not resolve the \(\lambda\) identification issue itself

Mundlak / Correlated Random Effects

The Mundlak (1978) approach models the correlation between \(\alpha_i\) and the regressors by decomposing the random effect:

\[\alpha_i = \bar{X}_i \gamma + \eta_i, \quad \eta_i \sim N(0, \sigma_\eta^2)\]

where \(\bar{X}_i = T^{-1} \sum_t X_{it}\) are the unit-level means of the time-varying regressors. Substituting into the model yields an augmented regression with \([X, \bar{X}]\) as regressors, where \(\gamma\) is estimated alongside \(\beta\) and the residual random effect \(\eta_i\) captures only orthogonal unit heterogeneity.

Important: The Mundlak specification addresses RE-regressor correlation but does not resolve the \(\lambda\) identification issue described above. Even with Mundlak augmentation, \(\lambda\) remains weakly identified because \(\eta_i\) can still absorb spatial correlation. The Mundlak approach is primarily useful for:

Testing whether \(\alpha_i\) is correlated with regressors (LR test of \(\gamma = 0\))
Obtaining consistent \(\beta\) estimates when RE are correlated with \(X\)
Reducing \(\sigma_\alpha^2\) by absorbing the explained between-unit variation into \(\gamma\)

Following Baltagi (2023), the Mundlak approach does not yield the same estimates as fixed effects for spatial models (unlike the non-spatial case), but MLE/Gibbs estimation remains valid.

Robust regression

When robust=True, the spatially-filtered error distribution is changed from Normal to Student-t, yielding a model that is robust to heavy-tailed outliers:

\[\varepsilon_{it} = (I - \lambda W)(y - X\beta - \alpha_i) \sim t_\nu(0, \sigma^2)\]

where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate nu_lam (default 1/30). The default nu_lam = 1/30 gives a prior mean of approximately 30, favouring near-Normal tails. The lower bound of 2 ensures the variance exists.

__init__(mundlak=False, **kwargs)[source]¶

Methods

`__init__`([mundlak])
`fit`([draws, tune, chains, target_accept, ...])	Sample posterior for SEM panel RE model.
`fitted_values`()	Return fitted values at posterior mean parameters.
`residuals`()	Return transformed residuals `y - fitted`.
`spatial_diagnostics`()	Run Bayesian LM specification tests and return a summary table.
`spatial_diagnostics_decision`([alpha, format])	Return a model-selection decision from Bayesian LM test results.
`spatial_effects`([return_posterior_samples])	Compute Bayesian inference for direct, indirect, and total impacts.
`summary`([var_names])	Return posterior summary table.

Attributes

`inference_data`	Return the ArviZ InferenceData from the most recent fit.
`mundlak`	Whether the Mundlak correlated RE specification is active.
`mundlak_names`	Names of the Mundlak augmentation columns, or None if inactive.
`pymc_model`	Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, sampler='gibbs', thin=1, n_jobs=-1, progressbar=True, **sample_kwargs)[source]¶

Sample posterior for SEM panel RE model.

Parameters:¶

draws : int, default 2000¶: Number of post-warmup draws per chain.
tune : int, default 1000¶: Number of warmup draws per chain (NUTS) or burn-in draws (Gibbs).
chains : int, default 4¶: Number of independent chains.
target_accept : float, default 0.9¶: NUTS target acceptance probability. Ignored for Gibbs.
random_seed : int or None¶: Seed for reproducibility.
idata_kwargs : dict or None¶: Extra kwargs for InferenceData (NUTS only).
sampler : str, default "gibbs"¶: Sampler to use: "gibbs" for 5-block Gibbs or "nuts" for PyMC NUTS.
thin : int, default 1¶: Keep every thin-th draw after warmup (Gibbs only).
n_jobs : int, default -1¶: Number of parallel workers (Gibbs only).
progressbar : bool, default True¶: Show per-chain progress bars (Gibbs only).
**sample_kwargs¶: Extra keyword arguments forwarded to PyMC (NUTS only).

Return type:¶

az.InferenceData

fitted_values()[source]¶

Return fitted values at posterior mean parameters.

Returns:¶: Fitted values on transformed panel scale.
Return type:¶: np.ndarray

property inference_data : arviz.data.inference_data.InferenceData | None[source]¶

Return the ArviZ InferenceData from the most recent fit.

Returns:¶: The inference data object, or None if the model has not been fit yet.
Return type:¶: arviz.InferenceData or None

property mundlak : bool[source]¶: Whether the Mundlak correlated RE specification is active.

property mundlak_names : list[str] | None[source]¶: Names of the Mundlak augmentation columns, or None if inactive.

property pymc_model : pymc.model.core.Model | None[source]¶

Return the PyMC model object built for the most recent fit.

Returns:¶: The model object used by fit(), or None if the instance has not been fit yet.
Return type:¶: pymc.Model or None

residuals()[source]¶

Return transformed residuals y - fitted.

Returns:¶: Residual vector on transformed panel scale.
Return type:¶: np.ndarray

spatial_diagnostics()[source]¶

Run Bayesian LM specification tests and return a summary table.

Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLSPanelFE model runs Panel-LM-Lag, Panel-LM-Error, Panel-LM-SDM-Joint, and Panel-LM-SLX-Error-Joint.

Requires the model to have been fit (.fit() called) and a spatial weights matrix W to have been supplied at construction time.

Returns:¶

DataFrame indexed by test name with columns:

Column	Description
statistic	Posterior mean of the LM statistic
median	Posterior median of the LM statistic
df	Degrees of freedom for the \(\chi^2\) reference
p_value	Bayesian p-value: `1 - chi2.cdf(mean, df)`
ci_lower	Lower bound of 95% credible interval (2.5%)
ci_upper	Upper bound of 95% credible interval (97.5%)

The DataFrame has attrs["model_type"] (class name) and attrs["n_draws"] (total posterior draws) metadata.

Return type:¶

pandas.DataFrame

Raises:¶

RuntimeError – If the model has not been fit yet.