bayespecon.SEMPanelRE¶
-
class bayespecon.SEMPanelRE(mundlak=
False, **kwargs)[source]¶ Bayesian spatial error panel model with unit random effects.
\[y_{it} = X_{it}\beta + \alpha_i + u_{it}, \quad u_{it} = \lambda (Wu)_{it} + \varepsilon_{it}\]Equivalently the spatially-filtered residual is i.i.d.:
\[\varepsilon_{it} = (I - \lambda W)(y - X\beta - \alpha)_{it} \sim N(0, \sigma^2)\]- Parameters:¶
- formula : str, optional
Wilkinson-style formula, e.g.
"y ~ x1 + x2". Requiresdata,unit_col, andtime_col.- data : pandas.DataFrame, optional
Long-format panel data when using formula mode.
- y : array-like, optional
Stacked response of shape
(N*T,). Required in matrix mode.- X : array-like or pandas.DataFrame, optional
Stacked design matrix. Required in matrix mode.
- W : libpysal.graph.Graph or scipy.sparse matrix
Spatial weights of shape
(N, N). Should be row-standardised.- unit_col : str, optional
Column in
dataidentifying the cross-sectional unit. Required in formula mode.- time_col : str, optional
Column in
dataidentifying the time period. Required in formula mode.- N : int, optional
Number of cross-sectional units. Required in matrix mode.
- T : int, optional
Number of time periods. Required in matrix mode.
- priors : dict, optional
Override default priors. Supported keys:
lam_lower(float, default -1.0): Lower bound of Uniform prior on \(\lambda\).lam_upper(float, default 1.0): Upper bound of Uniform prior on \(\lambda\).beta_mu(float, default 0.0): Normal prior mean for \(\beta\).beta_sigma(float, default 1e6): Normal prior std for \(\beta\).sigma_sigma(float, default 10.0): HalfNormal prior std for \(\sigma\).sigma_alpha_sigma(float, default 10.0): HalfNormal prior std for \(\sigma_\alpha\).nu_lam(float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used whenrobust=True).
- logdet_method : str, optional
How to compute \(\log|I - \lambda W|\); auto-selected when
None(default).- robust : bool, default False
If True, replace the Normal innovation with Student-t. See Robust regression below.
- mundlak : bool, default False¶
If True, use the Mundlak (1978) correlated random effects specification. This augments the design matrix with unit-level time-averages of the regressors, modelling the correlation between \(\alpha_i\) and \(X\) explicitly. See Mundlak / Correlated Random Effects below for details.
Notes
The base-class
modelargument is not exposed; pooled mean structure (model=0) is used because unit heterogeneity is captured by the random effect rather than by within-unit demeaning.Identification of λ in SEM-RE models
The spatial error parameter \(\lambda\) is weakly identified when random effects \(\alpha_i\) are present. The random effects absorb spatial correlation across units, making it difficult for the data to distinguish between \(\lambda\) (spatial error dependence) and \(\sigma_\alpha^2\) (between-unit variance). Both Gibbs and NUTS samplers will tend to estimate \(\lambda\) near zero even when the true value is moderate, because the posterior genuinely concentrates there. This is a model identification issue, not a sampler bug.
Possible remedies include:
Use fixed effects (
SEMPanelFE) instead of random effectsUse a Spatial Durbin model (
SDMPanelRE) that includes WX termsUse longer panels (\(T \to \infty\)) which provide more information to separate \(\lambda\) from \(\alpha\)
Use the Mundlak specification (
mundlak=True) to test for RE-regressor correlation, though note this does not resolve the \(\lambda\) identification issue itself
Mundlak / Correlated Random Effects
The Mundlak (1978) approach models the correlation between \(\alpha_i\) and the regressors by decomposing the random effect:
\[\alpha_i = \bar{X}_i \gamma + \eta_i, \quad \eta_i \sim N(0, \sigma_\eta^2)\]where \(\bar{X}_i = T^{-1} \sum_t X_{it}\) are the unit-level means of the time-varying regressors. Substituting into the model yields an augmented regression with \([X, \bar{X}]\) as regressors, where \(\gamma\) is estimated alongside \(\beta\) and the residual random effect \(\eta_i\) captures only orthogonal unit heterogeneity.
Important: The Mundlak specification addresses RE-regressor correlation but does not resolve the \(\lambda\) identification issue described above. Even with Mundlak augmentation, \(\lambda\) remains weakly identified because \(\eta_i\) can still absorb spatial correlation. The Mundlak approach is primarily useful for:
Testing whether \(\alpha_i\) is correlated with regressors (LR test of \(\gamma = 0\))
Obtaining consistent \(\beta\) estimates when RE are correlated with \(X\)
Reducing \(\sigma_\alpha^2\) by absorbing the explained between-unit variation into \(\gamma\)
Following Baltagi (2023), the Mundlak approach does not yield the same estimates as fixed effects for spatial models (unlike the non-spatial case), but MLE/Gibbs estimation remains valid.
Robust regression
When
robust=True, the spatially-filtered error distribution is changed from Normal to Student-t, yielding a model that is robust to heavy-tailed outliers:\[\varepsilon_{it} = (I - \lambda W)(y - X\beta - \alpha_i) \sim t_\nu(0, \sigma^2)\]where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate
nu_lam(default 1/30). The defaultnu_lam = 1/30gives a prior mean of approximately 30, favouring near-Normal tails. The lower bound of 2 ensures the variance exists.Methods
__init__([mundlak])fit([draws, tune, chains, target_accept, ...])Sample posterior for SEM panel RE model.
Return fitted values at posterior mean parameters.
Return transformed residuals
y - fitted.Run Bayesian LM specification tests and return a summary table.
spatial_diagnostics_decision([alpha, format])Return a model-selection decision from Bayesian LM test results.
spatial_effects([return_posterior_samples])Compute Bayesian inference for direct, indirect, and total impacts.
summary([var_names])Return posterior summary table.
Attributes
Return the ArviZ InferenceData from the most recent fit.
Whether the Mundlak correlated RE specification is active.
Names of the Mundlak augmentation columns, or None if inactive.
Return the PyMC model object built for the most recent fit.
-
fit(draws=
2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, sampler='gibbs', thin=1, n_jobs=-1, progressbar=True, **sample_kwargs)[source]¶ Sample posterior for SEM panel RE model.
- Parameters:¶
- draws : int, default 2000¶
Number of post-warmup draws per chain.
- tune : int, default 1000¶
Number of warmup draws per chain (NUTS) or burn-in draws (Gibbs).
- chains : int, default 4¶
Number of independent chains.
- target_accept : float, default 0.9¶
NUTS target acceptance probability. Ignored for Gibbs.
- random_seed : int or None¶
Seed for reproducibility.
- idata_kwargs : dict or None¶
Extra kwargs for InferenceData (NUTS only).
- sampler : str, default "gibbs"¶
Sampler to use:
"gibbs"for 5-block Gibbs or"nuts"for PyMC NUTS.- thin : int, default 1¶
Keep every
thin-th draw after warmup (Gibbs only).- n_jobs : int, default -1¶
Number of parallel workers (Gibbs only).
- progressbar : bool, default True¶
Show per-chain progress bars (Gibbs only).
- **sample_kwargs¶
Extra keyword arguments forwarded to PyMC (NUTS only).
- Return type:¶
az.InferenceData
- property inference_data : arviz.data.inference_data.InferenceData | None[source]¶
Return the ArviZ InferenceData from the most recent fit.
- property mundlak_names : list[str] | None[source]¶
Names of the Mundlak augmentation columns, or None if inactive.
- property pymc_model : pymc.model.core.Model | None[source]¶
Return the PyMC model object built for the most recent fit.
- spatial_diagnostics()[source]¶
Run Bayesian LM specification tests and return a summary table.
Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLSPanelFE model runs Panel-LM-Lag, Panel-LM-Error, Panel-LM-SDM-Joint, and Panel-LM-SLX-Error-Joint.
Requires the model to have been fit (
.fit()called) and a spatial weights matrixWto have been supplied at construction time.- Returns:¶
DataFrame indexed by test name with columns:
Column
Description
statistic
Posterior mean of the LM statistic
median
Posterior median of the LM statistic
df
Degrees of freedom for the \(\chi^2\) reference
p_value
Bayesian p-value:
1 - chi2.cdf(mean, df)ci_lower
Lower bound of 95% credible interval (2.5%)
ci_upper
Upper bound of 95% credible interval (97.5%)
The DataFrame has
attrs["model_type"](class name) andattrs["n_draws"](total posterior draws) metadata.- Return type:¶
pandas.DataFrame
- Raises:¶
RuntimeError – If the model has not been fit yet.
See also
spatial_diagnostics_decisionModel-selection decision based on the test results.
-
spatial_diagnostics_decision(alpha=
0.05, format='graphviz')[source]¶ Return a model-selection decision from Bayesian LM test results.
Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical
stge_kbprocedure in Anselin et al. [1996]), adapted for panel models following Elhorst [2014].- Parameters:¶
- alpha : float, default 0.05¶
Significance level for the Bayesian p-values.
- format : {"graphviz", "ascii", "model"}, default "graphviz"¶
Output format.
"model"returns the recommended-model name string."ascii"returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted."graphviz"returns agraphviz.Digraphobject that renders inline in Jupyter; if the optionalgraphvizpackage is not installed aUserWarningis issued and the ASCII rendering is returned instead.
- Returns:¶
Recommended model name when
format="model", an ASCII tree string whenformat="ascii", or agraphviz.Digraphwhenformat="graphviz"(with ASCII fallback on missing dep).- Return type:¶
str or graphviz.Digraph
See also
spatial_diagnosticsCompute the Bayesian LM test statistics.
References
Koley and Bera [2024], Anselin et al. [1996], Elhorst [2014]
-
spatial_effects(return_posterior_samples=
False)[source]¶ Compute Bayesian inference for direct, indirect, and total impacts.
Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values.
- Parameters:¶
- Returns:¶
If return_posterior_samples is
False(default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.If return_posterior_samples is
True, returns(DataFrame, dict)where the dict has keys"direct","indirect","total", each mapping to a(G, k)array of posterior draws.- Return type:¶