bayespecon.models.SDEM

class bayespecon.models.SDEM(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None)[source]

Bayesian Spatial Durbin Error Model.

Combines spatial lags of the regressors \(X\) with a spatial autoregressive disturbance:

\[y = X\beta + WX\theta + u, \quad u = \lambda Wu + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I).\]

The sampled coefficient vector stacks the local and lagged-regressor blocks as \([\beta, \theta]\). The likelihood includes the spatial Jacobian \(\log|I - \lambda W|\).

Parameters:
formula : str, optional

Wilkinson-style formula, e.g. "y ~ x1 + x2". Requires data. Intercept is included by default.

data : pandas.DataFrame or geopandas.GeoDataFrame, optional

Data source for formula mode.

y : array-like, optional

Dependent variable of shape (n,). Required in matrix mode.

X : array-like or pandas.DataFrame, optional

Design matrix. Required in matrix mode. DataFrame columns are preserved as feature names.

W : libpysal.graph.Graph or scipy.sparse matrix

Spatial weights of shape (n, n). Accepts a libpysal.graph.Graph or any scipy.sparse matrix. The legacy libpysal.weights.W object is not accepted; pass w.sparse or libpysal.graph.Graph.from_W(w). Should be row-standardised; a UserWarning is raised otherwise.

priors : dict, optional

Override default priors. Supported keys:

  • lam_lower (float, default -1.0): Lower bound of the Uniform prior on \(\lambda\).

  • lam_upper (float, default 1.0): Upper bound of the Uniform prior on \(\lambda\).

  • beta_mu (float, default 0.0): Normal prior mean for \([\beta, \theta]\).

  • beta_sigma (float, default 1e6): Normal prior std for \([\beta, \theta]\).

  • sigma_sigma (float, default 10.0): HalfNormal prior std for \(\sigma\).

  • nu_lam (float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used when robust=True).

logdet_method : str, optional

How to compute \(\log|I - \lambda W|\). None (default) auto-selects "eigenvalue" for n <= 2000 else "chebyshev". Other options: "exact", "grid_dense", "grid_sparse", "sparse_spline", "grid_mc", "grid_ilu".

robust : bool, default False

If True, replace the Normal disturbance with Student-t. See Robust regression below.

w_vars : list of str, optional

Names of X columns to spatially lag. By default all non-constant columns are lagged. Pass a subset to restrict which variables receive a spatial lag, e.g. w_vars=["income", "density"].

Notes

Because the spatial autoregression enters only through the disturbance, direct effects equal \(\beta\) and indirect effects equal \(\theta\) (no global spillover multiplier).

Robust regression

When robust=True, the spatially-filtered innovation is Student-t:

\[\varepsilon = (I - \lambda W)(y - X\beta - WX\theta) \sim t_\nu(0, \sigma^2 I)\]

where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate nu_lam (default 1/30, mean ≈ 30).

__init__(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None)[source]

Methods

__init__([formula, data, y, X, W, priors, ...])

fit([draws, tune, chains, target_accept, ...])

Draw samples from the posterior.

fitted_values()

Return fitted values at posterior mean parameters.

residuals()

Return residuals on the observed (or transformed-panel) scale.

spatial_diagnostics()

Run Bayesian LM specification tests and return a summary table.

spatial_diagnostics_decision([alpha, ...])

Return a model-selection decision from Bayesian LM test results.

spatial_effects([return_posterior_samples])

Compute Bayesian inference for direct, indirect, and total impacts.

summary([var_names])

Return posterior summary table.

Attributes

inference_data

Return the ArviZ InferenceData from the most recent fit.

pymc_model

Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, **sample_kwargs)[source]

Draw samples from the posterior. Accepts idata_kwargs for ArviZ compatibility.

Parameters:
idata_kwargs : dict, optional

Passed to pm.sample for InferenceData creation. If contains log_likelihood: True, the complete pointwise log-likelihood (including the Jacobian correction) is attached to the output.

:param Other parameters as in SpatialModel.:

Notes

The log-likelihood for the SDEM model is:

\[\log p(y \mid \theta) = \sum_{i=1}^{n} \log \mathcal{N}(\varepsilon_i \mid 0, \sigma^2) + \log |I - \lambda W |\]

where \(\varepsilon = (I - \lambda W)(y - Z\beta)\) and \(Z = [X, WX]\).

Because the SDEM model uses pm.Potential for both the Gaussian error log-likelihood and the Jacobian on the default (C / Numba) backend, neither term is auto-captured in the log_likelihood group by PyMC. We compute the complete pointwise log-likelihood manually after sampling:

\[\ell_i = -\frac{1}{2}\left(\frac{\varepsilon_i}{\sigma}\right)^2 - \log(\sigma) - \frac{1}{2}\log(2\pi) + \frac{1}{n} \log |I - \lambda W |\]

On JAX backends (nuts_sampler="numpyro" or "blackjax") the same per-observation density is registered via pymc.CustomDist so PyMC populates log_likelihood natively.

fitted_values()[source]

Return fitted values at posterior mean parameters.

property inference_data : arviz.data.inference_data.InferenceData | None[source]

Return the ArviZ InferenceData from the most recent fit.

property pymc_model : pymc.model.core.Model | None[source]

Return the PyMC model object built for the most recent fit.

residuals()[source]

Return residuals on the observed (or transformed-panel) scale.

spatial_diagnostics()[source]

Run Bayesian LM specification tests and return a summary table.

Iterates over the class-level _spatial_diagnostics_tests registry and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type.

Requires the model to have been fit (.fit() called). For cross-sectional models a spatial weights matrix W must also have been supplied at construction time.

Returns:

DataFrame indexed by test name with columns statistic (posterior mean), median, df (degrees of freedom for the \(\chi^2\) reference), p_value (Bayesian p-value 1 - chi2.cdf(mean, df)), and ci_lower / ci_upper (95% credible interval). The DataFrame carries attrs["model_type"] and attrs["n_draws"] metadata.

Return type:

pandas.DataFrame

Raises:
  • RuntimeError – If the model has not been fit yet.

  • ValueError – If a cross-sectional model was constructed without W.

See also

spatial_diagnostics_decision

Model-selection decision based on the test results.

spatial_effects

Posterior inference for direct/indirect/total impacts.

spatial_diagnostics_decision(alpha=0.05, format='graphviz', theme='default')[source]

Return a model-selection decision from Bayesian LM test results.

Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical stge_kb procedure in Anselin et al. [1996]), adapted for panel models following Elhorst [2014] when invoked on a panel subclass. See the cross-sectional / panel-specific docstrings on the leaf classes for the full set of branches consulted.

Parameters:
alpha : float, default 0.05

Significance level for the Bayesian p-values.

format : {"graphviz", "ascii", "model"}, default "graphviz"

Output format. "model" returns the recommended-model name string. "ascii" returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted. "graphviz" returns a graphviz.Digraph object that renders inline in Jupyter; if the optional graphviz package is not installed a UserWarning is issued and the ASCII rendering is returned instead.

Returns:

Recommended model name when format="model", an ASCII tree string when format="ascii", or a graphviz.Digraph when format="graphviz" (with ASCII fallback on missing dep).

Return type:

str or graphviz.Digraph

See also

spatial_diagnostics

Compute the Bayesian LM test statistics.

spatial_effects(return_posterior_samples=False)[source]

Compute Bayesian inference for direct, indirect, and total impacts.

Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analog of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].

Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.

Parameters:
return_posterior_samples : bool, optional

If True, return a (DataFrame, dict) tuple where the dict contains the full posterior draws under keys "direct", "indirect", and "total". Default False.

Returns:

If return_posterior_samples is False (default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.

If return_posterior_samples is True, returns (DataFrame, dict) where the dict has keys "direct", "indirect", "total", each mapping to a (G, k) array of posterior draws.

Return type:

pd.DataFrame or tuple of (pd.DataFrame, dict)

summary(var_names=None, **kwargs)[source]

Return posterior summary table.

Parameters:
var_names : list, optional

Variable names to include in the summary.

**kwargs

Additional arguments passed to arviz.summary().

Returns:

Posterior summary statistics.

Return type:

pandas.DataFrame