bayespecon.models.base.SpatialModel

class bayespecon.models.base.SpatialModel(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]

Base class for Bayesian spatial regression models. Models follow the notation of [Anselin, 1988] and [LeSage and Pace, 2009]. The API supports both formula and matrix input modes.

Parameters:
formula : str, optional

Wilkinson-style formula string, e.g. "price ~ poverty + rev_rating". If provided, data must also be supplied. An intercept is included by default; suppress with "y ~ x - 1".

data : DataFrame or GeoDataFrame, optional

Data source when using formula mode.

y : array-like, optional

Dependent variable. Required in matrix mode.

X : array-like, optional

Predictor matrix. Required in matrix mode. If a DataFrame, column names are preserved for labelling.

W : libpysal.graph.Graph or scipy.sparse matrix

Spatial weights matrix of shape (n, n). Accepts a libpysal.graph.Graph (the modern libpysal graph API) or any scipy.sparse matrix. The legacy libpysal.weights.W object is not accepted directly; pass w.sparse to use the underlying sparse matrix, or convert with libpysal.graph.Graph.from_W(w). W should be row-standardised; a UserWarning is raised if not.

priors : dict, optional

Override default priors. Keys depend on the model subclass; see each model’s docstring for supported keys.

logdet_method : str

How to compute log|I - rho*W|. "eigenvalue" (default for n <= 500) pre-computes W’s eigenvalues once and evaluates O(n) per step; "exact" uses symbolic pytensor det (slow for n > 500); "grid_dense" uses dense eigenvalue grid + cubic-spline interpolation (MATLAB-style lndetfull for dense W); "grid_sparse" uses sparse-LU grid + cubic-spline interpolation (lndetfull style for large sparse W); "sparse_spline" uses sparse-LU + spline on [max(rho_min, 0), rho_max] (lndetint style); "grid_mc" uses Monte Carlo trace approximation (lndetmc); "grid_ilu" uses ILU-based approximation (lndetichol analog); "chebyshev" (default for n > 500) uses a Chebyshev polynomial approximation evaluated via Clenshaw’s algorithm. For large n the Chebyshev coefficients are built from a stochastic trace estimator selected by trace_estimator.

trace_estimator : {"hutchinson", "hutchpp", "xtrace"}, default "hutchpp"

Stochastic trace estimator used to build the Chebyshev coefficients when an eigendecomposition is unavailable. Ignored for non-Chebyshev methods. See docs/source/user-guide/logdet_profiling.ipynb for the cost/accuracy frontier.

trace_k : int, optional

Number of probe vectors for the trace estimator. Defaults: 30 (hutchinson), 50 (hutchpp), 25 (xtrace).

robust : bool, default False

If True, use a Student-t error distribution instead of Normal, yielding a model that is robust to heavy-tailed outliers. When robust=True, a nu (degrees of freedom) parameter is added to the model with an \(\mathrm{Exp}(\lambda_\nu)\) prior (default nu_lam = 1/30, mean ≈ 30). The nu prior can be controlled via the priors dict with key nu_lam.

w_vars : list of str, optional

Names of X columns to spatially lag. Only relevant for models that include WX terms (SLX, SDM, SDEM and their panel/Tobit variants). By default all non-constant columns are lagged. Pass a subset to restrict which variables receive a spatial lag, e.g. w_vars=["income", "density"].

_spatial_params[source]

Spatial autoregressive parameters in the model (e.g. ("rho",) for SAR, ("lam",) for SEM). Empty for OLS and SLX.

Type:

tuple[str, …]

_lag_terms[source]

Lagged terms present in the model specification (e.g. ("Wy",) for SAR, ("WX",) for SLX, ("Wy", "WX") for SDM).

Type:

tuple[str, …]

_jacobian_param[source]

Name of the parameter that appears in the Jacobian determinant log|I - param * W|. "rho" for SAR/SDM, "lam" for SEM/SDEM, None for OLS/SLX (no Jacobian).

Type:

str or None

_has_wx_in_beta[source]

Whether the beta coefficient vector includes WX coefficients (i.e. whether the design matrix is [X, WX] rather than just X). True for SLX, SDM, SDEM.

Type:

bool

_gibbs_class[source]

Fully-qualified class name of the Gibbs sampler for this model (e.g. "GaussianSARGibbs"), or None if no Gibbs sampler exists. Used to look up the sampler at runtime to avoid circular imports.

Type:

str or None

_model_type[source]

Short lowercase model name used as the model_type argument to the Gibbs sampler (e.g. "sar", "sdm"). Also used for InferenceData coordinate labels.

Type:

str

__init__(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]

Methods

__init__([formula, data, y, X, W, priors, ...])

fit([draws, tune, chains, target_accept, ...])

Draw samples from the posterior.

fitted_values()

Return fitted values at posterior mean parameters.

residuals()

Return residuals on the observed scale.

spatial_diagnostics()

Run Bayesian LM specification tests and return a summary table.

spatial_diagnostics_decision([alpha, format])

Return a model-selection decision from Bayesian LM test results.

spatial_effects([return_posterior_samples])

Compute Bayesian inference for direct, indirect, and total impacts.

summary([var_names])

Return posterior summary table.

Attributes

inference_data

Return the ArviZ InferenceData from the most recent fit.

pymc_model

Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, progressbar=True, **sample_kwargs)[source]

Draw samples from the posterior.

Parameters:
draws : int

Number of posterior samples per chain (after tuning).

tune : int

Number of tuning (burn-in) steps per chain.

chains : int

Number of parallel chains.

target_accept : float

Target acceptance rate for NUTS.

random_seed : int, optional

Seed for reproducibility.

progressbar : bool, default True

Show progress bar during sampling.

**sample_kwargs

Additional keyword arguments forwarded to pm.sample. Pass nuts_sampler="blackjax" (or "numpyro", "nutpie") to select an alternative NUTS backend; defaults to PyMC’s built-in sampler.

Return type:

arviz.InferenceData

fitted_values()[source]

Return fitted values at posterior mean parameters.

Returns:

Posterior-mean fitted values.

Return type:

np.ndarray

property inference_data : arviz.data.inference_data.InferenceData | None[source]

Return the ArviZ InferenceData from the most recent fit.

Returns:

The inference data object, or None if the model has not been fit yet.

Return type:

arviz.InferenceData or None

property pymc_model : pymc.model.core.Model | None[source]

Return the PyMC model object built for the most recent fit.

For Gibbs-fitted models the PyMC model is not constructed during sampling; it is built lazily on first access so that downstream consumers (e.g. bridge sampling for marginal likelihoods) can evaluate logp and the prior under the same model definition used by the NUTS path.

Returns:

The model object used by fit(), or None if the instance has not been fit yet.

Return type:

pymc.Model or None

residuals()[source]

Return residuals on the observed scale.

Returns:

Residual vector y - fitted_values.

Return type:

np.ndarray

spatial_diagnostics()[source]

Run Bayesian LM specification tests and return a summary table.

Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.

Requires the model to have been fit (.fit() called) and a spatial weights matrix W to have been supplied at construction time.

Returns:

DataFrame indexed by test name with columns:

Column

Description

statistic

Posterior mean of the LM statistic

median

Posterior median of the LM statistic

df

Degrees of freedom for the \(\chi^2\) reference

p_value

Bayesian p-value: 1 - chi2.cdf(mean, df)

ci_lower

Lower bound of 95% credible interval (2.5%)

ci_upper

Upper bound of 95% credible interval (97.5%)

The DataFrame has attrs["model_type"] (class name) and attrs["n_draws"] (total posterior draws) metadata.

Return type:

pandas.DataFrame

Raises:
  • RuntimeError – If the model has not been fit yet.

  • ValueError – If no spatial weights matrix W was supplied.

See also

spatial_diagnostics_decision

Model-selection decision based on the test results.

spatial_effects

Posterior inference for direct/indirect/total impacts.

Examples

>>> ols = OLS(formula="price ~ income + crime", data=df, W=w)
>>> ols.fit()
>>> ols.spatial_diagnostics()
                 statistic  median  df  p_value  ci_lower  ci_upper
LM-Lag                3.21    2.98   1    0.073      0.12      8.54
LM-Error              5.67    5.34   1    0.017      0.34     12.10
LM-SDM-Joint          7.89    7.12   4    0.096      1.23     18.32
LM-SLX-Error-Joint    6.45    5.98   4    0.168      0.89     15.67
spatial_diagnostics_decision(alpha=0.05, format='graphviz')[source]

Return a model-selection decision from Bayesian LM test results.

Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical stge_kb procedure in Anselin et al. [1996]). The decision logic depends on the current model type and the pattern of significant tests:

From OLS (6-test decision tree):

  1. If only LM-Lag is significant → SAR.

  2. If only LM-Error is significant → SEM.

  3. If both are significant → use the Anselin–Florax / Koley–Bera robust pair: Robust-LM-Lag → SAR, Robust-LM-Error → SEM, both → SARAR. If neither robust test is significant, fall back to the lower raw p-value.

  4. If neither naive test is significant → OLS.

From SAR (3-test decision tree):

  • LM-Error significant → SARAR; LM-WX significant → SDM; Robust-LM-WX significant → SDM.

From SEM (2-test decision tree):

  • LM-Lag significant → SARAR; LM-WX significant → SDEM.

From SLX (4-test decision tree):

  • Robust-LM-Lag-SDM significant → SDM; Robust-LM-Error-SDEM significant → SDEM; both → MANSAR; neither → SLX.

From SDM: LM-Error-SDM significant → MANSAR; else SDM.

From SDEM: LM-Lag-SDEM significant → MANSAR; else SDEM.

Parameters:
alpha : float, default 0.05

Significance level for the Bayesian p-values.

format : {"graphviz", "ascii", "model"}, default "graphviz"

Output format. "model" returns the recommended-model name string. "ascii" returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted. "graphviz" returns a graphviz.Digraph object that renders inline in Jupyter; if the optional graphviz package is not installed a UserWarning is issued and the ASCII rendering is returned instead.

Returns:

Recommended model name when format="model", an ASCII tree string when format="ascii", or a graphviz.Digraph when format="graphviz" (with ASCII fallback on missing dep).

Return type:

str or graphviz.Digraph

See also

spatial_diagnostics

Compute the Bayesian LM test statistics.

References

Koley and Bera [2024], Anselin et al. [1996]

spatial_effects(return_posterior_samples=False)[source]

Compute Bayesian inference for direct, indirect, and total impacts.

Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analog of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].

Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.

Parameters:
return_posterior_samples : bool, optional

If True, return a (DataFrame, dict) tuple where the dict contains the full posterior draws under keys "direct", "indirect", and "total". Default False.

Returns:

If return_posterior_samples is False (default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.

If return_posterior_samples is True, returns (DataFrame, dict) where the dict has keys "direct", "indirect", "total", each mapping to a (G, k) array of posterior draws.

Return type:

pd.DataFrame or tuple of (pd.DataFrame, dict)

summary(var_names=None, **kwargs)[source]

Return posterior summary table.

Parameters:
var_names : list, optional

Variable names to include in the summary.

**kwargs

Additional arguments passed to arviz.summary().

Returns:

Posterior summary statistics.

Return type:

pandas.DataFrame