bayespecon.models.base.SpatialModel¶
-
class bayespecon.models.base.SpatialModel(formula=
None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶ Base class for Bayesian spatial regression models. Models follow the notation of [Anselin, 1988] and [LeSage and Pace, 2009]. The API supports both formula and matrix input modes.
- Parameters:¶
- formula : str, optional¶
Wilkinson-style formula string, e.g.
"price ~ poverty + rev_rating". If provided,datamust also be supplied. An intercept is included by default; suppress with"y ~ x - 1".- data : DataFrame or GeoDataFrame, optional¶
Data source when using formula mode.
- y : array-like, optional¶
Dependent variable. Required in matrix mode.
- X : array-like, optional¶
Predictor matrix. Required in matrix mode. If a DataFrame, column names are preserved for labelling.
- W : libpysal.graph.Graph or scipy.sparse matrix¶
Spatial weights matrix of shape
(n, n). Accepts alibpysal.graph.Graph(the modern libpysal graph API) or anyscipy.sparsematrix. The legacylibpysal.weights.Wobject is not accepted directly; passw.sparseto use the underlying sparse matrix, or convert withlibpysal.graph.Graph.from_W(w). W should be row-standardised; aUserWarningis raised if not.- priors : dict, optional¶
Override default priors. Keys depend on the model subclass; see each model’s docstring for supported keys.
- logdet_method : str¶
How to compute
log|I - rho*W|."eigenvalue"(default forn <= 500) pre-computes W’s eigenvalues once and evaluates O(n) per step;"exact"uses symbolic pytensor det (slow forn > 500);"grid_dense"uses dense eigenvalue grid + cubic-spline interpolation (MATLAB-stylelndetfullfor dense W);"grid_sparse"uses sparse-LU grid + cubic-spline interpolation (lndetfullstyle for large sparse W);"sparse_spline"uses sparse-LU + spline on[max(rho_min, 0), rho_max](lndetintstyle);"grid_mc"uses Monte Carlo trace approximation (lndetmc);"grid_ilu"uses ILU-based approximation (lndeticholanalog);"chebyshev"(default forn > 500) uses a Chebyshev polynomial approximation evaluated via Clenshaw’s algorithm. For largenthe Chebyshev coefficients are built from a stochastic trace estimator selected bytrace_estimator.- trace_estimator : {"hutchinson", "hutchpp", "xtrace"}, default "hutchpp"¶
Stochastic trace estimator used to build the Chebyshev coefficients when an eigendecomposition is unavailable. Ignored for non-Chebyshev methods. See
docs/source/user-guide/logdet_profiling.ipynbfor the cost/accuracy frontier.- trace_k : int, optional¶
Number of probe vectors for the trace estimator. Defaults:
30(hutchinson),50(hutchpp),25(xtrace).- robust : bool, default False¶
If True, use a Student-t error distribution instead of Normal, yielding a model that is robust to heavy-tailed outliers. When
robust=True, anu(degrees of freedom) parameter is added to the model with an \(\mathrm{Exp}(\lambda_\nu)\) prior (defaultnu_lam = 1/30, mean ≈ 30). Thenuprior can be controlled via thepriorsdict with keynu_lam.- w_vars : list of str, optional¶
Names of X columns to spatially lag. Only relevant for models that include
WXterms (SLX, SDM, SDEM and their panel/Tobit variants). By default all non-constant columns are lagged. Pass a subset to restrict which variables receive a spatial lag, e.g.w_vars=["income", "density"].
- _spatial_params[source]¶
Spatial autoregressive parameters in the model (e.g.
("rho",)for SAR,("lam",)for SEM). Empty for OLS and SLX.
- _lag_terms[source]¶
Lagged terms present in the model specification (e.g.
("Wy",)for SAR,("WX",)for SLX,("Wy", "WX")for SDM).
- _jacobian_param[source]¶
Name of the parameter that appears in the Jacobian determinant
log|I - param * W|."rho"for SAR/SDM,"lam"for SEM/SDEM,Nonefor OLS/SLX (no Jacobian).
- _has_wx_in_beta[source]¶
Whether the
betacoefficient vector includes WX coefficients (i.e. whether the design matrix is[X, WX]rather than justX). True for SLX, SDM, SDEM.
- _gibbs_class[source]¶
Fully-qualified class name of the Gibbs sampler for this model (e.g.
"GaussianSARGibbs"), orNoneif no Gibbs sampler exists. Used to look up the sampler at runtime to avoid circular imports.
- _model_type[source]¶
Short lowercase model name used as the
model_typeargument to the Gibbs sampler (e.g."sar","sdm"). Also used for InferenceData coordinate labels.
-
__init__(formula=
None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶
Methods
__init__([formula, data, y, X, W, priors, ...])fit([draws, tune, chains, target_accept, ...])Draw samples from the posterior.
Return fitted values at posterior mean parameters.
Return residuals on the observed scale.
Run Bayesian LM specification tests and return a summary table.
spatial_diagnostics_decision([alpha, format])Return a model-selection decision from Bayesian LM test results.
spatial_effects([return_posterior_samples])Compute Bayesian inference for direct, indirect, and total impacts.
summary([var_names])Return posterior summary table.
Attributes
Return the ArviZ InferenceData from the most recent fit.
Return the PyMC model object built for the most recent fit.
-
fit(draws=
2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, progressbar=True, **sample_kwargs)[source]¶ Draw samples from the posterior.
- Parameters:¶
- draws : int¶
Number of posterior samples per chain (after tuning).
- tune : int¶
Number of tuning (burn-in) steps per chain.
- chains : int¶
Number of parallel chains.
- target_accept : float¶
Target acceptance rate for NUTS.
- random_seed : int, optional¶
Seed for reproducibility.
- progressbar : bool, default True¶
Show progress bar during sampling.
- **sample_kwargs¶
Additional keyword arguments forwarded to
pm.sample. Passnuts_sampler="blackjax"(or"numpyro","nutpie") to select an alternative NUTS backend; defaults to PyMC’s built-in sampler.
- Return type:¶
arviz.InferenceData
- property inference_data : arviz.data.inference_data.InferenceData | None[source]¶
Return the ArviZ InferenceData from the most recent fit.
- property pymc_model : pymc.model.core.Model | None[source]¶
Return the PyMC model object built for the most recent fit.
For Gibbs-fitted models the PyMC model is not constructed during sampling; it is built lazily on first access so that downstream consumers (e.g. bridge sampling for marginal likelihoods) can evaluate
logpand the prior under the same model definition used by the NUTS path.
- spatial_diagnostics()[source]¶
Run Bayesian LM specification tests and return a summary table.
Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.
Requires the model to have been fit (
.fit()called) and a spatial weights matrixWto have been supplied at construction time.- Returns:¶
DataFrame indexed by test name with columns:
Column
Description
statistic
Posterior mean of the LM statistic
median
Posterior median of the LM statistic
df
Degrees of freedom for the \(\chi^2\) reference
p_value
Bayesian p-value:
1 - chi2.cdf(mean, df)ci_lower
Lower bound of 95% credible interval (2.5%)
ci_upper
Upper bound of 95% credible interval (97.5%)
The DataFrame has
attrs["model_type"](class name) andattrs["n_draws"](total posterior draws) metadata.- Return type:¶
pandas.DataFrame
- Raises:¶
RuntimeError – If the model has not been fit yet.
ValueError – If no spatial weights matrix
Wwas supplied.
See also
spatial_diagnostics_decisionModel-selection decision based on the test results.
spatial_effectsPosterior inference for direct/indirect/total impacts.
Examples
>>> ols = OLS(formula="price ~ income + crime", data=df, W=w) >>> ols.fit() >>> ols.spatial_diagnostics() statistic median df p_value ci_lower ci_upper LM-Lag 3.21 2.98 1 0.073 0.12 8.54 LM-Error 5.67 5.34 1 0.017 0.34 12.10 LM-SDM-Joint 7.89 7.12 4 0.096 1.23 18.32 LM-SLX-Error-Joint 6.45 5.98 4 0.168 0.89 15.67
-
spatial_diagnostics_decision(alpha=
0.05, format='graphviz')[source]¶ Return a model-selection decision from Bayesian LM test results.
Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical
stge_kbprocedure in Anselin et al. [1996]). The decision logic depends on the current model type and the pattern of significant tests:From OLS (6-test decision tree):
If only LM-Lag is significant → SAR.
If only LM-Error is significant → SEM.
If both are significant → use the Anselin–Florax / Koley–Bera robust pair: Robust-LM-Lag → SAR, Robust-LM-Error → SEM, both → SARAR. If neither robust test is significant, fall back to the lower raw p-value.
If neither naive test is significant → OLS.
From SAR (3-test decision tree):
LM-Error significant → SARAR; LM-WX significant → SDM; Robust-LM-WX significant → SDM.
From SEM (2-test decision tree):
LM-Lag significant → SARAR; LM-WX significant → SDEM.
From SLX (4-test decision tree):
Robust-LM-Lag-SDM significant → SDM; Robust-LM-Error-SDEM significant → SDEM; both → MANSAR; neither → SLX.
From SDM: LM-Error-SDM significant → MANSAR; else SDM.
From SDEM: LM-Lag-SDEM significant → MANSAR; else SDEM.
- Parameters:¶
- alpha : float, default 0.05¶
Significance level for the Bayesian p-values.
- format : {"graphviz", "ascii", "model"}, default "graphviz"¶
Output format.
"model"returns the recommended-model name string."ascii"returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted."graphviz"returns agraphviz.Digraphobject that renders inline in Jupyter; if the optionalgraphvizpackage is not installed aUserWarningis issued and the ASCII rendering is returned instead.
- Returns:¶
Recommended model name when
format="model", an ASCII tree string whenformat="ascii", or agraphviz.Digraphwhenformat="graphviz"(with ASCII fallback on missing dep).- Return type:¶
str or graphviz.Digraph
See also
spatial_diagnosticsCompute the Bayesian LM test statistics.
References
-
spatial_effects(return_posterior_samples=
False)[source]¶ Compute Bayesian inference for direct, indirect, and total impacts.
Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analog of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].
Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.
- Parameters:¶
- Returns:¶
If return_posterior_samples is
False(default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.If return_posterior_samples is
True, returns(DataFrame, dict)where the dict has keys"direct","indirect","total", each mapping to a(G, k)array of posterior draws.- Return type:¶