bayespecon.SAR¶
-
class bayespecon.SAR(formula=
None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶ Bayesian Spatial Autoregressive (Spatial Lag) model.
Models a contemporaneous spatial dependence in the dependent variable via the autoregressive parameter \(\rho\):
\[y = \rho Wy + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I).\]The likelihood includes the spatial Jacobian \(\log|I - \rho W|\) so that posterior inference on \(\rho\) is exact.
- Parameters:¶
- formula : str, optional¶
Wilkinson-style formula, e.g.
"y ~ x1 + x2". Requiresdata. An intercept is included by default; suppress with"y ~ x - 1".- data : pandas.DataFrame or geopandas.GeoDataFrame, optional¶
Data source for formula mode.
- y : array-like, optional¶
Dependent variable of shape
(n,). Required in matrix mode.- X : array-like or pandas.DataFrame, optional¶
Design matrix. Required in matrix mode. DataFrame columns are preserved as feature names.
- W : libpysal.graph.Graph or scipy.sparse matrix¶
Spatial weights of shape
(n, n). Accepts alibpysal.graph.Graphor anyscipy.sparsematrix. The legacylibpysal.weights.Wobject is not accepted; passw.sparseorlibpysal.graph.Graph.from_W(w). Should be row-standardised; aUserWarningis raised otherwise.- priors : dict, optional¶
Override default priors. Supported keys:
rho_lower(float, default -1.0): Lower bound of the Uniform prior on \(\rho\).rho_upper(float, default 1.0): Upper bound of the Uniform prior on \(\rho\).beta_mu(float, default 0.0): Normal prior mean for \(\beta\).beta_sigma(float, default 1e6): Normal prior std for \(\beta\).sigma2_alpha(float, default 2.0): Shape of the InverseGamma prior on \(\sigma^2\).sigma2_beta(float, defaultVar(y)): Scale of the InverseGamma prior on \(\sigma^2\).nu_lam(float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used whenrobust=True).
- logdet_method : str, optional¶
How to compute \(\log|I - \rho W|\).
None(default) auto-selects"eigenvalue"forn <= 2000else"chebyshev". Other options:"exact"(symbolic det, slow forn > 500),"dense_grid","sparse_grid","spline","mc","ilu".- robust : bool, default False¶
If True, replace the Normal error with Student-t for robustness to heavy-tailed outliers. See Robust regression below.
- w_vars : list of str, optional¶
Accepted for API consistency with SLX/SDM/SDEM but unused (SAR has no
WXterm). If supplied without effect on this model.
Notes
Direct, indirect and total effects of \(X\) on \(y\) are derived from the spatial multiplier \((I - \rho W)^{-1}\) and are reported by
spatial_effects().Robust regression
When
robust=True, the error distribution is changed from Normal to Student-t:\[\varepsilon \sim t_\nu(0, \sigma^2 I)\]where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate
nu_lam(default 1/30, mean ≈ 30, favouring near-Normal tails). The lower bound of 2 ensures the variance exists.-
__init__(formula=
None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶
Methods
__init__([formula, data, y, X, W, priors, ...])fit([draws, tune, chains, target_accept, ...])Draw samples from the posterior.
Return fitted values at posterior mean parameters.
Return residuals on the observed scale.
Run Bayesian LM specification tests and return a summary table.
spatial_diagnostics_decision([alpha, format])Return a model-selection decision from Bayesian LM test results.
spatial_effects([return_posterior_samples])Compute Bayesian inference for direct, indirect, and total impacts.
summary([var_names])Return posterior summary table.
Attributes
Return the ArviZ InferenceData from the most recent fit.
Return the PyMC model object built for the most recent fit.
-
fit(draws=
2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, sampler='gibbs', thin=1, n_jobs=-1, progressbar=True, **sample_kwargs)[source]¶ Draw samples from the posterior.
- Parameters:¶
- draws : int, default 2000¶
Number of posterior samples per chain (after tuning).
- tune : int, default 1000¶
Number of tuning (burn-in) steps per chain.
- chains : int, default 4¶
Number of parallel chains.
- target_accept : float, default 0.9¶
Target acceptance rate for NUTS.
- random_seed : int, optional¶
Seed for reproducibility.
- idata_kwargs : dict, optional¶
Passed to
pm.samplefor InferenceData creation. If containslog_likelihood: True, the complete pointwise log-likelihood (including the Jacobian correction) is attached to the output. Only used whensampler="nuts".- sampler : str, default "nuts"¶
Sampling method:
"nuts": NUTS via PyMC (default)."gibbs": 3-block Gibbs sampler (β conjugate normal, σ² conjugate Inv-Γ, ρ collapsed slice). Faster for Gaussian models because it avoids the banana-shaped posterior geometry that NUTS struggles with.
- thin : int, default 1¶
Keep every
thin-th draw after warmup. Only used whensampler="gibbs".- n_jobs : int, default -1¶
Number of parallel workers for Gibbs chains.
-1uses all CPUs. Whenn_jobs=1, chains run sequentially with progress bars. Whenn_jobs>1(or-1), chains run in parallel viajoblib. Only used whensampler="gibbs"withgibbs_method="numpy".- progressbar : bool, default True¶
Show per-chain progress bars. Only used when
sampler="gibbs".- **sample_kwargs¶
Additional keyword arguments forwarded to
pm.sample. Only used whensampler="nuts".
Notes
The log-likelihood for the SAR model is:
\[\log p(y \mid \theta) = \sum_{i=1}^{n} \log \mathcal{N}(y_i \mid \mu_i, \sigma^2) + \log |I - \rho W |\]The
pm.Normalwithobserved=self._yautomatically captures the first term (the Gaussian log-likelihood) inlog_likelihood. However, the Jacobian term \(\log |I - \rho W|\) is added viapm.Potentialand does not appear in the auto-computedlog_likelihoodgroup.For correct WAIC/LOO computation (and therefore Bayes factor comparison via bridge sampling), we construct the complete pointwise log-likelihood manually after sampling:
\[\ell_i = -\frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2 + \frac{1}{n} \log |I - \rho W |\]where \(\mu_i = \rho (Wy)_i + x_i' \beta\) and the Jacobian contribution is divided by \(n\) so that \(\sum_{i=1}^{n} \ell_i\) equals the total log-likelihood used for sampling.
- property inference_data : arviz.data.inference_data.InferenceData | None[source]¶
Return the ArviZ InferenceData from the most recent fit.
- property pymc_model : pymc.model.core.Model | None[source]¶
Return the PyMC model object built for the most recent fit.
For Gibbs-fitted models the PyMC model is not constructed during sampling; it is built lazily on first access so that downstream consumers (e.g. bridge sampling for marginal likelihoods) can evaluate
logpand the prior under the same model definition used by the NUTS path.
- spatial_diagnostics()[source]¶
Run Bayesian LM specification tests and return a summary table.
Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.
Requires the model to have been fit (
.fit()called) and a spatial weights matrixWto have been supplied at construction time.- Returns:¶
DataFrame indexed by test name with columns:
Column
Description
statistic
Posterior mean of the LM statistic
median
Posterior median of the LM statistic
df
Degrees of freedom for the \(\chi^2\) reference
p_value
Bayesian p-value:
1 - chi2.cdf(mean, df)ci_lower
Lower bound of 95% credible interval (2.5%)
ci_upper
Upper bound of 95% credible interval (97.5%)
The DataFrame has
attrs["model_type"](class name) andattrs["n_draws"](total posterior draws) metadata.- Return type:¶
pandas.DataFrame
- Raises:¶
RuntimeError – If the model has not been fit yet.
ValueError – If no spatial weights matrix
Wwas supplied.
See also
spatial_diagnostics_decisionModel-selection decision based on the test results.
spatial_effectsPosterior inference for direct/indirect/total impacts.
Examples
>>> ols = OLS(formula="price ~ income + crime", data=df, W=w) >>> ols.fit() >>> ols.spatial_diagnostics() statistic median df p_value ci_lower ci_upper LM-Lag 3.21 2.98 1 0.073 0.12 8.54 LM-Error 5.67 5.34 1 0.017 0.34 12.10 LM-SDM-Joint 7.89 7.12 4 0.096 1.23 18.32 LM-SLX-Error-Joint 6.45 5.98 4 0.168 0.89 15.67
-
spatial_diagnostics_decision(alpha=
0.05, format='graphviz')[source]¶ Return a model-selection decision from Bayesian LM test results.
Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical
stge_kbprocedure in Anselin et al. [1996]). The decision logic depends on the current model type and the pattern of significant tests:From OLS (6-test decision tree):
If only LM-Lag is significant → SAR.
If only LM-Error is significant → SEM.
If both are significant → use the Anselin–Florax / Koley–Bera robust pair: Robust-LM-Lag → SAR, Robust-LM-Error → SEM, both → SARAR. If neither robust test is significant, fall back to the lower raw p-value.
If neither naive test is significant → OLS.
From SAR (3-test decision tree):
LM-Error significant → SARAR; LM-WX significant → SDM; Robust-LM-WX significant → SDM.
From SEM (2-test decision tree):
LM-Lag significant → SARAR; LM-WX significant → SDEM.
From SLX (4-test decision tree):
Robust-LM-Lag-SDM significant → SDM; Robust-LM-Error-SDEM significant → SDEM; both → MANSAR; neither → SLX.
From SDM: LM-Error-SDM significant → MANSAR; else SDM.
From SDEM: LM-Lag-SDEM significant → MANSAR; else SDEM.
- Parameters:¶
- alpha : float, default 0.05¶
Significance level for the Bayesian p-values.
- format : {"graphviz", "ascii", "model"}, default "graphviz"¶
Output format.
"model"returns the recommended-model name string."ascii"returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted."graphviz"returns agraphviz.Digraphobject that renders inline in Jupyter; if the optionalgraphvizpackage is not installed aUserWarningis issued and the ASCII rendering is returned instead.
- Returns:¶
Recommended model name when
format="model", an ASCII tree string whenformat="ascii", or agraphviz.Digraphwhenformat="graphviz"(with ASCII fallback on missing dep).- Return type:¶
str or graphviz.Digraph
See also
spatial_diagnosticsCompute the Bayesian LM test statistics.
References
-
spatial_effects(return_posterior_samples=
False)[source]¶ Compute Bayesian inference for direct, indirect, and total impacts.
Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analog of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].
Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.
- Parameters:¶
- Returns:¶
If return_posterior_samples is
False(default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.If return_posterior_samples is
True, returns(DataFrame, dict)where the dict has keys"direct","indirect","total", each mapping to a(G, k)array of posterior draws.- Return type:¶