bayespecon.models.SAR
class bayespecon.models.SAR(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None)
Bayesian Spatial Autoregressive (Spatial Lag) model.
Models contemporaneous spatial dependence in the dependent variable via the autoregressive parameter \(\rho\):
\[y = \rho Wy + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I).\]
The likelihood includes the spatial Jacobian \(\log|I - \rho W|\) so that posterior inference on \(\rho\) is exact.
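The model equation can be checked numerically with plain NumPy. The sketch below does not use bayespecon at all: the ring-shaped neighbourhood, the parameter values and all variable names are assumptions chosen for illustration. It simulates data from the reduced form \(y = (I - \rho W)^{-1}(X\beta + \varepsilon)\) and evaluates the Jacobian term directly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical ring neighbourhood: each unit borders the previous and next one.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = 1.0
    A[i, (i + 1) % n] = 1.0

# Row-standardise so that every row of W sums to one.
W = A / A.sum(axis=1, keepdims=True)

rho, sigma = 0.4, 1.0
beta = np.array([1.0, 2.0])  # intercept and one slope
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=sigma, size=n)

# Reduced form: y = (I - rho W)^{-1} (X beta + eps).
S = np.eye(n) - rho * W
y = np.linalg.solve(S, X @ beta + eps)

# The structural equation y = rho W y + X beta + eps holds for the simulated y.
residual = y - (rho * (W @ y) + X @ beta)

# Spatial Jacobian term log|I - rho W| that enters the likelihood.
sign, logdet = np.linalg.slogdet(S)
```

Here `residual` reproduces `eps` up to floating-point error, and `sign` is positive because \(I - \rho W\) is nonsingular with positive determinant for \(|\rho| < 1\) when W is row-standardised.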
- Parameters:
- formula : str, optional
Wilkinson-style formula, e.g. "y ~ x1 + x2". Requires data. An intercept is included by default; suppress with "y ~ x - 1".
- data : pandas.DataFrame or geopandas.GeoDataFrame, optional
Data source for formula mode.
- y : array-like, optional
Dependent variable of shape (n,). Required in matrix mode.
- X : array-like or pandas.DataFrame, optional
Design matrix. Required in matrix mode. DataFrame columns are preserved as feature names.
- W : libpysal.graph.Graph or scipy.sparse matrix
Spatial weights of shape (n, n). Accepts a libpysal.graph.Graph or any scipy.sparse matrix. The legacy libpysal.weights.W object is not accepted; pass w.sparse or libpysal.graph.Graph.from_W(w). Should be row-standardised; a UserWarning is issued otherwise.
- priors : dict, optional
Override default priors. Supported keys:
  - rho_lower (float, default -1.0): Lower bound of the Uniform prior on \(\rho\).
  - rho_upper (float, default 1.0): Upper bound of the Uniform prior on \(\rho\).
  - beta_mu (float, default 0.0): Normal prior mean for \(\beta\).
  - beta_sigma (float, default 1e6): Normal prior std for \(\beta\).
  - sigma_sigma (float, default 10.0): HalfNormal prior std for \(\sigma\).
  - nu_lam (float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used when robust=True).
- logdet_method : str, optional
How to compute \(\log|I - \rho W|\). None (default) auto-selects "eigenvalue" for n <= 2000, else "chebyshev". Other options: "exact" (symbolic det, slow for n > 500), "grid_dense", "grid_sparse", "sparse_spline", "grid_mc", "grid_ilu".
- robust : bool, default False
If True, replace the Normal error with Student-t for robustness to heavy-tailed outliers. See Robust regression below.
- w_vars : list of str, optional
Accepted for API consistency with SLX/SDM/SDEM but unused (SAR has no WX term). If supplied, it has no effect on this model.
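The idea behind an eigenvalue-based log-determinant, as named in logdet_method, can be sketched in a few lines. This is not the library's implementation: the helper `logdet_eigen` and the small path-graph W are hypothetical. It relies on the identity \(\log|I - \rho W| = \sum_i \log(1 - \rho \lambda_i)\), where \(\lambda_i\) are the eigenvalues of W, so the expensive decomposition is done once and reused for every value of \(\rho\).

```python
import numpy as np

def logdet_eigen(rho, eigvals):
    """Return log|I - rho W| from precomputed eigenvalues of W."""
    # Eigenvalues of a non-symmetric W may be complex; they come in conjugate
    # pairs, so the imaginary parts cancel and the real part is the answer.
    return float(np.sum(np.log(1.0 - rho * eigvals.astype(complex))).real)

# Hypothetical path graph on 5 units, row-standardised.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)

eigvals = np.linalg.eigvals(W)  # computed once, reused for every rho

rhos = np.linspace(-0.9, 0.9, 7)
fast = np.array([logdet_eigen(r, eigvals) for r in rhos])

# Reference values from a dense determinant at each rho.
direct = np.array([np.linalg.slogdet(np.eye(n) - r * W)[1] for r in rhos])
```

The two arrays agree to floating-point precision. Each evaluation after the initial decomposition costs O(n), which is the usual argument for this approach when n is moderate.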
Notes
Direct, indirect and total effects of \(X\) on \(y\) are derived from the spatial multiplier \((I - \rho W)^{-1}\) and are reported by spatial_effects().
Robust regression
When robust=True, the error distribution is changed from Normal to Student-t:
\[\varepsilon \sim t_\nu(0, \sigma^2 I)\]
where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate nu_lam (default 1/30, mean ≈ 30, favouring near-Normal tails). The lower bound of 2 ensures the variance exists.
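To get a feel for the prior on \(\nu\), the sketch below samples from it with NumPy. It assumes TruncExp denotes an exponential distribution truncated below at 2, which by memorylessness is the same as 2 plus an ordinary exponential with the same rate. Under that assumption the prior mean is 2 + 1/nu_lam = 32, in line with the rough figure quoted above. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
nu_lam = 1.0 / 30.0  # default rate from the priors table

# Truncated-below-at-2 exponential, sampled as a shifted exponential
# (an assumption about what TruncExp(lower=2) means).
nu = 2.0 + rng.exponential(scale=1.0 / nu_lam, size=200_000)

prior_mean = nu.mean()           # theoretical value: 2 + 1/nu_lam = 32
share_heavy = np.mean(nu < 5.0)  # prior mass on clearly heavy tails

# Variance of a Student-t relative to sigma^2 is nu / (nu - 2),
# which is finite only for nu > 2.
inflation_at_mean = 32.0 / (32.0 - 2.0)
```

About 10% of the prior mass falls below \(\nu = 5\), so heavy tails remain plausible a priori while most draws are close to Normal.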
__init__(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None)
Methods
__init__([formula, data, y, X, W, priors, ...])
fit([draws, tune, chains, target_accept, ...])  Draw samples from the posterior.
Return fitted values at posterior mean parameters.
Return residuals on the observed (or transformed-panel) scale.
spatial_diagnostics()  Run Bayesian LM specification tests and return a summary table.
spatial_diagnostics_decision([alpha, ...])  Return a model-selection decision from Bayesian LM test results.
spatial_effects([return_posterior_samples])  Compute Bayesian inference for direct, indirect, and total impacts.
summary([var_names])  Return posterior summary table.
Attributes
inference_data  Return the ArviZ InferenceData from the most recent fit.
pymc_model  Return the PyMC model object built for the most recent fit.
fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, **sample_kwargs)
Draw samples from the posterior. Accepts idata_kwargs for ArviZ compatibility.
- Parameters:
Other parameters are as in SpatialModel.
Notes
The log-likelihood for the SAR model is:
\[\log p(y \mid \theta) = \sum_{i=1}^{n} \log \mathcal{N}(y_i \mid \mu_i, \sigma^2) + \log |I - \rho W|\]
The pm.Normal with observed=self._y automatically captures the first term (the Gaussian log-likelihood) in log_likelihood. However, the Jacobian term \(\log |I - \rho W|\) is added via pm.Potential and does not appear in the auto-computed log_likelihood group.
For correct WAIC/LOO computation (and therefore Bayes factor comparison via bridge sampling), we construct the complete pointwise log-likelihood manually after sampling:
\[\ell_i = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2 + \frac{1}{n} \log |I - \rho W|\]
where \(\mu_i = \rho (Wy)_i + x_i' \beta\) and the Jacobian contribution is divided by \(n\) so that \(\sum_{i=1}^{n} \ell_i\) equals the total log-likelihood used for sampling.
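The pointwise construction can be illustrated for a single posterior draw with NumPy alone. This is a sketch under stated assumptions, not the code that fit() runs: the data, the ring-shaped W and the parameter values are made up, and the first term uses the full Gaussian log-density (including the normalising constant) so that the pointwise values sum to the total log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Hypothetical row-standardised ring weights.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = 1.0
    A[i, (i + 1) % n] = 1.0
W = A / A.sum(axis=1, keepdims=True)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

# One hypothetical posterior draw of (rho, beta, sigma).
rho, beta, sigma = 0.3, np.array([0.5, -1.0]), 1.2

mu = rho * (W @ y) + X @ beta
logdet = np.linalg.slogdet(np.eye(n) - rho * W)[1]

# Gaussian log-density per observation, plus an equal 1/n share of the Jacobian.
gauss = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
pointwise = gauss + logdet / n

# Total log-likelihood as used for sampling: Gaussian terms plus one Jacobian.
total = gauss.sum() + logdet
```

Repeating this for every draw and chain would give an array of shape (chains, draws, n), which is the layout ArviZ expects in a log_likelihood group.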
- property inference_data : arviz.data.inference_data.InferenceData | None
Return the ArviZ InferenceData from the most recent fit.
- property pymc_model : pymc.model.core.Model | None
Return the PyMC model object built for the most recent fit.
- spatial_diagnostics()
Run Bayesian LM specification tests and return a summary table.
Iterates over the class-level _spatial_diagnostics_tests registry and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type.
Requires the model to have been fit (.fit() called). For cross-sectional models a spatial weights matrix W must also have been supplied at construction time.
- Returns:
DataFrame indexed by test name with columns statistic (posterior mean), median, df (degrees of freedom for the \(\chi^2\) reference), p_value (Bayesian p-value 1 - chi2.cdf(mean, df)), and ci_lower / ci_upper (95% credible interval). The DataFrame carries attrs["model_type"] and attrs["n_draws"] metadata.
- Return type:
pandas.DataFrame
- Raises:
RuntimeError – If the model has not been fit yet.
ValueError – If a cross-sectional model was constructed without W.
See also
spatial_diagnostics_decision: Model-selection decision based on the test results.
spatial_effects: Posterior inference for direct/indirect/total impacts.
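The columns returned by spatial_diagnostics() can be mimicked for one test from a vector of posterior draws. The sketch below is an illustration only: the draws are simulated rather than produced by a real LM test, the equal-tailed quantile interval is an assumption about how the 95% credible interval is formed, and `chi2_sf_df1` is a hypothetical helper. It uses the fact that for df = 1 the quantity 1 - chi2.cdf(x, 1) equals erfc(sqrt(x / 2)), which avoids a SciPy dependency.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical posterior draws of an LM statistic (one value per MCMC draw).
draws = rng.chisquare(df=1, size=4000) + 2.0

def chi2_sf_df1(x):
    """1 - chi2.cdf(x, df=1), written with the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

posterior_mean = float(draws.mean())
summary = {
    "statistic": posterior_mean,
    "median": float(np.median(draws)),
    "df": 1,
    "p_value": chi2_sf_df1(posterior_mean),
    "ci_lower": float(np.quantile(draws, 0.025)),
    "ci_upper": float(np.quantile(draws, 0.975)),
}
```

For other degrees of freedom, scipy.stats.chi2.sf(x, df) gives the same quantity.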
spatial_diagnostics_decision(alpha=0.05, format='graphviz', theme='default')
Return a model-selection decision from Bayesian LM test results.
Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical stge_kb procedure in Anselin et al. [1996]), adapted for panel models following Elhorst [2014] when invoked on a panel subclass. See the cross-sectional / panel-specific docstrings on the leaf classes for the full set of branches consulted.
- Parameters:
- alpha : float, default 0.05
Significance level for the Bayesian p-values.
- format : {"graphviz", "ascii", "model"}, default "graphviz"
Output format.
  - "model" returns the recommended-model name string.
  - "ascii" returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted.
  - "graphviz" returns a graphviz.Digraph object that renders inline in Jupyter; if the optional graphviz package is not installed, a UserWarning is issued and the ASCII rendering is returned instead.
- Returns:
Recommended model name when format="model", an ASCII tree string when format="ascii", or a graphviz.Digraph when format="graphviz" (falling back to the ASCII string if graphviz is not installed).
- Return type:
str or graphviz.Digraph
See also
spatial_diagnostics: Compute the Bayesian LM test statistics.
spatial_effects(return_posterior_samples=False)
Compute Bayesian inference for direct, indirect, and total impacts.
Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analogue of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].
Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.
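For the SAR case, the per-draw computation can be sketched with NumPy using the standard scalar summary measures: the average diagonal of \((I - \rho W)^{-1}\beta_k\) as the direct effect, the average row sum as the total effect, and their difference as the indirect effect. Whether spatial_effects() computes them in exactly this way is an assumption; the helper `sar_effects`, the ring-shaped W and the simulated draws are all hypothetical.

```python
import numpy as np

def sar_effects(rho, beta_k, W):
    """Average direct, indirect and total impact of one covariate for one draw."""
    n = W.shape[0]
    # Impact matrix: partial derivatives of y with respect to x_k.
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_k
    direct = np.trace(S) / n  # average own-unit effect
    total = S.sum() / n       # average row sum
    return direct, total - direct, total

rng = np.random.default_rng(4)
n = 6

# Hypothetical row-standardised ring weights.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = 1.0
    A[i, (i + 1) % n] = 1.0
W = A / A.sum(axis=1, keepdims=True)

# Simulated stand-ins for posterior draws of rho and one coefficient.
rho_draws = rng.uniform(0.2, 0.5, size=200)
beta_draws = rng.normal(1.5, 0.1, size=200)

# One row per draw; columns are direct, indirect, total.
effects = np.array([sar_effects(r, b, W) for r, b in zip(rho_draws, beta_draws)])

post_mean = effects.mean(axis=0)
ci = np.quantile(effects, [0.025, 0.975], axis=0)  # equal-tailed 95% interval
```

With a row-standardised W the total effect reduces to \(\beta_k / (1 - \rho)\) for every draw, which gives a quick sanity check on the output.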
- Parameters:
- Returns:
If return_posterior_samples is False (default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.
If return_posterior_samples is True, returns (DataFrame, dict) where the dict has keys "direct", "indirect", "total", each mapping to a (G, k) array of posterior draws.
- Return type: