bayespecon.SAR¶

class bayespecon.SAR(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶

Bayesian Spatial Autoregressive (Spatial Lag) model.

Models contemporaneous spatial dependence in the dependent variable through the autoregressive parameter \(\rho\):

\[y = \rho Wy + X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I).\]

The likelihood includes the spatial Jacobian \(\log|I - \rho W|\) so that posterior inference on \(\rho\) is exact.
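As a minimal sketch of the data-generating process above, the reduced form \(y = (I - \rho W)^{-1}(X\beta + \varepsilon)\) can be simulated directly with NumPy. The ring-shaped weights matrix and the helper names `ring_weights` and `simulate_sar` are illustrative assumptions, not part of bayespecon.

```python
import numpy as np

def ring_weights(n):
    """Row-standardised weights for a ring: each unit has two neighbours."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def simulate_sar(W, X, beta, rho, sigma, rng):
    """Draw y from the reduced form y = (I - rho W)^{-1} (X beta + eps)."""
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
    return y, eps

rng = np.random.default_rng(0)
n, rho = 50, 0.4
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y, eps = simulate_sar(W, X, beta, rho=rho, sigma=1.0, rng=rng)

# The structural form y = rho W y + X beta + eps holds for the simulated draw.
resid = y - rho * (W @ y) - X @ beta
```

Because `y` appears on both sides of the structural equation, the change of variables from \(\varepsilon\) to \(y\) is what introduces the Jacobian term into the likelihood.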

Parameters:¶

formula : str, optional¶

Wilkinson-style formula, e.g. "y ~ x1 + x2". Requires data. An intercept is included by default; suppress with "y ~ x - 1".

data : pandas.DataFrame or geopandas.GeoDataFrame, optional¶

Data source for formula mode.

y : array-like, optional¶

Dependent variable of shape (n,). Required in matrix mode.

X : array-like or pandas.DataFrame, optional¶

Design matrix. Required in matrix mode. DataFrame columns are preserved as feature names.

W : libpysal.graph.Graph or scipy.sparse matrix¶

Spatial weights of shape (n, n). Accepts a libpysal.graph.Graph or any scipy.sparse matrix. The legacy libpysal.weights.W object is not accepted; pass w.sparse or libpysal.graph.Graph.from_W(w) instead. W should be row-standardised; otherwise a UserWarning is issued.
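If the weights are held as a raw adjacency matrix, row-standardisation can be done with SciPy before passing W. The following is a sketch only; the helper `row_standardise` is hypothetical and not a bayespecon function.

```python
import numpy as np
import scipy.sparse as sp

def row_standardise(A):
    """Scale each row of a sparse matrix to sum to 1 (all-zero rows stay zero)."""
    A = sp.csr_matrix(A, dtype=float)
    row_sums = np.asarray(A.sum(axis=1)).ravel()
    inv = np.zeros_like(row_sums)
    nonzero = row_sums != 0
    inv[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by a diagonal matrix rescales the rows.
    return (sp.diags(inv) @ A).tocsr()

# Binary adjacency: unit 0 neighbours units 1 and 2.
A = sp.csr_matrix(np.array([[0, 1, 1],
                            [1, 0, 0],
                            [1, 0, 0]], dtype=float))
W = row_standardise(A)
```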

priors : dict, optional¶

Override default priors. Supported keys:

rho_lower (float, default -1.0): Lower bound of the Uniform prior on \(\rho\).
rho_upper (float, default 1.0): Upper bound of the Uniform prior on \(\rho\).
beta_mu (float, default 0.0): Normal prior mean for \(\beta\).
beta_sigma (float, default 1e6): Normal prior std for \(\beta\).
sigma2_alpha (float, default 2.0): Shape of the InverseGamma prior on \(\sigma^2\).
sigma2_beta (float, default Var(y)): Scale of the InverseGamma prior on \(\sigma^2\).
nu_lam (float, default 1/30): Rate of TruncExp(lower=2) prior on \(\nu\) (only used when robust=True).

logdet_method : str, optional¶

How to compute \(\log|I - \rho W|\). None (default) auto-selects "eigenvalue" for n <= 2000 else "chebyshev". Other options: "exact" (symbolic det, slow for n > 500), "dense_grid", "sparse_grid", "spline", "mc", "ilu".
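The idea behind an eigenvalue-based log-determinant is that \(\log|I - \rho W| = \sum_i \log(1 - \rho \lambda_i)\), where \(\lambda_i\) are the eigenvalues of W, so the eigenvalues are computed once and reused for every value of \(\rho\). The sketch below illustrates that identity with NumPy; it is not the library's implementation, and `logdet_eigen` is a hypothetical name.

```python
import numpy as np

def logdet_eigen(rho, eigvals):
    """log|I - rho W| from precomputed eigenvalues of W."""
    # Cast to complex so a non-symmetric W with complex eigenvalues also works;
    # imaginary parts cancel across conjugate pairs.
    terms = np.log(1.0 - rho * eigvals.astype(complex))
    return float(terms.sum().real)

# Row-standardised ring weights (illustrative test matrix).
n = 40
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

eigvals = np.linalg.eigvals(W)  # computed once, reused for every rho
rhos = [-0.9, -0.3, 0.0, 0.5, 0.9]
fast = [logdet_eigen(r, eigvals) for r in rhos]
direct = [np.linalg.slogdet(np.eye(n) - r * W)[1] for r in rhos]
```

The one-off eigendecomposition costs \(O(n^3)\), which is consistent with the switch to an approximation for larger n described above.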

robust : bool, default False¶

If True, replace the Normal error with Student-t for robustness to heavy-tailed outliers. See Robust regression below.

w_vars : list of str, optional¶

Accepted for API consistency with SLX/SDM/SDEM but unused (SAR has no WX term). Supplying it has no effect on this model.
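A hedged usage sketch in matrix mode, using only the constructor arguments, prior keys and methods documented on this page. The data-generation helper `make_inputs` is hypothetical, the explicit intercept column in X is an assumption about matrix mode, and `fit_sar` is defined but not called here because it requires bayespecon to be installed.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def make_inputs(n=100, rho=0.4, seed=0):
    """Simulate y, X and a sparse row-standardised ring W (illustrative data)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    rows = np.concatenate([idx, idx])
    cols = np.concatenate([(idx - 1) % n, (idx + 1) % n])
    W = sp.csr_matrix((np.full(2 * n, 0.5), (rows, cols)), shape=(n, n))
    # Assumption: matrix mode uses X as given, so the intercept is added by hand.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(size=n)
    A = sp.identity(n, format="csc") - rho * W.tocsc()
    y = spsolve(A, X @ np.array([1.0, 2.0]) + eps)
    return y, X, W

def fit_sar(y, X, W):
    """Fit the model with overridden priors; needs bayespecon installed."""
    from bayespecon import SAR
    priors = {"rho_lower": 0.0, "rho_upper": 1.0, "beta_sigma": 10.0}
    model = SAR(y=y, X=X, W=W, priors=priors)
    model.fit(draws=1000, tune=500, chains=2, sampler="gibbs", random_seed=42)
    return model.summary()

y, X, W = make_inputs()
# summary = fit_sar(y, X, W)  # uncomment once bayespecon is available
```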

Notes

Direct, indirect and total effects of \(X\) on \(y\) are derived from the spatial multiplier \((I - \rho W)^{-1}\) and are reported by spatial_effects().
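For a single parameter draw, the effects can be sketched from the matrix of partial derivatives \(S_k = (I - \rho W)^{-1}\beta_k\). The scalar summaries below follow the common convention of averaging the diagonal (direct) and the row sums (total); whether spatial_effects() uses exactly this convention is an assumption, and `sar_effects` is a hypothetical helper.

```python
import numpy as np

def sar_effects(rho, beta_k, W):
    """Average direct, indirect and total effects of regressor k for one draw."""
    n = W.shape[0]
    # S[i, j] = d y_i / d x_{jk}
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_k
    direct = np.trace(S) / n   # average own-unit effect
    total = S.sum() / n        # average row sum
    indirect = total - direct  # average spillover from other units
    return direct, indirect, total

# Row-standardised ring weights (illustrative).
n = 30
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

direct, indirect, total = sar_effects(rho=0.4, beta_k=2.0, W=W)
```

With a row-standardised W the total effect reduces to \(\beta_k / (1 - \rho)\), which is a convenient sanity check. Repeating the calculation over posterior draws of \((\rho, \beta_k)\) gives a posterior distribution for each effect.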

Robust regression

When robust=True, the error distribution is changed from Normal to Student-t:

\[\varepsilon \sim t_\nu(0, \sigma^2 I)\]

where \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate nu_lam (default 1/30, mean ≈ 30, favouring near-Normal tails). The lower bound of 2 ensures the variance exists.

__init__(formula=None, data=None, y=None, X=None, W=None, priors=None, logdet_method=None, robust=False, w_vars=None, backend=None, trace_estimator='hutchpp', trace_k=None)[source]¶

Methods

`__init__`([formula, data, y, X, W, priors, ...])
`fit`([draws, tune, chains, target_accept, ...])	Draw samples from the posterior.
`fitted_values`()	Return fitted values at posterior mean parameters.
`residuals`()	Return residuals on the observed scale.
`spatial_diagnostics`()	Run Bayesian LM specification tests and return a summary table.
`spatial_diagnostics_decision`([alpha, format])	Return a model-selection decision from Bayesian LM test results.
`spatial_effects`([return_posterior_samples])	Compute Bayesian inference for direct, indirect, and total impacts.
`summary`([var_names])	Return posterior summary table.

Attributes

`inference_data`	Return the ArviZ InferenceData from the most recent fit.
`pymc_model`	Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, sampler='gibbs', thin=1, n_jobs=-1, progressbar=True, **sample_kwargs)[source]¶

Draw samples from the posterior.

Parameters:¶

draws : int, default 2000¶

Number of posterior samples per chain (after tuning).

tune : int, default 1000¶

Number of tuning (burn-in) steps per chain.

chains : int, default 4¶

Number of parallel chains.

target_accept : float, default 0.9¶

Target acceptance rate for NUTS.

random_seed : int, optional¶

Seed for reproducibility.

idata_kwargs : dict, optional¶

Passed to pm.sample for InferenceData creation. If it contains log_likelihood: True, the complete pointwise log-likelihood (including the Jacobian correction) is attached to the output. Only used when sampler="nuts".

sampler : str, default "gibbs"¶

Sampling method:

"nuts": NUTS via PyMC.
"gibbs": 3-block Gibbs sampler (β conjugate normal, σ² conjugate Inv-Γ, ρ collapsed slice). This is the default, matching the signature above. Faster for Gaussian models because it avoids the banana-shaped posterior geometry that NUTS struggles with.

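The two conjugate blocks of such a Gibbs sampler can be sketched as follows, conditioning on \(\rho\) through \(\tilde{y} = y - \rho W y\). This is an illustration of the standard conditional posteriors under the Normal and InverseGamma priors listed above, not the library's code: the function names are hypothetical, and the ρ slice step is omitted by holding ρ fixed at its simulated value.

```python
import numpy as np

def draw_beta(y_tilde, X, sigma2, beta_mu, beta_sigma, rng):
    """beta | rho, sigma2 ~ N(m, V) under a N(beta_mu, beta_sigma^2 I) prior."""
    k = X.shape[1]
    prec = X.T @ X / sigma2 + np.eye(k) / beta_sigma**2
    V = np.linalg.inv(prec)
    m = V @ (X.T @ y_tilde / sigma2 + beta_mu / beta_sigma**2)
    return m + np.linalg.cholesky(V) @ rng.standard_normal(k)

def draw_sigma2(y_tilde, X, beta, alpha, scale, rng):
    """sigma2 | rho, beta ~ InvGamma(alpha + n/2, scale + e'e/2)."""
    e = y_tilde - X @ beta
    shape_post = alpha + 0.5 * e.size
    scale_post = scale + 0.5 * (e @ e)
    # If G ~ Gamma(a, 1) then b / G ~ InvGamma(a, b).
    return scale_post / rng.gamma(shape_post)

# Simulated data on a row-standardised ring (illustrative).
rng = np.random.default_rng(1)
n, rho_true = 200, 0.3
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, -2.0])
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + rng.normal(size=n))

# Assumption for this sketch: rho is held at its true value.
y_tilde = y - rho_true * (W @ y)

sigma2 = 1.0
draws = []
for _ in range(500):
    beta = draw_beta(y_tilde, X, sigma2, beta_mu=0.0, beta_sigma=1e6, rng=rng)
    sigma2 = draw_sigma2(y_tilde, X, beta, alpha=2.0, scale=1.0, rng=rng)
    draws.append(np.append(beta, sigma2))

# Discard the first 100 iterations as warmup; columns are (beta_0, beta_1, sigma2).
post_mean = np.array(draws)[100:].mean(axis=0)
```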
thin : int, default 1¶

Keep every thin-th draw after warmup. Only used when sampler="gibbs".

n_jobs : int, default -1¶

Number of parallel workers for Gibbs chains. -1 uses all CPUs. When n_jobs=1, chains run sequentially with progress bars. When n_jobs>1 (or -1), chains run in parallel via joblib. Only used when sampler="gibbs" with gibbs_method="numpy".

progressbar : bool, default True¶

Show per-chain progress bars. Only used when sampler="gibbs".

**sample_kwargs¶

Additional keyword arguments forwarded to pm.sample. Only used when sampler="nuts".

Notes

The log-likelihood for the SAR model is:

\[\log p(y \mid \theta) = \sum_{i=1}^{n} \log \mathcal{N}(y_i \mid \mu_i, \sigma^2) + \log |I - \rho W |\]

The pm.Normal with observed=self._y automatically captures the first term (the Gaussian log-likelihood) in log_likelihood. However, the Jacobian term \(\log |I - \rho W|\) is added via pm.Potential and does not appear in the auto-computed log_likelihood group.

For correct WAIC/LOO computation (and therefore Bayes factor comparison via bridge sampling), we construct the complete pointwise log-likelihood manually after sampling:

\[\ell_i = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2}\left(\frac{y_i - \mu_i}{\sigma}\right)^2 + \frac{1}{n} \log |I - \rho W |\]

where \(\mu_i = \rho (Wy)_i + x_i' \beta\) and the Jacobian contribution is divided by \(n\) so that \(\sum_{i=1}^{n} \ell_i\) equals the total log-likelihood used for sampling.
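The construction can be checked numerically for one parameter draw. The sketch below uses the full Gaussian log-density for the first term, spreads the Jacobian evenly over observations, and compares the sum with both the total log-likelihood and the joint density of \(y\) implied by the reduced form. The function name `pointwise_loglik` and the test data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def pointwise_loglik(y, X, W, rho, beta, sigma):
    """Pointwise SAR log-likelihood with the Jacobian split evenly across i."""
    n = y.size
    mu = rho * (W @ y) + X @ beta
    gauss = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
    return gauss + logdet / n

# Small illustrative problem on a row-standardised ring.
rng = np.random.default_rng(2)
n, rho, sigma = 20, 0.4, 1.5
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.0])
A = np.eye(n) - rho * W
y = np.linalg.solve(A, X @ beta + rng.normal(0.0, sigma, size=n))

ell = pointwise_loglik(y, X, W, rho, beta, sigma)

# Total log-likelihood: Gaussian terms plus the Jacobian.
mu = rho * (W @ y) + X @ beta
total_ll = norm.logpdf(y, loc=mu, scale=sigma).sum() + np.linalg.slogdet(A)[1]

# Independent check: y ~ N(A^{-1} X beta, sigma^2 (A'A)^{-1}).
cov = sigma**2 * np.linalg.inv(A.T @ A)
cov = 0.5 * (cov + cov.T)  # symmetrise against round-off
joint_ll = multivariate_normal(mean=np.linalg.solve(A, X @ beta), cov=cov).logpdf(y)
```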

fitted_values()[source]¶

Return fitted values at posterior mean parameters.

Returns:¶

Posterior-mean fitted values.

Return type:¶

np.ndarray

property inference_data : arviz.data.inference_data.InferenceData | None[source]¶

Return the ArviZ InferenceData from the most recent fit.

Returns:¶

The inference data object, or None if the model has not been fit yet.

Return type:¶

arviz.InferenceData or None

property pymc_model : pymc.model.core.Model | None[source]¶

Return the PyMC model object built for the most recent fit.

For Gibbs-fitted models the PyMC model is not constructed during sampling; it is built lazily on first access so that downstream consumers (e.g. bridge sampling for marginal likelihoods) can evaluate logp and the prior under the same model definition used by the NUTS path.

Returns:¶

The model object used by fit(), or None if the instance has not been fit yet.

Return type:¶

pymc.Model or None

residuals()[source]¶

Return residuals on the observed scale.

Returns:¶

Residual vector y - fitted_values.

Return type:¶

np.ndarray

spatial_diagnostics()[source]¶

Run Bayesian LM specification tests and return a summary table.

Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.

Requires the model to have been fit (.fit() called) and a spatial weights matrix W to have been supplied at construction time.

Returns:¶

DataFrame indexed by test name with columns:

Column	Description
statistic	Posterior mean of the LM statistic
median	Posterior median of the LM statistic
df	Degrees of freedom for the \(\chi^2\) reference
p_value	Bayesian p-value: `1 - chi2.cdf(mean, df)`
ci_lower	Lower bound of 95% credible interval (2.5%)
ci_upper	Upper bound of 95% credible interval (97.5%)

The DataFrame has attrs["model_type"] (class name) and attrs["n_draws"] (total posterior draws) metadata.
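The columns above can be reproduced from a vector of posterior draws of an LM statistic. The sketch below assumes such draws are already available; `summarise_lm_draws` is a hypothetical helper and the draws are synthetic, not output from the library.

```python
import numpy as np
from scipy.stats import chi2

def summarise_lm_draws(lm_draws, df):
    """Summarise posterior draws of an LM statistic as in the table columns."""
    lm_draws = np.asarray(lm_draws, dtype=float)
    mean = lm_draws.mean()
    lo, hi = np.percentile(lm_draws, [2.5, 97.5])
    return {
        "statistic": mean,
        "median": np.median(lm_draws),
        "df": df,
        # Bayesian p-value evaluated at the posterior mean of the statistic.
        "p_value": 1.0 - chi2.cdf(mean, df),
        "ci_lower": lo,
        "ci_upper": hi,
    }

# Synthetic draws standing in for a posterior sample of an LM statistic.
rng = np.random.default_rng(3)
fake_draws = rng.chisquare(1, size=4000) + 5.0
row = summarise_lm_draws(fake_draws, df=1)
```

Note that the p-value is computed from the posterior mean of the statistic, so it is a single summary rather than an average of per-draw p-values.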

Return type:¶

pandas.DataFrame

Raises:¶

RuntimeError – If the model has not been fit yet.
ValueError – If no spatial weights matrix W was supplied.