Getting Started with bayespecon

This notebook walks through a complete spatial econometric workflow:

  1. Simulate spatial data from a known DGP

  2. Fit a baseline OLS model

  3. Run Bayesian LM diagnostics to test for spatial dependence

  4. Use the decision tree to select a better model

  5. Fit the recommended SAR model

  6. Compute direct, indirect, and total spatial effects

  7. Peek at the underlying PyMC model for customization

import libpysal

from bayespecon.dgp import simulate_sar

gdf = simulate_sar(n=20, beta=[1, 0.4, 2.5], rho=0.6, create_gdf=True)
gdf
y X_0 X_1 X_2 geometry
0 -5.231147 1.0 -0.439803 -1.643693 POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1 -1.516225 1.0 -0.469128 -0.041409 POLYGON ((2 0, 2 1, 1 1, 1 0, 2 0))
2 0.955687 1.0 -0.301342 -0.492644 POLYGON ((3 0, 3 1, 2 1, 2 0, 3 0))
3 -2.721767 1.0 0.166715 -1.520066 POLYGON ((4 0, 4 1, 3 1, 3 0, 4 0))
4 0.158253 1.0 -0.419888 -0.056810 POLYGON ((5 0, 5 1, 4 1, 4 0, 5 0))
... ... ... ... ... ...
395 2.219517 1.0 -1.449830 0.472229 POLYGON ((16 19, 16 20, 15 20, 15 19, 16 19))
396 1.903429 1.0 -0.376461 0.061293 POLYGON ((17 19, 17 20, 16 20, 16 19, 17 19))
397 -2.915891 1.0 -0.004232 -0.641885 POLYGON ((18 19, 18 20, 17 20, 17 19, 18 19))
398 -2.475098 1.0 -2.207240 -1.483197 POLYGON ((19 19, 19 20, 18 20, 18 19, 19 19))
399 1.422373 1.0 0.936024 -0.373176 POLYGON ((20 19, 20 20, 19 20, 19 19, 20 19))

400 rows × 5 columns

gdf.plot("X_1").set_title("X1")
Text(0.5, 1.0, 'X1')
../_images/8cc038c5e3c25800306cf497ea4c98676f19bea4520609bea0e78a206ef23f14.png
gdf.plot("y").set_title("y")
Text(0.5, 1.0, 'y')
../_images/43e06e491b6f369579a6622c81b200d4cacefdb1665bf9ff7f696df284b621e3.png

1. Create a Spatial Weights Matrix

Spatial econometric models require a spatial weights matrix \(W\) that encodes the spatial relationships between observations. Here we use Rook contiguity (regions that share a boundary or vertex are neighbors).

G = libpysal.graph.Graph.build_contiguity(gdf, rook=False).transform("r")

2. Fit a Baseline OLS Model

We start with a non-spatial OLS regression to establish a baseline.

\[ y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \]
# remove the intercept since we have one (-1)
form = "y ~ -1 + X_0 + X_1 + X_2"
from bayespecon import OLS

ols = OLS(formula=form, W=G, data=gdf)
ols.fit(draws=1000, tune=500, chains=2, random_seed=42)
ols.summary()
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, sigma2]
Sampling 2 chains for 500 tune and 1_000 draw iterations (1_000 + 2_000 draws total) took 6 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
/home/runner/micromamba/envs/test/lib/python3.14/site-packages/rich/live.py:260: UserWarning: install "ipywidgets" 
for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
X_0 1.993 0.081 1.829 2.132 0.002 0.002 2247.0 1478.0 1.01
X_1 0.497 0.088 0.324 0.650 0.002 0.002 1890.0 1445.0 1.00
X_2 2.832 0.079 2.683 2.975 0.002 0.002 2452.0 1633.0 1.00
sigma2 2.651 0.188 2.296 2.996 0.004 0.004 1953.0 1616.0 1.00
sigma 1.627 0.058 1.525 1.740 0.001 0.001 1953.0 1616.0 1.00

Notice all the estimated beta parameters are biased upward.

Use arviz to inspect fit

the fit method attaches an inference_data object to each model that can be used directly with arviz functions

import arviz as az

az.plot_forest(ols.inference_data)
array([<Axes: title={'center': '94.0% HDI'}>], dtype=object)
../_images/ab45a9949c80e63302838fcbe1da234e2342fad2077954a0f545b64d901e1a01.png
az.plot_trace(ols.inference_data)
array([[<Axes: title={'center': 'beta'}>,
        <Axes: title={'center': 'beta'}>],
       [<Axes: title={'center': 'sigma2'}>,
        <Axes: title={'center': 'sigma2'}>],
       [<Axes: title={'center': 'sigma'}>,
        <Axes: title={'center': 'sigma'}>]], dtype=object)
../_images/c68f53f22d2634069b1339a81ca3387115ed6d8f789c181e1e13474baf5e4cbb.png

3. Run Bayesian LM Diagnostics

The spatial_diagnostics() method runs a battery of Bayesian LM tests (Doğan, Taşpınar & Bera 2021) that check whether spatial dependence is present in the residuals.

  • LM-Lag: Tests \(H_0: \rho = 0\) (no spatial lag on \(y\))

  • LM-Error: Tests \(H_0: \lambda = 0\) (no spatial error correlation)

  • LM-SDM-Joint: Joint test for \(\rho = 0\) and \(\gamma = 0\)

  • Robust-LM-Lag / Robust-LM-Error: Neyman-orthogonal robust versions

diag = ols.spatial_diagnostics()
diag
statistic median df p_value ci_lower ci_upper
test
LM-Lag 365.127729 363.730347 1 0.000000 282.437440 454.417382
LM-Error 216.777765 215.063816 1 0.000000 164.326858 274.453851
LM-SDM-Joint 385.532946 379.712057 3 0.000000 282.639936 519.655667
LM-SLX-Error-Joint 384.560787 383.399958 3 0.000000 325.160854 450.706092
Robust-LM-Lag 169.827039 168.504134 1 0.000000 127.256387 220.933707
Robust-LM-Error 16.226200 16.099803 1 0.000056 12.158768 21.109210

4. Model Selection Decision Tree

The spatial_diagnostics_decision() method walks a Koley & Bera decision tree using the Bayesian p-values above and recommends the next model to fit.

decision = ols.spatial_diagnostics_decision(alpha=0.05, format="ascii")
print(decision)
LM-Lag *  (p=0.0000, alpha=0.05)
├── <sig> LM-Error *  (p=0.0000, alpha=0.05)
│   ├── <sig> Robust-LM-Lag *  (p=0.0000, alpha=0.05)
│   │   ├── <sig> Robust-LM-Error *  (p=0.0001, alpha=0.05)
│   │   │   ├── <sig> Robust-LM-Lag p <= Robust-LM-Error p *
│   │   │   │   ├── [SAR] * ← SELECTED
│   │   │   │   └── [SEM]
│   │   │   └── [SAR]
│   │   └── <not sig> Robust-LM-Error
│   │       ├── [SEM]
│   │       └── <not sig> LM-Lag p <= LM-Error p
│   │           ├── [SAR]
│   │           └── [SEM]
│   └── [SAR]
└── <not sig> LM-Error
    ├── [SEM]
    └── [OLS]
ols.spatial_diagnostics_decision(alpha=0.05)
../_images/1b36d8a642ec3999a5a8bea5fdc31fd2b781fef5d5a9edce07a39210e555f04a.svg

If the diagnostics suggest spatial dependence, we fit a Spatial Autoregressive (SAR) model:

\[ y = \rho W y + X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \]

The likelihood includes the Jacobian \(\log|I - \rho W|\) so that posterior inference on \(\rho\) is exact.

from bayespecon import SAR

sar = SAR(formula=form, W=G, data=gdf)
sar.fit(draws=2000, tune=100, chains=4, random_seed=42)
sar.summary()
/home/runner/micromamba/envs/test/lib/python3.14/site-packages/rich/live.py:260: UserWarning: install "ipywidgets" 
for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Sampling 4 chains for 100 tune and 2000 draw iterations, 4 x 2,100 draws total took 5s (1758 draws/s)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho 0.608 0.028 0.556 0.660 0.000 0.000 7999.0 5938.0 1.0
sigma 1.095 0.039 1.021 1.166 0.000 0.000 7699.0 7595.0 1.0
sigma2 1.200 0.085 1.042 1.359 0.001 0.001 7699.0 7595.0 1.0
X_0 0.969 0.072 0.832 1.101 0.001 0.001 7922.0 7648.0 1.0
X_1 0.402 0.057 0.298 0.512 0.001 0.000 7957.0 7805.0 1.0
X_2 2.549 0.056 2.447 2.656 0.001 0.000 8017.0 7914.0 1.0
az.plot_trace(sar.inference_data)
array([[<Axes: title={'center': 'rho'}>, <Axes: title={'center': 'rho'}>],
       [<Axes: title={'center': 'sigma'}>,
        <Axes: title={'center': 'sigma'}>],
       [<Axes: title={'center': 'sigma2'}>,
        <Axes: title={'center': 'sigma2'}>],
       [<Axes: title={'center': 'beta'}>,
        <Axes: title={'center': 'beta'}>]], dtype=object)
../_images/0d74a3d6d1232840dd5a4b6e4043ead3f187ebbfd2c51f0248b4882af6348029.png

6. Compute Spatial Effects

For the SAR model, a change in \(X\) propagates through the spatial multiplier \((I - \rho W)^{-1}\). The effects decompose into:

  • Direct: Effect of \(X_i\) on \(y_i\) (own-region)

  • Indirect: Effect of \(X_j\) on \(y_i\) for \(j \neq i\) (spillover)

  • Total: Direct + Indirect

effects = sar.spatial_effects()
effects
direct direct_ci_lower direct_ci_upper direct_pvalue indirect indirect_ci_lower indirect_ci_upper indirect_pvalue total total_ci_lower total_ci_upper total_pvalue
variable
X_1 0.434117 0.313930 0.556789 0.0 0.597965 0.403382 0.838954 0.0 1.032082 0.725351 1.378414 0.0
X_2 2.751963 2.624186 2.881137 0.0 3.790026 2.979504 4.741857 0.0 6.541989 5.653011 7.589003 0.0