Getting Started with `bayespecon`¶

This notebook walks through a complete spatial econometric workflow:

Simulate spatial data from a known DGP
Fit a baseline OLS model
Run Bayesian LM diagnostics to test for spatial dependence
Use the decision tree to select a better model
Fit the recommended SAR model
Compute direct, indirect, and total spatial effects
Peek at the underlying PyMC model for customization

import libpysal

from bayespecon.dgp import simulate_sar

gdf = simulate_sar(n=20, beta=[1, 0.4, 2.5], rho=0.6, create_gdf=True)

gdf

	y	X_0	X_1	X_2	geometry
0	-5.231147	1.0	-0.439803	-1.643693	POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0))
1	-1.516225	1.0	-0.469128	-0.041409	POLYGON ((2 0, 2 1, 1 1, 1 0, 2 0))
2	0.955687	1.0	-0.301342	-0.492644	POLYGON ((3 0, 3 1, 2 1, 2 0, 3 0))
3	-2.721767	1.0	0.166715	-1.520066	POLYGON ((4 0, 4 1, 3 1, 3 0, 4 0))
4	0.158253	1.0	-0.419888	-0.056810	POLYGON ((5 0, 5 1, 4 1, 4 0, 5 0))
...	...	...	...	...	...
395	2.219517	1.0	-1.449830	0.472229	POLYGON ((16 19, 16 20, 15 20, 15 19, 16 19))
396	1.903429	1.0	-0.376461	0.061293	POLYGON ((17 19, 17 20, 16 20, 16 19, 17 19))
397	-2.915891	1.0	-0.004232	-0.641885	POLYGON ((18 19, 18 20, 17 20, 17 19, 18 19))
398	-2.475098	1.0	-2.207240	-1.483197	POLYGON ((19 19, 19 20, 18 20, 18 19, 19 19))
399	1.422373	1.0	0.936024	-0.373176	POLYGON ((20 19, 20 20, 19 20, 19 19, 20 19))

400 rows × 5 columns

gdf.plot("X_1").set_title("X1")

Text(0.5, 1.0, 'X1')

../_images/8cc038c5e3c25800306cf497ea4c98676f19bea4520609bea0e78a206ef23f14.png

gdf.plot("y").set_title("y")

Text(0.5, 1.0, 'y')

../_images/43e06e491b6f369579a6622c81b200d4cacefdb1665bf9ff7f696df284b621e3.png

1. Create a Spatial Weights Matrix¶

Spatial econometric models require a spatial weights matrix \(W\) that encodes the spatial relationships between observations. Here we use Rook contiguity (regions that share a boundary or vertex are neighbors).

G = libpysal.graph.Graph.build_contiguity(gdf, rook=False).transform("r")

2. Fit a Baseline OLS Model¶

We start with a non-spatial OLS regression to establish a baseline.

\[ y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \]

# remove the intercept since we have one (-1)
form = "y ~ -1 + X_0 + X_1 + X_2"

from bayespecon import OLS

ols = OLS(formula=form, W=G, data=gdf)
ols.fit(draws=1000, tune=500, chains=2, random_seed=42)
ols.summary()

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, sigma2]
Sampling 2 chains for 500 tune and 1_000 draw iterations (1_000 + 2_000 draws total) took 6 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics

/home/runner/micromamba/envs/test/lib/python3.14/site-packages/rich/live.py:260: UserWarning: install "ipywidgets" 
for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
X_0	1.993	0.081	1.829	2.132	0.002	0.002	2247.0	1478.0	1.01
X_1	0.497	0.088	0.324	0.650	0.002	0.002	1890.0	1445.0	1.00
X_2	2.832	0.079	2.683	2.975	0.002	0.002	2452.0	1633.0	1.00
sigma2	2.651	0.188	2.296	2.996	0.004	0.004	1953.0	1616.0	1.00
sigma	1.627	0.058	1.525	1.740	0.001	0.001	1953.0	1616.0	1.00

Notice all the estimated beta parameters are biased upward.

Use `arviz` to inspect fit¶

the fit method attaches an inference_data object to each model that can be used directly with arviz functions

import arviz as az

az.plot_forest(ols.inference_data)

array([<Axes: title={'center': '94.0% HDI'}>], dtype=object)

../_images/ab45a9949c80e63302838fcbe1da234e2342fad2077954a0f545b64d901e1a01.png

az.plot_trace(ols.inference_data)

array([[<Axes: title={'center': 'beta'}>,
        <Axes: title={'center': 'beta'}>],
       [<Axes: title={'center': 'sigma2'}>,
        <Axes: title={'center': 'sigma2'}>],
       [<Axes: title={'center': 'sigma'}>,
        <Axes: title={'center': 'sigma'}>]], dtype=object)

../_images/c68f53f22d2634069b1339a81ca3387115ed6d8f789c181e1e13474baf5e4cbb.png

3. Run Bayesian LM Diagnostics¶

The spatial_diagnostics() method runs a battery of Bayesian LM tests (Doğan, Taşpınar & Bera 2021) that check whether spatial dependence is present in the residuals.

LM-Lag: Tests \(H_0: \rho = 0\) (no spatial lag on \(y\))
LM-Error: Tests \(H_0: \lambda = 0\) (no spatial error correlation)
LM-SDM-Joint: Joint test for \(\rho = 0\) and \(\gamma = 0\)
Robust-LM-Lag / Robust-LM-Error: Neyman-orthogonal robust versions

diag = ols.spatial_diagnostics()
diag

	statistic	median	df	p_value	ci_lower	ci_upper
test
LM-Lag	365.127729	363.730347	1	0.000000	282.437440	454.417382
LM-Error	216.777765	215.063816	1	0.000000	164.326858	274.453851
LM-SDM-Joint	385.532946	379.712057	3	0.000000	282.639936	519.655667
LM-SLX-Error-Joint	384.560787	383.399958	3	0.000000	325.160854	450.706092
Robust-LM-Lag	169.827039	168.504134	1	0.000000	127.256387	220.933707
Robust-LM-Error	16.226200	16.099803	1	0.000056	12.158768	21.109210

4. Model Selection Decision Tree¶

The spatial_diagnostics_decision() method walks a Koley & Bera decision tree using the Bayesian p-values above and recommends the next model to fit.

decision = ols.spatial_diagnostics_decision(alpha=0.05, format="ascii")
print(decision)

LM-Lag *  (p=0.0000, alpha=0.05)
├── <sig> LM-Error *  (p=0.0000, alpha=0.05)
│   ├── <sig> Robust-LM-Lag *  (p=0.0000, alpha=0.05)
│   │   ├── <sig> Robust-LM-Error *  (p=0.0001, alpha=0.05)
│   │   │   ├── <sig> Robust-LM-Lag p <= Robust-LM-Error p *
│   │   │   │   ├── [SAR] * ← SELECTED
│   │   │   │   └── [SEM]
│   │   │   └── [SAR]
│   │   └── <not sig> Robust-LM-Error
│   │       ├── [SEM]
│   │       └── <not sig> LM-Lag p <= LM-Error p
│   │           ├── [SAR]
│   │           └── [SEM]
│   └── [SAR]
└── <not sig> LM-Error
    ├── [SEM]
    └── [OLS]

ols.spatial_diagnostics_decision(alpha=0.05)

../_images/1b36d8a642ec3999a5a8bea5fdc31fd2b781fef5d5a9edce07a39210e555f04a.svg

5. Fit the Recommended SAR Model¶

If the diagnostics suggest spatial dependence, we fit a Spatial Autoregressive (SAR) model:

\[ y = \rho W y + X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \]

The likelihood includes the Jacobian \(\log|I - \rho W|\) so that posterior inference on \(\rho\) is exact.

from bayespecon import SAR

sar = SAR(formula=form, W=G, data=gdf)
sar.fit(draws=2000, tune=100, chains=4, random_seed=42)
sar.summary()

/home/runner/micromamba/envs/test/lib/python3.14/site-packages/rich/live.py:260: UserWarning: install "ipywidgets" 
for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')

Sampling 4 chains for 100 tune and 2000 draw iterations, 4 x 2,100 draws total took 5s (1758 draws/s)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
rho	0.608	0.028	0.556	0.660	0.000	0.000	7999.0	5938.0	1.0
sigma	1.095	0.039	1.021	1.166	0.000	0.000	7699.0	7595.0	1.0
sigma2	1.200	0.085	1.042	1.359	0.001	0.001	7699.0	7595.0	1.0
X_0	0.969	0.072	0.832	1.101	0.001	0.001	7922.0	7648.0	1.0
X_1	0.402	0.057	0.298	0.512	0.001	0.000	7957.0	7805.0	1.0
X_2	2.549	0.056	2.447	2.656	0.001	0.000	8017.0	7914.0	1.0

az.plot_trace(sar.inference_data)

array([[<Axes: title={'center': 'rho'}>, <Axes: title={'center': 'rho'}>],
       [<Axes: title={'center': 'sigma'}>,
        <Axes: title={'center': 'sigma'}>],
       [<Axes: title={'center': 'sigma2'}>,
        <Axes: title={'center': 'sigma2'}>],
       [<Axes: title={'center': 'beta'}>,
        <Axes: title={'center': 'beta'}>]], dtype=object)

../_images/0d74a3d6d1232840dd5a4b6e4043ead3f187ebbfd2c51f0248b4882af6348029.png

6. Compute Spatial Effects¶

For the SAR model, a change in \(X\) propagates through the spatial multiplier \((I - \rho W)^{-1}\). The effects decompose into:

Direct: Effect of \(X_i\) on \(y_i\) (own-region)
Indirect: Effect of \(X_j\) on \(y_i\) for \(j \neq i\) (spillover)
Total: Direct + Indirect

effects = sar.spatial_effects()
effects

	direct	direct_ci_lower	direct_ci_upper	direct_pvalue	indirect	indirect_ci_lower	indirect_ci_upper	indirect_pvalue	total	total_ci_lower	total_ci_upper	total_pvalue
variable
X_1	0.434117	0.313930	0.556789	0.0	0.597965	0.403382	0.838954	0.0	1.032082	0.725351	1.378414	0.0
X_2	2.751963	2.624186	2.881137	0.0	3.790026	2.979504	4.741857	0.0	6.541989	5.653011	7.589003	0.0

Getting Started with bayespecon¶