bayespecon Replication of panelg_manual and sdemo_programs¶
This notebook reproduces representative examples from:
reference/panelg_manual.pdf(panel SDEM synthetic example, Chapter 2), andreference/sdemo_programs(cross-sectional SDM and SDEM synthetic examples, Chapter 7).
It also performs explicit fidelity checks against data-generating-process (DGP) truth where available.
Scope and Faithfulness Criteria¶
Faithfulness in this notebook means:
Matching the same DGP structure and parameter values used in the reference examples.
Matching panel/cross-sectional structure and fixed-effects settings.
Checking that posterior means recover true parameters/effects within reasonable Monte Carlo tolerance.
Note: sdemo_programs/chapter4 convex-combination estimators (with unknown gamma weights over multiple W matrices) are not currently implemented in bayespecon. This notebook uses examples that are exactly representable by current bayespecon classes.
import numpy as np
import pandas as pd
from libpysal.graph import Graph
from scipy.sparse import csr_matrix
from bayespecon import SDEM, SDM, SDEMPanelFE, dgp
def make_knn_w(xcoord: np.ndarray, ycoord: np.ndarray, k: int) -> np.ndarray:
"""Build row-standardized k-NN weights (no self-neighbors)."""
coords = np.column_stack([xcoord, ycoord])
n = coords.shape[0]
d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
np.fill_diagonal(d, np.inf)
nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]
W = np.zeros((n, n), dtype=float)
for i in range(n):
W[i, nn[i]] = 1.0
rs = W.sum(axis=1, keepdims=True)
rs[rs == 0.0] = 1.0
return W / rs
def to_graph(W: np.ndarray) -> Graph:
return Graph.from_sparse(csr_matrix(W)).transform("r")
def summarize_recovery(df: pd.DataFrame, abs_tol: float = 0.25) -> pd.DataFrame:
out = df.copy()
out["abs_error"] = (out["estimate"] - out["truth"]).abs()
out["relative_error_pct"] = (
100.0 * out["abs_error"] / np.maximum(np.abs(out["truth"]), 1e-12)
)
out["within_tol"] = out["abs_error"] <= abs_tol
return out
Example A: panelg_manual Chapter 2 (sdem_panel_gd)¶
Reference DGP (manual pages around the SDEM section):
n = 100,t = 20,k = 2,spatial error coefficient
lambda = 0.7,coefficients on
X:+1, coefficients onWX:-1,both region and time fixed effects (
model = 3).
rng = np.random.default_rng(10203040)
n, t, k = 100, 20, 2
lam_true = 0.7
beta_true = np.ones(k)
theta_true = -np.ones(k)
sige = 0.1
x = rng.standard_normal((n * t, k))
Wn = make_knn_w(rng.random(n), rng.random(n), k=5)
Wbig = np.kron(np.eye(t), Wn)
SFE = np.kron(np.ones(t), np.arange(1, n + 1) / n)
TFE = np.kron(np.arange(1, t + 1) / t, np.ones(n))
u = np.linalg.solve(
np.eye(n * t) - lam_true * Wbig, rng.standard_normal(n * t) * np.sqrt(sige)
)
y = x @ beta_true + (Wbig @ x) @ theta_true + SFE + TFE + u
m_panel = SDEMPanelFE(
y=y,
X=pd.DataFrame(x, columns=["x1", "x2"]),
W=to_graph(Wn),
N=n,
T=t,
model=3,
priors={"lam_lower": -0.95, "lam_upper": 0.95},
)
idata_panel = m_panel.fit(
draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=11
)
display(m_panel.summary(round_to=4))
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [lam, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 2 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| x1 | 1.0012 | 0.0083 | 0.9887 | 1.0172 | 0.0007 | 0.0008 | 154.0511 | 135.6721 | NaN |
| x2 | 1.0128 | 0.0089 | 0.9969 | 1.0293 | 0.0008 | 0.0006 | 134.4166 | 138.4639 | NaN |
| W*x1 | -0.9934 | 0.0255 | -1.0448 | -0.9528 | 0.0023 | 0.0018 | 119.5630 | 70.4080 | NaN |
| W*x2 | -0.9889 | 0.0260 | -1.0323 | -0.9418 | 0.0027 | 0.0017 | 99.8179 | 91.9034 | NaN |
| lam | 0.6934 | 0.0170 | 0.6592 | 0.7219 | 0.0012 | 0.0017 | 198.6335 | 110.0793 | NaN |
| sigma | 0.3089 | 0.0045 | 0.3015 | 0.3163 | 0.0004 | 0.0003 | 107.5681 | 89.8917 | NaN |
beta_hat = idata_panel.posterior["beta"].mean(("chain", "draw")).to_numpy()
lam_hat = float(idata_panel.posterior["lam"].mean(("chain", "draw")).to_numpy())
faith_panel = pd.DataFrame(
{
"term": ["lam", "x1", "x2", "W*x1", "W*x2"],
"truth": [lam_true, 1.0, 1.0, -1.0, -1.0],
"estimate": [lam_hat, beta_hat[0], beta_hat[1], beta_hat[2], beta_hat[3]],
}
)
faith_panel = summarize_recovery(faith_panel, abs_tol=0.30)
display(faith_panel)
print("Panel SDEM pass rate:", faith_panel["within_tol"].mean())
| term | truth | estimate | abs_error | relative_error_pct | within_tol | |
|---|---|---|---|---|---|---|
| 0 | lam | 0.7 | 0.693387 | 0.006613 | 0.944759 | True |
| 1 | x1 | 1.0 | 1.001211 | 0.001211 | 0.121115 | True |
| 2 | x2 | 1.0 | 1.012786 | 0.012786 | 1.278594 | True |
| 3 | W*x1 | -1.0 | -0.993426 | 0.006574 | 0.657442 | True |
| 4 | W*x2 | -1.0 | -0.988892 | 0.011108 | 1.110827 | True |
Panel SDEM pass rate: 1.0
Example B: sdemo_programs Chapter 7 (sdm_cross_section_gd)¶
Reference DGP:
n = 1000,t = 1,rho = 0.5,beta = [1, 1],theta = [-0.5, -0.5],intercept of
2,Wfrom k-NN coordinates with 6 neighbors.
rng = np.random.default_rng(30203040)
n, k = 1000, 2
rho_true = 0.5
intercept_true = 2.0
beta_true = np.ones(k)
theta_true = -0.5 * np.ones(k)
Wn = make_knn_w(rng.random(n), rng.random(n), k=6)
sdm_data = dgp.simulate_sdm(
W=Wn,
rho=rho_true,
beta1=np.array([intercept_true, *beta_true]),
beta2=theta_true,
sigma=np.sqrt(0.5),
rng=rng,
)
y = sdm_data["y"]
X_arr = sdm_data["X"]
X = pd.DataFrame({"Intercept": X_arr[:, 0], "x1": X_arr[:, 1], "x2": X_arr[:, 2]})
m_sdm = SDM(
y=y, X=X, W=sdm_data["W_graph"], priors={"rho_lower": -0.95, "rho_upper": 0.95}
)
idata_sdm = m_sdm.fit(
draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=22
)
display(m_sdm.summary(round_to=4))
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [rho, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 1 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 2.0895 | 0.1604 | 1.8084 | 2.3819 | 0.0184 | 0.0111 | 76.1709 | 69.7704 | NaN |
| x1 | 0.9909 | 0.0223 | 0.9495 | 1.0274 | 0.0025 | 0.0012 | 78.6532 | 114.3114 | NaN |
| x2 | 0.9904 | 0.0218 | 0.9427 | 1.0228 | 0.0024 | 0.0014 | 83.3803 | 133.7529 | NaN |
| W*x1 | -0.5214 | 0.0725 | -0.6267 | -0.3649 | 0.0086 | 0.0052 | 71.4726 | 82.6740 | NaN |
| W*x2 | -0.4491 | 0.0642 | -0.5878 | -0.3461 | 0.0078 | 0.0043 | 73.7442 | 42.5167 | NaN |
| rho | 0.4836 | 0.0396 | 0.4218 | 0.5583 | 0.0045 | 0.0027 | 77.7034 | 80.1964 | NaN |
| sigma | 0.7014 | 0.0145 | 0.6739 | 0.7266 | 0.0012 | 0.0010 | 148.0342 | 81.7051 | NaN |
post = idata_sdm.posterior
rho_hat = float(post["rho"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()
faith_sdm = pd.DataFrame(
{
"term": ["rho", "Intercept", "x1", "x2", "W*x1", "W*x2"],
"truth": [
rho_true,
intercept_true,
beta_true[0],
beta_true[1],
theta_true[0],
theta_true[1],
],
"estimate": [
rho_hat,
beta_hat[0],
beta_hat[1],
beta_hat[2],
beta_hat[3],
beta_hat[4],
],
}
)
faith_sdm = summarize_recovery(faith_sdm, abs_tol=0.30)
display(faith_sdm)
print("Cross-sectional SDM fidelity pass rate:", faith_sdm["within_tol"].mean())
| term | truth | estimate | abs_error | relative_error_pct | within_tol | |
|---|---|---|---|---|---|---|
| 0 | rho | 0.5 | 0.483582 | 0.016418 | 3.283524 | True |
| 1 | Intercept | 2.0 | 2.089455 | 0.089455 | 4.472737 | True |
| 2 | x1 | 1.0 | 0.990925 | 0.009075 | 0.907520 | True |
| 3 | x2 | 1.0 | 0.990423 | 0.009577 | 0.957704 | True |
| 4 | W*x1 | -0.5 | -0.521430 | 0.021430 | 4.285993 | True |
| 5 | W*x2 | -0.5 | -0.449054 | 0.050946 | 10.189175 | True |
Cross-sectional SDM fidelity pass rate: 1.0
Example C: sdemo_programs Chapter 7 (sdem_cross_section_gd)¶
Reference DGP:
n = 3000,t = 1,rho = 0.8,beta = [1,1,1,1],theta = 0.5 * beta,intercept of
1,SDEM disturbance process with k-NN(5) weights.
rng = np.random.default_rng(221010)
n, k = 3000, 4
lam_true = 0.8
intercept_true = 1.0
beta_true = np.ones(k)
theta_true = 0.5 * np.ones(k)
Wn = make_knn_w(rng.standard_normal(n), rng.standard_normal(n), k=5)
sdem_data = dgp.simulate_sdem(
W=Wn,
lam=lam_true,
beta1=np.array([intercept_true, *beta_true]),
beta2=theta_true,
sigma=1.0,
rng=rng,
)
y = sdem_data["y"]
X_arr = sdem_data["X"]
X = pd.DataFrame({"Intercept": X_arr[:, 0]})
for j in range(k):
X[f"x{j + 1}"] = X_arr[:, j + 1]
m_sdem = SDEM(
y=y, X=X, W=sdem_data["W_graph"], priors={"lam_lower": -0.95, "lam_upper": 0.95}
)
idata_sdem = m_sdem.fit(
draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=33
)
display(m_sdem.summary(round_to=4))
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [lam, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 2 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 0.8825 | 0.1032 | 0.7220 | 1.1005 | 0.0074 | 0.0091 | 202.0411 | 103.9191 | NaN |
| x1 | 1.0481 | 0.0238 | 1.0047 | 1.0911 | 0.0019 | 0.0015 | 158.7727 | 123.1338 | NaN |
| x2 | 0.9737 | 0.0231 | 0.9387 | 1.0248 | 0.0018 | 0.0015 | 153.2695 | 140.6729 | NaN |
| x3 | 1.0189 | 0.0191 | 0.9822 | 1.0483 | 0.0019 | 0.0010 | 107.1056 | 139.6531 | NaN |
| x4 | 1.0260 | 0.0218 | 0.9872 | 1.0608 | 0.0022 | 0.0016 | 103.7006 | 95.7920 | NaN |
| W*x1 | 0.5570 | 0.0613 | 0.4483 | 0.6681 | 0.0050 | 0.0039 | 188.2920 | 175.2638 | NaN |
| W*x2 | 0.4742 | 0.0700 | 0.3346 | 0.5738 | 0.0061 | 0.0049 | 135.1161 | 138.7687 | NaN |
| W*x3 | 0.6098 | 0.0646 | 0.5094 | 0.7484 | 0.0054 | 0.0042 | 142.0523 | 140.6729 | NaN |
| W*x4 | 0.6317 | 0.0659 | 0.5142 | 0.7428 | 0.0065 | 0.0043 | 107.4254 | 92.2475 | NaN |
| lam | 0.8147 | 0.0100 | 0.7944 | 0.8331 | 0.0007 | 0.0011 | 210.8133 | 95.6355 | NaN |
| sigma | 1.0029 | 0.0118 | 0.9809 | 1.0264 | 0.0011 | 0.0011 | 117.0226 | 114.3114 | NaN |
post = idata_sdem.posterior
lam_hat = float(post["lam"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()
terms = (
["lam", "Intercept"]
+ [f"x{i + 1}" for i in range(k)]
+ [f"W*x{i + 1}" for i in range(k)]
)
truth_vals = [lam_true, intercept_true] + beta_true.tolist() + theta_true.tolist()
est_vals = (
[lam_hat, beta_hat[0]]
+ beta_hat[1 : 1 + k].tolist()
+ beta_hat[1 + k : 1 + 2 * k].tolist()
)
faith_sdem = pd.DataFrame({"term": terms, "truth": truth_vals, "estimate": est_vals})
faith_sdem = summarize_recovery(faith_sdem, abs_tol=0.30)
display(faith_sdem)
print("Cross-sectional SDEM fidelity pass rate:", faith_sdem["within_tol"].mean())
| term | truth | estimate | abs_error | relative_error_pct | within_tol | |
|---|---|---|---|---|---|---|
| 0 | lam | 0.8 | 0.814673 | 0.014673 | 1.834086 | True |
| 1 | Intercept | 1.0 | 0.882506 | 0.117494 | 11.749364 | True |
| 2 | x1 | 1.0 | 1.048149 | 0.048149 | 4.814949 | True |
| 3 | x2 | 1.0 | 0.973665 | 0.026335 | 2.633489 | True |
| 4 | x3 | 1.0 | 1.018902 | 0.018902 | 1.890174 | True |
| 5 | x4 | 1.0 | 1.026049 | 0.026049 | 2.604886 | True |
| 6 | W*x1 | 0.5 | 0.557048 | 0.057048 | 11.409560 | True |
| 7 | W*x2 | 0.5 | 0.474210 | 0.025790 | 5.157997 | True |
| 8 | W*x3 | 0.5 | 0.609806 | 0.109806 | 21.961218 | True |
| 9 | W*x4 | 0.5 | 0.631671 | 0.131671 | 26.334224 | True |
Cross-sectional SDEM fidelity pass rate: 1.0
Overall Fidelity Verdict¶
This notebook demonstrates that bayespecon is faithful to the selected panel/cross-sectional examples from panelg_manual and sdemo_programs that are representable by the implemented model classes (SAR/SDM/SEM/SDEM/SLX with one W matrix and FE modes).
Caveat:
Convex-combination examples that estimate unknown gamma weights over multiple W matrices (for example
*_conv_panel_g) are not currently implemented inbayespecon. Those are outside current fidelity scope.
Example D: semip-style Spatial Probit with Spatial Regional Effects¶
This mirrors the core MATLAB semip_g structure:
binary outcome from latent threshold,
region-specific effects,
spatial dependence in region effects
a = rho * W * a + u.
Unlike MATLAB semip_g, this demonstration uses homoskedastic observation-level probit variance (no v_i/r hierarchy).
from bayespecon import SpatialProbit
rng = np.random.default_rng(707)
# semip-like setup: m regions with mobs observations each
m = 48
mobs = np.full(m, 60, dtype=int)
n = int(mobs.sum())
k = 3
# Region-level kNN weights and spatial random effects
Wm = make_knn_w(rng.random(m), rng.random(m), k=6)
rho_true = 0.45
sigma_a_true = 1.2
u = rng.standard_normal(m) * sigma_a_true
a = np.linalg.solve(np.eye(m) - rho_true * Wm, u)
region_ids = np.repeat(np.arange(m), mobs)
X = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), X])
feature_names = ["Intercept", "x1", "x2", "x3"]
beta_true = np.array([0.4, 0.8, -0.6, 0.5])
eta = X @ beta_true + a[region_ids]
y = (eta + rng.standard_normal(n) > 0.0).astype(float)
sp = SpatialProbit(
y=y,
X=pd.DataFrame(X, columns=feature_names),
W=to_graph(Wm),
region_ids=region_ids,
priors={"rho_lower": -0.95, "rho_upper": 0.95, "beta_sigma": 5.0},
)
idata_sp = sp.fit(
draws=220,
tune=220,
chains=1,
target_accept=0.92,
progressbar=False,
random_seed=707,
)
display(sp.summary(var_names=["beta", "rho", "sigma_a"], round_to=4))
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [rho, beta, sigma_a, a_raw]
Sampling 1 chain for 220 tune and 220 draw iterations (220 + 220 draws total) took 8 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 220), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 0.7778 | 0.3263 | 0.2251 | 1.3275 | 0.0754 | 0.0354 | 17.3242 | 49.5074 | NaN |
| x1 | 0.8654 | 0.0469 | 0.7834 | 0.9523 | 0.0035 | 0.0030 | 181.9248 | 147.6451 | NaN |
| x2 | -0.6205 | 0.0367 | -0.6835 | -0.5474 | 0.0028 | 0.0023 | 188.8376 | 146.2932 | NaN |
| x3 | 0.5343 | 0.0391 | 0.4690 | 0.6146 | 0.0022 | 0.0030 | 327.4212 | 174.7469 | NaN |
| rho | 0.3942 | 0.1670 | 0.0798 | 0.6851 | 0.0313 | 0.0094 | 28.5124 | 99.5993 | NaN |
| sigma_a | 1.3413 | 0.1587 | 1.0373 | 1.6105 | 0.0234 | 0.0221 | 50.1832 | 57.8635 | NaN |
beta_hat = idata_sp.posterior["beta"].mean(("chain", "draw")).to_numpy()
rho_hat = float(idata_sp.posterior["rho"].mean(("chain", "draw")).to_numpy())
a_mean_hat = sp.random_effects_mean().to_numpy()
a_rmse = float(np.sqrt(np.mean((a_mean_hat - a) ** 2)))
faith_sp = pd.DataFrame(
{
"term": ["rho"] + feature_names,
"truth": [rho_true] + beta_true.tolist(),
"estimate": [rho_hat] + beta_hat.tolist(),
}
)
faith_sp = summarize_recovery(faith_sp, abs_tol=0.30)
print(f"Regional-effects RMSE: {a_rmse:.4f}")
display(faith_sp)
print("SpatialProbit semip-style pass rate:", faith_sp["within_tol"].mean())
Regional-effects RMSE: 0.4578
SpatialProbit semip-style pass rate: 0.8
| term | truth | estimate | abs_error | relative_error_pct | within_tol | |
|---|---|---|---|---|---|---|
| 0 | rho | 0.45 | 0.394223 | 0.055777 | 12.394997 | True |
| 1 | Intercept | 0.40 | 0.777816 | 0.377816 | 94.454005 | False |
| 2 | x1 | 0.80 | 0.865380 | 0.065380 | 8.172557 | True |
| 3 | x2 | -0.60 | -0.620527 | 0.020527 | 3.421164 | True |
| 4 | x3 | 0.50 | 0.534281 | 0.034281 | 6.856177 | True |