bayespecon Replication of panelg_manual and sdemo_programs¶

This notebook reproduces representative examples from:

reference/panelg_manual.pdf (panel SDEM synthetic example, Chapter 2), and
reference/sdemo_programs (cross-sectional SDM and SDEM synthetic examples, Chapter 7).

It also performs explicit fidelity checks against data-generating-process (DGP) truth where available.

Scope and Faithfulness Criteria¶

Faithfulness in this notebook means:

Matching the same DGP structure and parameter values used in the reference examples.
Matching panel/cross-sectional structure and fixed-effects settings.
Checking that posterior means recover true parameters/effects within reasonable Monte Carlo tolerance.

Note: sdemo_programs/chapter4 convex-combination estimators (with unknown gamma weights over multiple W matrices) are not currently implemented in bayespecon. This notebook uses examples that are exactly representable by current bayespecon classes.

import numpy as np
import pandas as pd
from libpysal.graph import Graph
from scipy.sparse import csr_matrix

from bayespecon import SDEM, SDM, SDEMPanelFE, dgp

def make_knn_w(xcoord: np.ndarray, ycoord: np.ndarray, k: int) -> np.ndarray:
    """Build row-standardized k-NN weights (no self-neighbors)."""
    coords = np.column_stack([xcoord, ycoord])
    n = coords.shape[0]
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)
    nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]

    W = np.zeros((n, n), dtype=float)
    for i in range(n):
        W[i, nn[i]] = 1.0

    rs = W.sum(axis=1, keepdims=True)
    rs[rs == 0.0] = 1.0
    return W / rs


def to_graph(W: np.ndarray) -> Graph:
    return Graph.from_sparse(csr_matrix(W)).transform("r")


def summarize_recovery(df: pd.DataFrame, abs_tol: float = 0.25) -> pd.DataFrame:
    out = df.copy()
    out["abs_error"] = (out["estimate"] - out["truth"]).abs()
    out["relative_error_pct"] = (
        100.0 * out["abs_error"] / np.maximum(np.abs(out["truth"]), 1e-12)
    )
    out["within_tol"] = out["abs_error"] <= abs_tol
    return out

Example A: panelg_manual Chapter 2 (`sdem_panel_gd`)¶

Reference DGP (manual pages around the SDEM section):

n = 100, t = 20, k = 2,
spatial error coefficient lambda = 0.7,
coefficients on X: +1, coefficients on WX: -1,
both region and time fixed effects (model = 3).

rng = np.random.default_rng(10203040)
n, t, k = 100, 20, 2
lam_true = 0.7
beta_true = np.ones(k)
theta_true = -np.ones(k)
sige = 0.1

x = rng.standard_normal((n * t, k))
Wn = make_knn_w(rng.random(n), rng.random(n), k=5)
Wbig = np.kron(np.eye(t), Wn)

SFE = np.kron(np.ones(t), np.arange(1, n + 1) / n)
TFE = np.kron(np.arange(1, t + 1) / t, np.ones(n))
u = np.linalg.solve(
    np.eye(n * t) - lam_true * Wbig, rng.standard_normal(n * t) * np.sqrt(sige)
)
y = x @ beta_true + (Wbig @ x) @ theta_true + SFE + TFE + u

m_panel = SDEMPanelFE(
    y=y,
    X=pd.DataFrame(x, columns=["x1", "x2"]),
    W=to_graph(Wn),
    N=n,
    T=t,
    model=3,
    priors={"lam_lower": -0.95, "lam_upper": 0.95},
)
idata_panel = m_panel.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=11
)
display(m_panel.summary(round_to=4))

Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [lam, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 1 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
x1	1.0012	0.0083	0.9887	1.0172	0.0007	0.0008	154.0511	135.6721	NaN
x2	1.0128	0.0089	0.9969	1.0293	0.0008	0.0006	134.4166	138.4639	NaN
W*x1	-0.9934	0.0255	-1.0448	-0.9528	0.0023	0.0018	119.5630	70.4080	NaN
W*x2	-0.9889	0.0260	-1.0323	-0.9418	0.0027	0.0017	99.8179	91.9034	NaN
lam	0.6934	0.0170	0.6592	0.7219	0.0012	0.0017	198.6335	110.0793	NaN
sigma	0.3089	0.0045	0.3015	0.3163	0.0004	0.0003	107.5681	89.8917	NaN

beta_hat = idata_panel.posterior["beta"].mean(("chain", "draw")).to_numpy()
lam_hat = float(idata_panel.posterior["lam"].mean(("chain", "draw")).to_numpy())
faith_panel = pd.DataFrame(
    {
        "term": ["lam", "x1", "x2", "W*x1", "W*x2"],
        "truth": [lam_true, 1.0, 1.0, -1.0, -1.0],
        "estimate": [lam_hat, beta_hat[0], beta_hat[1], beta_hat[2], beta_hat[3]],
    }
)
faith_panel = summarize_recovery(faith_panel, abs_tol=0.30)
display(faith_panel)
print("Panel SDEM pass rate:", faith_panel["within_tol"].mean())

	term	truth	estimate	abs_error	relative_error_pct	within_tol
0	lam	0.7	0.693387	0.006613	0.944759	True
1	x1	1.0	1.001211	0.001211	0.121115	True
2	x2	1.0	1.012786	0.012786	1.278594	True
3	W*x1	-1.0	-0.993426	0.006574	0.657442	True
4	W*x2	-1.0	-0.988892	0.011108	1.110827	True

Panel SDEM pass rate: 1.0

Example B: sdemo_programs Chapter 7 (`sdm_cross_section_gd`)¶

Reference DGP:

n = 1000, t = 1, rho = 0.5, beta = [1, 1], theta = [-0.5, -0.5],
intercept of 2,
W from k-NN coordinates with 6 neighbors.

rng = np.random.default_rng(30203040)
n, k = 1000, 2
rho_true = 0.5
intercept_true = 2.0
beta_true = np.ones(k)
theta_true = -0.5 * np.ones(k)

Wn = make_knn_w(rng.random(n), rng.random(n), k=6)
sdm_data = dgp.simulate_sdm(
    W=Wn,
    rho=rho_true,
    beta1=np.array([intercept_true, *beta_true]),
    beta2=theta_true,
    sigma=np.sqrt(0.5),
    rng=rng,
)
y = sdm_data["y"]
X_arr = sdm_data["X"]

X = pd.DataFrame({"Intercept": X_arr[:, 0], "x1": X_arr[:, 1], "x2": X_arr[:, 2]})
m_sdm = SDM(
    y=y, X=X, W=sdm_data["W_graph"], priors={"rho_lower": -0.95, "rho_upper": 0.95}
)
idata_sdm = m_sdm.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=22
)
display(m_sdm.summary(round_to=4))

Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [rho, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 1 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
Intercept	2.0891	0.1537	1.8438	2.3829	0.0161	0.0106	91.5101	114.7430	NaN
x1	0.9873	0.0220	0.9499	1.0279	0.0024	0.0012	91.7557	119.4593	NaN
x2	0.9883	0.0220	0.9509	1.0293	0.0019	0.0021	141.6787	80.2275	NaN
W*x1	-0.5138	0.0699	-0.6347	-0.3842	0.0065	0.0042	121.7593	141.4349	NaN
W*x2	-0.4573	0.0633	-0.5809	-0.3525	0.0051	0.0040	159.1185	95.7920	NaN
rho	0.4842	0.0376	0.4061	0.5417	0.0038	0.0026	98.8456	114.3114	NaN
sigma	0.7015	0.0156	0.6727	0.7297	0.0013	0.0018	139.2643	107.3855	NaN

post = idata_sdm.posterior
rho_hat = float(post["rho"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()

faith_sdm = pd.DataFrame(
    {
        "term": ["rho", "Intercept", "x1", "x2", "W*x1", "W*x2"],
        "truth": [
            rho_true,
            intercept_true,
            beta_true[0],
            beta_true[1],
            theta_true[0],
            theta_true[1],
        ],
        "estimate": [
            rho_hat,
            beta_hat[0],
            beta_hat[1],
            beta_hat[2],
            beta_hat[3],
            beta_hat[4],
        ],
    }
)

faith_sdm = summarize_recovery(faith_sdm, abs_tol=0.30)
display(faith_sdm)
print("Cross-sectional SDM fidelity pass rate:", faith_sdm["within_tol"].mean())

	term	truth	estimate	abs_error	relative_error_pct	within_tol
0	rho	0.5	0.484231	0.015769	3.153844	True
1	Intercept	2.0	2.089086	0.089086	4.454282	True
2	x1	1.0	0.987266	0.012734	1.273439	True
3	x2	1.0	0.988340	0.011660	1.165961	True
4	W*x1	-0.5	-0.513848	0.013848	2.769551	True
5	W*x2	-0.5	-0.457292	0.042708	8.541659	True

Cross-sectional SDM fidelity pass rate: 1.0

Example C: sdemo_programs Chapter 7 (`sdem_cross_section_gd`)¶

Reference DGP:

n = 3000, t = 1, rho = 0.8,
beta = [1,1,1,1], theta = 0.5 * beta,
intercept of 1,
SDEM disturbance process with k-NN(5) weights.

rng = np.random.default_rng(221010)
n, k = 3000, 4
lam_true = 0.8
intercept_true = 1.0
beta_true = np.ones(k)
theta_true = 0.5 * np.ones(k)

Wn = make_knn_w(rng.standard_normal(n), rng.standard_normal(n), k=5)
sdem_data = dgp.simulate_sdem(
    W=Wn,
    lam=lam_true,
    beta1=np.array([intercept_true, *beta_true]),
    beta2=theta_true,
    sigma=1.0,
    rng=rng,
)
y = sdem_data["y"]
X_arr = sdem_data["X"]

X = pd.DataFrame({"Intercept": X_arr[:, 0]})
for j in range(k):
    X[f"x{j + 1}"] = X_arr[:, j + 1]

m_sdem = SDEM(
    y=y, X=X, W=sdem_data["W_graph"], priors={"lam_lower": -0.95, "lam_upper": 0.95}
)
idata_sdem = m_sdem.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=33
)
display(m_sdem.summary(round_to=4))

Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [lam, beta, sigma]
Sampling 1 chain for 160 tune and 160 draw iterations (160 + 160 draws total) took 1 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
Intercept	0.8871	0.1102	0.6486	1.0780	0.0075	0.0097	209.7766	138.6741	NaN
x1	1.0496	0.0215	1.0072	1.0867	0.0019	0.0016	123.0915	118.5283	NaN
x2	0.9716	0.0221	0.9285	1.0128	0.0027	0.0017	69.5702	95.7920	NaN
x3	1.0209	0.0230	0.9817	1.0617	0.0020	0.0017	138.1239	107.0723	NaN
x4	1.0249	0.0237	0.9807	1.0639	0.0021	0.0014	124.1255	108.4955	NaN
W*x1	0.5509	0.0665	0.4047	0.6638	0.0058	0.0040	135.5608	171.4755	NaN
W*x2	0.4709	0.0628	0.3670	0.5983	0.0055	0.0048	127.3604	113.6371	NaN
W*x3	0.6234	0.0592	0.5267	0.7285	0.0071	0.0053	77.7412	86.9388	NaN
W*x4	0.6427	0.0735	0.4883	0.7435	0.0076	0.0060	98.7191	79.0962	NaN
lam	0.8153	0.0086	0.7976	0.8290	0.0007	0.0007	165.2826	177.1728	NaN
sigma	1.0016	0.0147	0.9727	1.0262	0.0011	0.0016	179.7078	105.2508	NaN

post = idata_sdem.posterior
lam_hat = float(post["lam"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()

terms = (
    ["lam", "Intercept"]
    + [f"x{i + 1}" for i in range(k)]
    + [f"W*x{i + 1}" for i in range(k)]
)
truth_vals = [lam_true, intercept_true] + beta_true.tolist() + theta_true.tolist()
est_vals = (
    [lam_hat, beta_hat[0]]
    + beta_hat[1 : 1 + k].tolist()
    + beta_hat[1 + k : 1 + 2 * k].tolist()
)

faith_sdem = pd.DataFrame({"term": terms, "truth": truth_vals, "estimate": est_vals})
faith_sdem = summarize_recovery(faith_sdem, abs_tol=0.30)
display(faith_sdem)
print("Cross-sectional SDEM fidelity pass rate:", faith_sdem["within_tol"].mean())

	term	truth	estimate	abs_error	relative_error_pct	within_tol
0	lam	0.8	0.815326	0.015326	1.915733	True
1	Intercept	1.0	0.887085	0.112915	11.291500	True
2	x1	1.0	1.049587	0.049587	4.958746	True
3	x2	1.0	0.971579	0.028421	2.842091	True
4	x3	1.0	1.020941	0.020941	2.094109	True
5	x4	1.0	1.024920	0.024920	2.491972	True
6	W*x1	0.5	0.550856	0.050856	10.171223	True
7	W*x2	0.5	0.470862	0.029138	5.827552	True
8	W*x3	0.5	0.623350	0.123350	24.670087	True
9	W*x4	0.5	0.642651	0.142651	28.530159	True

Cross-sectional SDEM fidelity pass rate: 1.0

Overall Fidelity Verdict¶

This notebook demonstrates that bayespecon is faithful to the selected panel/cross-sectional examples from panelg_manual and sdemo_programs that are representable by the implemented model classes (SAR/SDM/SEM/SDEM/SLX with one W matrix and FE modes).

Caveat:

Convex-combination examples that estimate unknown gamma weights over multiple W matrices (for example *_conv_panel_g) are not currently implemented in bayespecon. Those are outside current fidelity scope.

Example D: semip-style Spatial Probit with Spatial Regional Effects¶

This mirrors the core legacy semip_g structure:

binary outcome from latent threshold,
region-specific effects,
spatial dependence in region effects a = rho * W * a + u.

Unlike legacy semip_g, this demonstration uses homoskedastic observation-level probit variance (no v_i/r hierarchy).

from bayespecon import SpatialProbit

rng = np.random.default_rng(707)

# semip-like setup: m regions with mobs observations each
m = 48
mobs = np.full(m, 60, dtype=int)
n = int(mobs.sum())
k = 3

# Region-level kNN weights and spatial random effects
Wm = make_knn_w(rng.random(m), rng.random(m), k=6)
rho_true = 0.45
sigma_a_true = 1.2

u = rng.standard_normal(m) * sigma_a_true
a = np.linalg.solve(np.eye(m) - rho_true * Wm, u)

region_ids = np.repeat(np.arange(m), mobs)
X = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), X])
feature_names = ["Intercept", "x1", "x2", "x3"]

beta_true = np.array([0.4, 0.8, -0.6, 0.5])
eta = X @ beta_true + a[region_ids]
y = (eta + rng.standard_normal(n) > 0.0).astype(float)

sp = SpatialProbit(
    y=y,
    X=pd.DataFrame(X, columns=feature_names),
    W=to_graph(Wm),
    region_ids=region_ids,
    priors={"rho_lower": -0.95, "rho_upper": 0.95, "beta_sigma": 5.0},
)
idata_sp = sp.fit(
    draws=220,
    tune=220,
    chains=1,
    target_accept=0.92,
    progressbar=False,
    random_seed=707,
)

display(sp.summary(var_names=["beta", "rho", "sigma_a"], round_to=4))

Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [rho, beta, sigma_a, a_raw]
Sampling 1 chain for 220 tune and 220 draw iterations (220 + 220 draws total) took 5 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 220), minimum_shape: (chains=2, draws=4)

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
Intercept	0.7408	0.2723	0.2648	1.2926	0.0509	0.0408	27.3481	49.0617	NaN
x1	0.8691	0.0413	0.7955	0.9383	0.0034	0.0022	146.7543	173.9132	NaN
x2	-0.6236	0.0375	-0.6884	-0.5536	0.0031	0.0027	150.5389	120.0567	NaN
x3	0.5346	0.0405	0.4628	0.6104	0.0037	0.0026	119.6306	101.5853	NaN
rho	0.3338	0.1890	-0.0338	0.6481	0.0573	0.0174	10.1179	34.2140	NaN
sigma_a	1.3545	0.1437	1.1223	1.6684	0.0256	0.0137	30.8914	56.8495	NaN

beta_hat = idata_sp.posterior["beta"].mean(("chain", "draw")).to_numpy()
rho_hat = float(idata_sp.posterior["rho"].mean(("chain", "draw")).to_numpy())

a_mean_hat = sp.random_effects_mean().to_numpy()
a_rmse = float(np.sqrt(np.mean((a_mean_hat - a) ** 2)))

faith_sp = pd.DataFrame(
    {
        "term": ["rho"] + feature_names,
        "truth": [rho_true] + beta_true.tolist(),
        "estimate": [rho_hat] + beta_hat.tolist(),
    }
)
faith_sp = summarize_recovery(faith_sp, abs_tol=0.30)

print(f"Regional-effects RMSE: {a_rmse:.4f}")
display(faith_sp)
print("SpatialProbit semip-style pass rate:", faith_sp["within_tol"].mean())

Regional-effects RMSE: 0.4248
SpatialProbit semip-style pass rate: 0.8

	term	truth	estimate	abs_error	relative_error_pct	within_tol
0	rho	0.45	0.333752	0.116248	25.832820	True
1	Intercept	0.40	0.740844	0.340844	85.210929	False
2	x1	0.80	0.869112	0.069112	8.638974	True
3	x2	-0.60	-0.623649	0.023649	3.941448	True
4	x3	0.50	0.534598	0.034598	6.919527	True