bayespecon Replication of panelg_manual and sdemo_programs

This notebook reproduces representative examples from:

  • reference/panelg_manual.pdf (panel SDEM synthetic example, Chapter 2), and

  • reference/sdemo_programs (cross-sectional SDM and SDEM synthetic examples, Chapter 7).

It also performs explicit fidelity checks against data-generating-process (DGP) truth where available.

Scope and Faithfulness Criteria

Faithfulness in this notebook means:

  1. Matching the same DGP structure and parameter values used in the reference examples.

  2. Matching panel/cross-sectional structure and fixed-effects settings.

  3. Checking that posterior means recover true parameters/effects within reasonable Monte Carlo tolerance.

Note: sdemo_programs/chapter4 convex-combination estimators (with unknown gamma weights over multiple W matrices) are not currently implemented in bayespecon. This notebook uses examples that are exactly representable by current bayespecon classes.

import numpy as np
import pandas as pd
from libpysal.graph import Graph
from scipy.sparse import csr_matrix

from bayespecon import SDEM, SDM, SDEMPanelFE, dgp
def make_knn_w(xcoord: np.ndarray, ycoord: np.ndarray, k: int) -> np.ndarray:
    """Build row-standardized k-NN weights (no self-neighbors)."""
    coords = np.column_stack([xcoord, ycoord])
    n = coords.shape[0]
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)
    nn = np.argpartition(d, kth=k - 1, axis=1)[:, :k]

    W = np.zeros((n, n), dtype=float)
    for i in range(n):
        W[i, nn[i]] = 1.0

    rs = W.sum(axis=1, keepdims=True)
    rs[rs == 0.0] = 1.0
    return W / rs


def to_graph(W: np.ndarray) -> Graph:
    return Graph.from_sparse(csr_matrix(W)).transform("r")


def summarize_recovery(df: pd.DataFrame, abs_tol: float = 0.25) -> pd.DataFrame:
    out = df.copy()
    out["abs_error"] = (out["estimate"] - out["truth"]).abs()
    out["relative_error_pct"] = (
        100.0 * out["abs_error"] / np.maximum(np.abs(out["truth"]), 1e-12)
    )
    out["within_tol"] = out["abs_error"] <= abs_tol
    return out

Example A: panelg_manual Chapter 2 (sdem_panel_gd)

Reference DGP (manual pages around the SDEM section):

  • n = 100, t = 20, k = 2,

  • spatial error coefficient lambda = 0.7,

  • coefficients on X: +1, coefficients on WX: -1,

  • both region and time fixed effects (model = 3).

rng = np.random.default_rng(10203040)
n, t, k = 100, 20, 2
lam_true = 0.7
beta_true = np.ones(k)
theta_true = -np.ones(k)
sige = 0.1

x = rng.standard_normal((n * t, k))
Wn = make_knn_w(rng.random(n), rng.random(n), k=5)
Wbig = np.kron(np.eye(t), Wn)

SFE = np.kron(np.ones(t), np.arange(1, n + 1) / n)
TFE = np.kron(np.arange(1, t + 1) / t, np.ones(n))
u = np.linalg.solve(
    np.eye(n * t) - lam_true * Wbig, rng.standard_normal(n * t) * np.sqrt(sige)
)
y = x @ beta_true + (Wbig @ x) @ theta_true + SFE + TFE + u

m_panel = SDEMPanelFE(
    y=y,
    X=pd.DataFrame(x, columns=["x1", "x2"]),
    W=to_graph(Wn),
    N=n,
    T=t,
    model=3,
    priors={"lam_lower": -0.95, "lam_upper": 0.95},
)
idata_panel = m_panel.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=11
)
display(m_panel.summary(round_to=4))
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
lam 0.6965 0.0163 0.6636 0.7249 0.0014 0.0013 139.6514 52.4732 NaN
sigma 0.3126 0.0054 0.3036 0.3235 0.0005 0.0003 122.5958 178.6122 NaN
sigma2 0.0978 0.0034 0.0922 0.1046 0.0003 0.0002 122.5958 178.6122 NaN
x1 1.0014 0.0079 0.9893 1.0170 0.0006 0.0004 150.8078 130.6426 NaN
x2 1.0135 0.0083 0.9983 1.0284 0.0007 0.0005 128.0050 139.4668 NaN
W*x1 -0.9923 0.0230 -1.0368 -0.9558 0.0021 0.0012 117.5269 141.4349 NaN
W*x2 -0.9861 0.0255 -1.0436 -0.9494 0.0021 0.0016 152.2401 140.6729 NaN
beta_hat = idata_panel.posterior["beta"].mean(("chain", "draw")).to_numpy()
lam_hat = float(idata_panel.posterior["lam"].mean(("chain", "draw")).to_numpy())
faith_panel = pd.DataFrame(
    {
        "term": ["lam", "x1", "x2", "W*x1", "W*x2"],
        "truth": [lam_true, 1.0, 1.0, -1.0, -1.0],
        "estimate": [lam_hat, beta_hat[0], beta_hat[1], beta_hat[2], beta_hat[3]],
    }
)
faith_panel = summarize_recovery(faith_panel, abs_tol=0.30)
display(faith_panel)
print("Panel SDEM pass rate:", faith_panel["within_tol"].mean())
term truth estimate abs_error relative_error_pct within_tol
0 lam 0.7 0.696534 0.003466 0.495159 True
1 x1 1.0 1.001360 0.001360 0.135990 True
2 x2 1.0 1.013512 0.013512 1.351160 True
3 W*x1 -1.0 -0.992311 0.007689 0.768933 True
4 W*x2 -1.0 -0.986078 0.013922 1.392195 True
Panel SDEM pass rate: 1.0

Example B: sdemo_programs Chapter 7 (sdm_cross_section_gd)

Reference DGP:

  • n = 1000, t = 1, rho = 0.5, beta = [1, 1], theta = [-0.5, -0.5],

  • intercept of 2,

  • W from k-NN coordinates with 6 neighbors.

rng = np.random.default_rng(30203040)
n, k = 1000, 2
rho_true = 0.5
intercept_true = 2.0
beta_true = np.ones(k)
theta_true = -0.5 * np.ones(k)

Wn = make_knn_w(rng.random(n), rng.random(n), k=6)
sdm_data = dgp.simulate_sdm(
    W=Wn,
    rho=rho_true,
    beta1=np.array([intercept_true, *beta_true]),
    beta2=theta_true,
    sigma=np.sqrt(0.5),
    rng=rng,
)
y = sdm_data["y"]
X_arr = sdm_data["X"]

X = pd.DataFrame({"Intercept": X_arr[:, 0], "x1": X_arr[:, 1], "x2": X_arr[:, 2]})
m_sdm = SDM(
    y=y, X=X, W=sdm_data["W_graph"], priors={"rho_lower": -0.95, "rho_upper": 0.95}
)
idata_sdm = m_sdm.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=22
)
display(m_sdm.summary(round_to=4))
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho 0.4860 0.0381 0.4184 0.5580 0.0028 0.0033 184.0741 90.6062 NaN
sigma 0.7033 0.0167 0.6713 0.7344 0.0015 0.0008 123.2603 171.4755 NaN
sigma2 0.4949 0.0235 0.4506 0.5393 0.0021 0.0012 123.2603 171.4755 NaN
Intercept 2.0787 0.1559 1.7431 2.3297 0.0116 0.0132 184.5884 86.8357 NaN
x1 0.9866 0.0220 0.9401 1.0221 0.0017 0.0012 161.1197 110.0793 NaN
x2 0.9864 0.0208 0.9369 1.0168 0.0014 0.0011 216.5130 180.8050 NaN
W*x1 -0.5223 0.0685 -0.6207 -0.3670 0.0063 0.0034 117.4948 112.5358 NaN
W*x2 -0.4543 0.0713 -0.5644 -0.3096 0.0059 0.0045 148.7934 109.8018 NaN
post = idata_sdm.posterior
rho_hat = float(post["rho"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()

faith_sdm = pd.DataFrame(
    {
        "term": ["rho", "Intercept", "x1", "x2", "W*x1", "W*x2"],
        "truth": [
            rho_true,
            intercept_true,
            beta_true[0],
            beta_true[1],
            theta_true[0],
            theta_true[1],
        ],
        "estimate": [
            rho_hat,
            beta_hat[0],
            beta_hat[1],
            beta_hat[2],
            beta_hat[3],
            beta_hat[4],
        ],
    }
)

faith_sdm = summarize_recovery(faith_sdm, abs_tol=0.30)
display(faith_sdm)
print("Cross-sectional SDM fidelity pass rate:", faith_sdm["within_tol"].mean())
term truth estimate abs_error relative_error_pct within_tol
0 rho 0.5 0.485994 0.014006 2.801185 True
1 Intercept 2.0 2.078678 0.078678 3.933883 True
2 x1 1.0 0.986555 0.013445 1.344476 True
3 x2 1.0 0.986417 0.013583 1.358325 True
4 W*x1 -0.5 -0.522254 0.022254 4.450876 True
5 W*x2 -0.5 -0.454296 0.045704 9.140761 True
Cross-sectional SDM fidelity pass rate: 1.0

Example C: sdemo_programs Chapter 7 (sdem_cross_section_gd)

Reference DGP:

  • n = 3000, t = 1, rho = 0.8,

  • beta = [1,1,1,1], theta = 0.5 * beta,

  • intercept of 1,

  • SDEM disturbance process with k-NN(5) weights.

rng = np.random.default_rng(221010)
n, k = 3000, 4
lam_true = 0.8
intercept_true = 1.0
beta_true = np.ones(k)
theta_true = 0.5 * np.ones(k)

Wn = make_knn_w(rng.standard_normal(n), rng.standard_normal(n), k=5)
sdem_data = dgp.simulate_sdem(
    W=Wn,
    lam=lam_true,
    beta1=np.array([intercept_true, *beta_true]),
    beta2=theta_true,
    sigma=1.0,
    rng=rng,
)
y = sdem_data["y"]
X_arr = sdem_data["X"]

X = pd.DataFrame({"Intercept": X_arr[:, 0]})
for j in range(k):
    X[f"x{j + 1}"] = X_arr[:, j + 1]

m_sdem = SDEM(
    y=y, X=X, W=sdem_data["W_graph"], priors={"lam_lower": -0.95, "lam_upper": 0.95}
)
idata_sdem = m_sdem.fit(
    draws=160, tune=160, chains=1, target_accept=0.92, progressbar=False, random_seed=33
)
display(m_sdem.summary(round_to=4))
arviz - WARNING - Shape validation failed: input_shape: (1, 160), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
lam 0.8179 0.0106 0.8004 0.8379 0.0008 0.0008 165.0939 65.5074 NaN
sigma 1.0020 0.0129 0.9803 1.0266 0.0011 0.0007 150.9887 179.4750 NaN
sigma2 1.0042 0.0259 0.9610 1.0538 0.0021 0.0014 150.9887 179.4750 NaN
Intercept 0.8930 0.1107 0.7259 1.1071 0.0104 0.0049 111.0109 168.7975 NaN
x1 1.0483 0.0220 1.0152 1.0927 0.0017 0.0013 150.4263 181.5577 NaN
x2 0.9762 0.0233 0.9363 1.0211 0.0021 0.0015 128.5595 94.7075 NaN
x3 1.0193 0.0204 0.9827 1.0560 0.0029 0.0012 52.4614 128.1580 NaN
x4 1.0225 0.0231 0.9732 1.0576 0.0018 0.0012 155.2841 150.9224 NaN
W*x1 0.5486 0.0686 0.4459 0.6901 0.0050 0.0044 177.3294 137.6572 NaN
W*x2 0.4809 0.0682 0.3454 0.5958 0.0061 0.0036 125.7173 139.8088 NaN
W*x3 0.6128 0.0590 0.5061 0.7293 0.0053 0.0027 120.1543 169.1875 NaN
W*x4 0.6261 0.0652 0.5064 0.7401 0.0054 0.0037 144.4867 155.6905 NaN
post = idata_sdem.posterior
lam_hat = float(post["lam"].mean(("chain", "draw")).to_numpy())
beta_hat = post["beta"].mean(("chain", "draw")).to_numpy()

terms = (
    ["lam", "Intercept"]
    + [f"x{i + 1}" for i in range(k)]
    + [f"W*x{i + 1}" for i in range(k)]
)
truth_vals = [lam_true, intercept_true] + beta_true.tolist() + theta_true.tolist()
est_vals = (
    [lam_hat, beta_hat[0]]
    + beta_hat[1 : 1 + k].tolist()
    + beta_hat[1 + k : 1 + 2 * k].tolist()
)

faith_sdem = pd.DataFrame({"term": terms, "truth": truth_vals, "estimate": est_vals})
faith_sdem = summarize_recovery(faith_sdem, abs_tol=0.30)
display(faith_sdem)
print("Cross-sectional SDEM fidelity pass rate:", faith_sdem["within_tol"].mean())
term truth estimate abs_error relative_error_pct within_tol
0 lam 0.8 0.817919 0.017919 2.239875 True
1 Intercept 1.0 0.892982 0.107018 10.701761 True
2 x1 1.0 1.048271 0.048271 4.827142 True
3 x2 1.0 0.976164 0.023836 2.383640 True
4 x3 1.0 1.019320 0.019320 1.931968 True
5 x4 1.0 1.022505 0.022505 2.250484 True
6 W*x1 0.5 0.548591 0.048591 9.718199 True
7 W*x2 0.5 0.480902 0.019098 3.819581 True
8 W*x3 0.5 0.612824 0.112824 22.564853 True
9 W*x4 0.5 0.626077 0.126077 25.215367 True
Cross-sectional SDEM fidelity pass rate: 1.0

Overall Fidelity Verdict

This notebook demonstrates that bayespecon is faithful to the selected panel/cross-sectional examples from panelg_manual and sdemo_programs that are representable by the implemented model classes (SAR/SDM/SEM/SDEM/SLX with one W matrix and FE modes).

Caveat:

  • Convex-combination examples that estimate unknown gamma weights over multiple W matrices (for example *_conv_panel_g) are not currently implemented in bayespecon. Those are outside current fidelity scope.

Example D: semip-style Spatial Probit with Spatial Regional Effects

This mirrors the core legacy semip_g structure:

  • binary outcome from latent threshold,

  • region-specific effects,

  • spatial dependence in region effects a = rho * W * a + u.

Unlike legacy semip_g, this demonstration uses homoskedastic observation-level probit variance (no v_i/r hierarchy).

from bayespecon import SpatialProbit

rng = np.random.default_rng(707)

# semip-like setup: m regions with mobs observations each
m = 48
mobs = np.full(m, 60, dtype=int)
n = int(mobs.sum())
k = 3

# Region-level kNN weights and spatial random effects
Wm = make_knn_w(rng.random(m), rng.random(m), k=6)
rho_true = 0.45
sigma_a_true = 1.2

u = rng.standard_normal(m) * sigma_a_true
a = np.linalg.solve(np.eye(m) - rho_true * Wm, u)

region_ids = np.repeat(np.arange(m), mobs)
X = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), X])
feature_names = ["Intercept", "x1", "x2", "x3"]

beta_true = np.array([0.4, 0.8, -0.6, 0.5])
eta = X @ beta_true + a[region_ids]
y = (eta + rng.standard_normal(n) > 0.0).astype(float)

sp = SpatialProbit(
    y=y,
    X=pd.DataFrame(X, columns=feature_names),
    W=to_graph(Wm),
    region_ids=region_ids,
    priors={"rho_lower": -0.95, "rho_upper": 0.95, "beta_sigma": 5.0},
)
idata_sp = sp.fit(
    draws=220,
    tune=220,
    chains=1,
    target_accept=0.92,
    progressbar=False,
    random_seed=707,
)

display(sp.summary(var_names=["beta", "rho", "sigma_a"], round_to=4))
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [rho, beta, sigma_a, a_raw]
Sampling 1 chain for 220 tune and 220 draw iterations (220 + 220 draws total) took 4 seconds.
Only one chain was sampled, this makes it impossible to run some convergence checks
arviz - WARNING - Shape validation failed: input_shape: (1, 220), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
Intercept 0.7408 0.2723 0.2648 1.2926 0.0509 0.0408 27.3481 49.0617 NaN
x1 0.8691 0.0413 0.7955 0.9383 0.0034 0.0022 146.7543 173.9132 NaN
x2 -0.6236 0.0375 -0.6884 -0.5536 0.0031 0.0027 150.5389 120.0567 NaN
x3 0.5346 0.0405 0.4628 0.6104 0.0037 0.0026 119.6306 101.5853 NaN
rho 0.3338 0.1890 -0.0338 0.6481 0.0573 0.0174 10.1179 34.2140 NaN
sigma_a 1.3545 0.1437 1.1223 1.6684 0.0256 0.0137 30.8914 56.8495 NaN
beta_hat = idata_sp.posterior["beta"].mean(("chain", "draw")).to_numpy()
rho_hat = float(idata_sp.posterior["rho"].mean(("chain", "draw")).to_numpy())

a_mean_hat = sp.random_effects_mean().to_numpy()
a_rmse = float(np.sqrt(np.mean((a_mean_hat - a) ** 2)))

faith_sp = pd.DataFrame(
    {
        "term": ["rho"] + feature_names,
        "truth": [rho_true] + beta_true.tolist(),
        "estimate": [rho_hat] + beta_hat.tolist(),
    }
)
faith_sp = summarize_recovery(faith_sp, abs_tol=0.30)

print(f"Regional-effects RMSE: {a_rmse:.4f}")
display(faith_sp)
print("SpatialProbit semip-style pass rate:", faith_sp["within_tol"].mean())
Regional-effects RMSE: 0.4248
SpatialProbit semip-style pass rate: 0.8
term truth estimate abs_error relative_error_pct within_tol
0 rho 0.45 0.333752 0.116248 25.832820 True
1 Intercept 0.40 0.740844 0.340844 85.210929 False
2 x1 0.80 0.869112 0.069112 8.638974 True
3 x2 -0.60 -0.623649 0.023649 3.941448 True
4 x3 0.50 0.534598 0.034598 6.919527 True