Nonlinear Spatial Models

This guide demonstrates three nonlinear spatial models in bayespecon on synthetic data.

Spatial Probit

For binary outcomes \(y_i \in \{0,1\}\), latent utilities \(y^*\) follow a spatial lag:

\[y^* = \rho W y^* + X\beta + a + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)\]

where \(a\) collects region-level random effects, \(a_r \sim \mathcal{N}(0, \sigma_a^2)\), and the observed outcome is \(y_i = \mathbf{1}[y^*_i > 0]\).
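Because the lag enters the latent scale, the reduced form is \(y^* = (I - \rho W)^{-1}(X\beta + a + \varepsilon)\). The following is a minimal numpy sketch of that data-generating process, not the library's `dgp.simulate_spatial_probit`; the 5-node line graph, one-unit-per-region setup, and parameter values are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Row-standardized line-graph weights: node i neighbors i-1 and i+1
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = A / A.sum(axis=1, keepdims=True)

rho, beta, sigma_a = 0.35, np.array([0.3, 1.0]), 0.8
X = rng.normal(size=(n, 2))
a = rng.normal(scale=sigma_a, size=n)  # region effects (one unit per region here)
eps = rng.normal(size=n)               # unit-variance probit errors

# Reduced form: y* = (I - rho W)^{-1} (X beta + a + eps)
y_star = np.linalg.solve(np.eye(n) - rho * W, X @ beta + a + eps)
y = (y_star > 0).astype(int)
```

Solving the linear system is preferable to forming the inverse explicitly; for \(|\rho| < 1\) and row-standardized \(W\), \(I - \rho W\) is guaranteed invertible.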

SAR Tobit

For censored outcomes, the spatial autoregressive Tobit specifies:

\[y^* = \rho W y^* + X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)\]
\[y_i = \max(c,\, y^*_i)\]

where \(c\) is the censoring threshold (default 0).
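The observation rule simply censors the SAR reduced-form latent outcome from below at \(c\). A numpy sketch under the same illustrative line-graph weights (again, not the library's own simulator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Row-standardized line-graph weights
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = A / A.sum(axis=1, keepdims=True)

rho, beta, sigma, c = 0.4, np.array([1.0, 1.5]), 0.8, 0.0
X = rng.normal(size=(n, 2))
eps = rng.normal(scale=sigma, size=n)

# Latent outcome follows the SAR reduced form, then is censored at c
y_star = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
y = np.maximum(c, y_star)
```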

SEM Tobit

The spatial error Tobit places spatial dependence in the disturbance:

\[y^* = X\beta + u, \quad u = \lambda W u + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)\]
\[y_i = \max(c,\, y^*_i)\]
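In the SEM variant only the disturbance is spatially filtered, so \(X\beta\) receives no spatial multiplier: \(y^* = X\beta + (I - \lambda W)^{-1}\varepsilon\). A sketch with the same illustrative weights, highlighting that contrast:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Row-standardized line-graph weights
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = A / A.sum(axis=1, keepdims=True)

lam, beta, sigma, c = 0.35, np.array([1.0, 1.5]), 0.8, 0.0
X = rng.normal(size=(n, 2))
eps = rng.normal(scale=sigma, size=n)

# Only the error is spatially transformed: u = (I - lam W)^{-1} eps
u = np.linalg.solve(np.eye(n) - lam * W, eps)
y_star = X @ beta + u
y = np.maximum(c, y_star)
```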
import numpy as np
The rest of the guide simulates data from each model with the dgp helpers and fits the corresponding estimator.
from libpysal.graph import Graph

from bayespecon import SARTobit, SEMTobit, SpatialProbit, dgp

rng = np.random.default_rng(1234)
# SpatialProbit
m, n_per_region = 8, 20

# Create line-graph weights: each node connected to neighbors
focal, neighbor = [], []
for i in range(m):
    if i > 0:
        focal.append(i)
        neighbor.append(i - 1)
    if i < m - 1:
        focal.append(i)
        neighbor.append(i + 1)
Wm = Graph.from_arrays(
    np.array(focal), np.array(neighbor), weight=np.ones(len(focal))
).transform("r")
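As a sanity check on the weights, the `"r"` transform row-standardizes the binary adjacency so each row sums to one. A dense numpy equivalent of the 8-node line graph above, constructed without libpysal:

```python
import numpy as np

m = 8
# Binary line-graph adjacency: node i neighbors i-1 and i+1
A = np.diag(np.ones(m - 1), 1) + np.diag(np.ones(m - 1), -1)

# "r" transform: divide each row by its row sum
Wd = A / A.sum(axis=1, keepdims=True)

print(Wd[0])  # endpoint row: its single neighbor gets weight 1.0
print(Wd[3])  # interior row: each of its two neighbors gets weight 0.5
```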

rho, beta, sigma_a = 0.35, np.array([0.3, 1.0]), 0.8

sp_data = dgp.simulate_spatial_probit(
    W=Wm,
    rho=rho,
    beta=beta,
    sigma_a=sigma_a,
    n_per_region=n_per_region,
    rng=rng,
)

sp = SpatialProbit(
    y=sp_data["y"],
    X=sp_data["X"],
    W=sp_data["W_graph"],
    region_ids=sp_data["region_ids"],
)
sp.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)
sp.summary(var_names=["rho", "beta", "sigma_a"])
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [rho, beta, sigma_a, a_raw]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 8 seconds.
There were 23 divergences after tuning. Increase `target_accept` or reparameterize.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
          mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
rho      0.204  0.367  -0.511    0.766      0.029    0.016     163.0     198.0   1.01
x0       0.196  1.441  -3.345    2.313      0.240    0.325      53.0      24.0   1.09
x1       1.010  0.187   0.670    1.369      0.012    0.007     235.0     393.0   1.00
sigma_a  2.000  0.795   0.812    3.681      0.175    0.131      32.0      30.0   1.07
# SARTobit and SEMTobit
n = 30

# Create line-graph weights: each node connected to neighbors
focal, neighbor = [], []
for i in range(n):
    if i > 0:
        focal.append(i)
        neighbor.append(i - 1)
    if i < n - 1:
        focal.append(i)
        neighbor.append(i + 1)
W = Graph.from_arrays(
    np.array(focal), np.array(neighbor), weight=np.ones(len(focal))
).transform("r")

beta = np.array([1.0, 1.5])
sigma, rho, lam = 0.8, 0.4, 0.35

sar_data = dgp.simulate_sar_tobit(
    W=W,
    rho=rho,
    beta=beta,
    sigma=sigma,
    censoring=0.0,
    rng=rng,
)
sar_tobit = SARTobit(
    y=sar_data["y"],
    X=sar_data["X"],
    W=sar_data["W_graph"],
    censoring=0.0,
)
sar_tobit.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)

sem_data = dgp.simulate_sem_tobit(
    W=W,
    lam=lam,
    beta=beta,
    sigma=sigma,
    censoring=0.0,
    rng=rng,
)
sem_tobit = SEMTobit(
    y=sem_data["y"],
    X=sem_data["X"],
    W=sem_data["W_graph"],
    censoring=0.0,
)
sem_tobit.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)

(
    sar_tobit.summary(var_names=["rho", "beta", "sigma"]),
    sem_tobit.summary(var_names=["lam", "beta", "sigma"]),
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [rho, beta, sigma, y_cens_gap]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 6 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [lam, beta, sigma, y_cens_gap]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 6 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
(        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  \
 rho    0.465  0.082   0.317    0.614      0.004    0.003     341.0     436.0   
 x0     0.750  0.222   0.348    1.156      0.012    0.008     353.0     313.0   
 x1     1.641  0.173   1.343    1.992      0.010    0.007     309.0     303.0   
 sigma  0.821  0.143   0.567    1.055      0.008    0.008     403.0     466.0   
 
        r_hat  
 rho     1.01  
 x0      1.01  
 x1      1.00  
 sigma   1.00  ,
         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  \
 lam    0.439  0.144   0.170    0.702      0.007    0.006     462.0     422.0   
 x0     1.443  0.205   1.048    1.812      0.010    0.009     409.0     366.0   
 x1     1.212  0.091   1.055    1.411      0.005    0.004     375.0     337.0   
 sigma  0.552  0.087   0.419    0.733      0.004    0.003     454.0     442.0   
 
        r_hat  
 lam     1.01  
 x0      1.01  
 x1      1.01  
 sigma   1.00  )