Nonlinear Spatial Models

This guide demonstrates three nonlinear spatial models in bayespecon on synthetic data.

Spatial Probit

For binary outcomes \(y_i \in \{0,1\}\), latent utilities \(y^*\) follow a spatial lag:

\[y^* = \rho W y^* + X\beta + a + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)\]

where \(a\) are region-level random effects and \(y_i = \mathbf{1}[y^*_i > 0]\).

SAR Tobit

For censored outcomes, the spatial autoregressive Tobit specifies:

\[y^* = \rho W y^* + X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)\]
\[y_i = \max(c,\, y^*_i)\]

where \(c\) is the censoring threshold (default 0).

SEM Tobit

The spatial error Tobit places spatial dependence in the disturbance:

\[y^* = X\beta + u, \quad u = \lambda W u + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I)\]
\[y_i = \max(c,\, y^*_i)\]
import numpy as np
from libpysal.graph import Graph

from bayespecon import SARTobit, SEMTobit, SpatialProbit, dgp

rng = np.random.default_rng(1234)
# SpatialProbit
m, n_per_region = 8, 20

# Create line-graph weights: each node connected to neighbors
focal, neighbor = [], []
for i in range(m):
    if i > 0:
        focal.append(i)
        neighbor.append(i - 1)
    if i < m - 1:
        focal.append(i)
        neighbor.append(i + 1)
Wm = Graph.from_arrays(
    np.array(focal), np.array(neighbor), weight=np.ones(len(focal))
).transform("r")

rho, beta, sigma_a = 0.35, np.array([0.3, 1.0]), 0.8

sp_data = dgp.simulate_spatial_probit(
    W=Wm,
    rho=rho,
    beta=beta,
    sigma_a=sigma_a,
    n_per_region=n_per_region,
    rng=rng,
)

sp = SpatialProbit(
    y=sp_data["y"],
    X=sp_data["X"],
    W=sp_data["W_graph"],
    region_ids=sp_data["region_ids"],
)
sp.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)
sp.summary(var_names=["rho", "beta", "sigma_a"])
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [rho, beta, sigma_a, a_raw]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 7 seconds.
There were 2 divergences after tuning. Increase `target_accept` or reparameterize.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho 0.165 0.337 -0.481 0.745 0.023 0.013 197.0 270.0 1.00
x0 0.436 1.043 -1.612 2.404 0.094 0.095 137.0 185.0 1.01
x1 1.024 0.200 0.660 1.391 0.010 0.010 380.0 325.0 1.00
sigma_a 1.841 0.589 0.857 2.934 0.043 0.039 202.0 199.0 1.00
# SARTobit and SEMTobit
n = 30

# Create line-graph weights: each node connected to neighbors
focal, neighbor = [], []
for i in range(n):
    if i > 0:
        focal.append(i)
        neighbor.append(i - 1)
    if i < n - 1:
        focal.append(i)
        neighbor.append(i + 1)
W = Graph.from_arrays(
    np.array(focal), np.array(neighbor), weight=np.ones(len(focal))
).transform("r")

beta = np.array([1.0, 1.5])
sigma, rho, lam = 0.8, 0.4, 0.35

sar_data = dgp.simulate_sar_tobit(
    W=W,
    rho=rho,
    beta=beta,
    sigma=sigma,
    censoring=0.0,
    rng=rng,
)
sar_tobit = SARTobit(
    y=sar_data["y"],
    X=sar_data["X"],
    W=sar_data["W_graph"],
    censoring=0.0,
)
sar_tobit.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)

sem_data = dgp.simulate_sem_tobit(
    W=W,
    lam=lam,
    beta=beta,
    sigma=sigma,
    censoring=0.0,
    rng=rng,
)
sem_tobit = SEMTobit(
    y=sem_data["y"],
    X=sem_data["X"],
    W=sem_data["W_graph"],
    censoring=0.0,
)
sem_tobit.fit(draws=300, tune=300, chains=2, random_seed=42, progressbar=False)

(
    sar_tobit.summary(var_names=["rho", "beta", "sigma"]),
    sem_tobit.summary(var_names=["lam", "beta", "sigma"]),
)
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [rho, beta, sigma, y_cens_gap]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 7 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [lam, beta, sigma, y_cens_gap]
Sampling 2 chains for 300 tune and 300 draw iterations (600 + 600 draws total) took 7 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
(        mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  \
 rho    0.456  0.081   0.316    0.623      0.005    0.004     296.0     325.0   
 x0     0.785  0.223   0.422    1.228      0.013    0.009     290.0     346.0   
 x1     1.643  0.172   1.295    1.963      0.010    0.009     317.0     322.0   
 sigma  0.812  0.144   0.597    1.101      0.007    0.007     459.0     357.0   
 
        r_hat  
 rho      1.0  
 x0       1.0  
 x1       1.0  
 sigma    1.0  ,
         mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  \
 lam    0.450  0.155   0.126    0.695      0.009    0.007     277.0     324.0   
 x0     1.450  0.215   1.022    1.853      0.012    0.013     350.0     259.0   
 x1     1.215  0.095   1.043    1.413      0.005    0.005     435.0     343.0   
 sigma  0.543  0.082   0.393    0.699      0.005    0.003     345.0     459.0   
 
        r_hat  
 lam     1.01  
 x0      1.01  
 x1      1.01  
 sigma   1.01  )