bayespecon.SARNegBinLatent

class bayespecon.SARNegBinLatent(*args, **kwargs)

Bayesian structural-form SAR-NB with Pólya–Gamma Gibbs sampler.

Parameters:

formula: Same interface as SARNegativeBinomial.
data: Same interface as SARNegativeBinomial.
y: Same interface as SARNegativeBinomial.
X: Same interface as SARNegativeBinomial.
W: Same interface as SARNegativeBinomial.
priors: Same interface as SARNegativeBinomial.
logdet_method: Same interface as SARNegativeBinomial.
robust : bool, default False: Not supported. Raises NotImplementedError if True.

Notes

The structural form parameterises the latent log-mean as eta = rho * W @ eta + X @ beta + nu with nu ~ N(0, sigma^2 I), and augments the NB likelihood with Pólya–Gamma auxiliary variables to obtain fully conjugate Gibbs updates for eta, beta, and sigma^2.
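The structural equation above can be solved for the latent field as eta = (I - rho W)^{-1} (X beta + nu). The following is a minimal simulation sketch of that generative model, not the library's code: the ring-shaped W, the parameter values, and the gamma-Poisson parameterisation of the NB draw (mean exp(eta), variance mu + mu^2 / alpha) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma, alpha = 6, 0.4, 0.3, 2.0   # illustrative values only
beta = np.array([0.5, -0.2])

# Hypothetical row-standardised weights: each unit has two ring neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
nu = rng.normal(scale=sigma, size=n)      # nu ~ N(0, sigma^2 I)

# Solve (I - rho W) eta = X beta + nu for the latent log-mean.
eta = np.linalg.solve(np.eye(n) - rho * W, X @ beta + nu)

# Assumed NB parameterisation: gamma-Poisson mixture with mean exp(eta).
mu = np.exp(eta)
lam = rng.gamma(shape=alpha, scale=mu / alpha)
y = rng.poisson(lam)
```

The solve step makes the spatial feedback explicit: a shock to nu at one location propagates to its neighbours through (I - rho W)^{-1}.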

The sampler bypasses PyMC’s NUTS entirely. It produces an arviz.InferenceData object compatible with all downstream diagnostics (spatial_diagnostics(), spatial_effects(), summary()).

The fit() method does not accept the nuts_sampler or target_accept keyword arguments. Both are NUTS-specific, and passing either one raises TypeError.
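A guard of the kind described here could look like the sketch below. The helper names `reject_nuts_kwargs` and `caught_type_error` are hypothetical and are not part of bayespecon; only the two rejected keyword names come from the text above.

```python
NUTS_ONLY_KWARGS = ("nuts_sampler", "target_accept")


def reject_nuts_kwargs(kwargs):
    """Raise TypeError if any NUTS-specific keyword is present (hypothetical helper)."""
    offending = sorted(set(kwargs) & set(NUTS_ONLY_KWARGS))
    if offending:
        raise TypeError(
            "NUTS-specific keyword(s) not supported by the Gibbs sampler: "
            + ", ".join(offending)
        )


def caught_type_error(**kwargs):
    """Return True if the guard rejects the given keywords."""
    try:
        reject_nuts_kwargs(kwargs)
    except TypeError:
        return True
    return False
```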

Mixing for α (the NB dispersion) can be slower than for ρ or β. Monitor the ESS of α specifically and run longer chains if needed.
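To illustrate why a slowly mixing parameter needs longer runs, here is a crude single-chain ESS estimate that sums positive autocorrelations until the first non-positive lag. It is a simplified stand-in, not the estimator ArviZ uses (for real diagnostics, `az.ess` is the appropriate tool), and the two synthetic chains are assumptions chosen to mimic fast and slow mixing.

```python
import numpy as np


def crude_ess(x):
    """Rough ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = xc @ xc / n
    if var == 0.0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        rho = (xc[:-lag] @ xc[lag:]) / (n * var)
        if rho <= 0.0:          # truncate at the first non-positive lag
            break
        tau += 2.0 * rho
    return n / tau


rng = np.random.default_rng(0)
fast = rng.normal(size=2000)    # stands in for a well-mixing parameter
slow = np.empty(2000)           # AR(1) chain stands in for a sticky parameter
slow[0] = 0.0
for t in range(1, 2000):
    slow[t] = 0.95 * slow[t - 1] + rng.normal()

ess_fast, ess_slow = crude_ess(fast), crude_ess(slow)
```

With the same number of draws, the autocorrelated chain carries far fewer effective samples, which is the situation the note above warns about for α.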

__init__(*args, **kwargs)

Methods

`__init__`(*args, **kwargs)
`fit`([draws, tune, chains, random_seed, ...])	Sample posterior via Pólya–Gamma block Gibbs.
`fitted_values`()	Return fitted values at posterior mean parameters.
`residuals`()	Return residuals on the observed scale.
`spatial_diagnostics`()	Run Bayesian LM specification tests and return a summary table.
`spatial_diagnostics_decision`([alpha, format])	Return a model-selection decision from Bayesian LM test results.
`spatial_effects`([return_posterior_samples])	Compute Bayesian inference for direct, indirect, and total impacts.
`summary`([var_names])	Return posterior summary table.

Attributes

`inference_data`	Return the ArviZ InferenceData from the most recent fit.
`pymc_model`	Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, random_seed=None, thin=1, return_eta=False, n_jobs=-1, progressbar=True, gibbs_method='auto', pg_n_terms=10, n_probes=5, lanczos_deg=15, mh_proposal_sd=0.05, use_mala=True, **kwargs)

Sample posterior via Pólya–Gamma block Gibbs.

Parameters:

draws : int

Number of post-warmup draws per chain.

tune : int

Number of warmup (burn-in) draws per chain.

chains : int

Number of independent chains.

random_seed : int or None

Seed for reproducibility.

thin : int

Keep every thin-th draw. Default 1 (no thinning). Thinning is for memory management, not statistical efficiency.
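As a small illustration of what thinning does to storage, the slice below keeps every fourth element of a chain. Whether the sampler counts `draws` before or after thinning, and which offset it starts from, is not stated here, so the slicing convention is an assumption.

```python
draws, thin = 2000, 4

chain = list(range(draws))   # stand-in for one chain of sampled values
kept = chain[::thin]         # assumed convention: keep indices 0, thin, 2*thin, ...

stored_fraction = len(kept) / len(chain)
```

The stored chain shrinks by a factor of `thin`, while the discarded draws still cost compute, which is why thinning helps memory rather than statistical efficiency.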

return_eta : bool

If True, store the full latent field η in the posterior. Default False — η is n × draws × chains, which can be large. A scalar summary ||η||² is always stored.
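A back-of-the-envelope estimate shows how quickly the stored latent field grows. The helper below is hypothetical and assumes float64 storage and that thinning divides the number of stored draws.

```python
def eta_storage_bytes(n, draws, chains, thin=1, itemsize=8):
    """Approximate bytes needed to store eta with shape (chains, draws // thin, n)."""
    kept = draws // thin
    return n * kept * chains * itemsize


# n = 10 000 locations with the default draws=2000 and chains=4:
full = eta_storage_bytes(10_000, 2000, 4)             # 640 000 000 bytes
thinned = eta_storage_bytes(10_000, 2000, 4, thin=4)  # 160 000 000 bytes
```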

n_jobs : int

Number of chains to run in parallel. -1 uses all available CPUs.

progressbar : bool

Show per-chain progress bars.

gibbs_method : str, default "auto"

Which Gibbs sampler path to use:

"auto": select based on JAX availability and CHOLMOD. When CHOLMOD is available, uses "factorize" (fastest on CPU for sparse W). When CHOLMOD is unavailable but JAX is installed and n ≤ 10 000, uses "jax_dense". Otherwise falls back to SPLU factorisation.
"factorize": force factorisation-based path (CHOLMOD if available, else scipy.sparse.linalg.splu). Exact but O(nnz^{1.5}) for the factorisation step.
"jax_dense": force JAX-accelerated path (dense matvec + vmap over Lanczos probes and Chebyshev draws). Requires JAX with float64 enabled. Viable for n ≤ ~10 000 on machines with ≥ 32 GB RAM (the dense matrices need ~800 MB at n = 10 000).
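The "auto" rules listed above can be written as a small dispatch function. This is a sketch of the documented logic only: the function names and the boolean availability flags are hypothetical, and the real implementation may check availability differently.

```python
def resolve_gibbs_method(method, has_cholmod, has_jax, n):
    """Map gibbs_method to a concrete sampler path following the documented rules."""
    if method != "auto":
        return method                      # explicit choice is honoured as given
    if has_cholmod:
        return "factorize"                 # CHOLMOD available: factorisation path
    if has_jax and n <= 10_000:
        return "jax_dense"                 # no CHOLMOD, JAX present, small enough n
    return "factorize"                     # fallback: factorisation via SPLU


def dense_matrix_bytes(n, itemsize=8):
    """Bytes for one float64 n x n dense matrix."""
    return n * n * itemsize
```

At n = 10 000 a single float64 dense matrix takes 800 000 000 bytes, which matches the ~800 MB figure quoted for the "jax_dense" path.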

pg_n_terms : int, default 10

Number of sum-of-exponentials terms for the JAX Pólya–Gamma sampler. Higher values reduce bias at the cost of more compute. Only used when gibbs_method="jax_dense".
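The trade-off between the number of terms and bias can be seen in the standard truncated-series representation of a Pólya–Gamma variable, PG(b, c) = (1 / (2π²)) Σ_k g_k / ((k − 1/2)² + c² / (4π²)) with g_k ~ Gamma(b, 1). Whether the JAX sampler uses exactly this series is an assumption; the sketch is in NumPy and only illustrates how truncation biases the mean downward.

```python
import numpy as np


def _denominators(c, n_terms):
    k = np.arange(1, n_terms + 1)
    return (k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2)


def pg_truncated(b, c, n_terms, size, rng):
    """Draw approximate PG(b, c) samples from the first n_terms of the series."""
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / _denominators(c, n_terms)).sum(axis=1) / (2 * np.pi ** 2)


def pg_truncated_mean(b, c, n_terms):
    """Exact mean of the truncated series (E[g_k] = b)."""
    return b * (1.0 / _denominators(c, n_terms)).sum() / (2 * np.pi ** 2)


def pg_exact_mean(b, c):
    """Mean of the untruncated PG(b, c) distribution."""
    return b / (2 * c) * np.tanh(c / 2)


rng = np.random.default_rng(0)
samples = pg_truncated(1.0, 1.0, n_terms=200, size=5000, rng=rng)
bias_10 = pg_exact_mean(1.0, 1.0) - pg_truncated_mean(1.0, 1.0, 10)
bias_200 = pg_exact_mean(1.0, 1.0) - pg_truncated_mean(1.0, 1.0, 200)
```

Every dropped term is non-negative, so the truncated mean always sits below the exact mean and the gap shrinks as terms are added.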

n_probes : int, default 5

Number of Lanczos probe vectors for stochastic log|P| estimation. Only used when gibbs_method="jax_dense".

lanczos_deg : int, default 15

Lanczos iteration depth for log|P| estimation. Only used when gibbs_method="jax_dense".
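The roles of the probe count and the Lanczos depth can be illustrated with a generic stochastic Lanczos quadrature estimate of a log-determinant. This NumPy sketch is not the library's JAX implementation; the matrix P is not defined on this page, so the example uses an arbitrary symmetric positive-definite test matrix, and the function names are hypothetical.

```python
import numpy as np


def lanczos_tridiag(A, v, deg):
    """Run up to deg Lanczos steps from v; return the tridiagonal matrix T."""
    Q = np.zeros((deg, v.size))
    alphas, betas = [], []
    q = v / np.linalg.norm(v)
    for j in range(deg):
        Q[j] = q
        w = A @ q
        alphas.append(q @ w)
        # Full reorthogonalisation against all previous Lanczos vectors.
        w = w - Q[: j + 1].T @ (Q[: j + 1] @ w)
        b = np.linalg.norm(w)
        if j == deg - 1 or b < 1e-12:
            break
        betas.append(b)
        q = w / b
    return np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)


def slq_logdet(A, n_probes, lanczos_deg, rng):
    """Estimate log det(A) for SPD A with Rademacher probes and Lanczos quadrature."""
    n = A.shape[0]
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        theta, U = np.linalg.eigh(lanczos_tridiag(A, z, lanczos_deg))
        # z' log(A) z is approximated by ||z||^2 * sum_k U[0, k]^2 * log(theta_k).
        total += n * (U[0] ** 2 @ np.log(theta))
    return total / n_probes


rng = np.random.default_rng(0)
n = 40
G = rng.normal(size=(n, n))
A = np.diag(np.linspace(1.0, 2.0, n)) + 0.005 * (G + G.T) / 2  # SPD test matrix
estimate = slq_logdet(A, n_probes=5, lanczos_deg=15, rng=rng)
exact = np.linalg.slogdet(A)[1]
```

More probes reduce the Monte Carlo variance of the trace estimate, while a deeper Lanczos recursion reduces the quadrature error within each probe.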

mh_proposal_sd : float, default 0.05

Standard deviation of the random-walk MH proposal for ρ. Only used when use_mala=False.

use_mala : bool, default True

If True, use MALA (gradient-guided proposals) for the ρ update. If False, use random-walk Metropolis–Hastings. Only used when gibbs_method="jax_dense".
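The difference between the two ρ updates can be sketched on a toy target. The Gaussian-shaped log-density, the support (-1, 1), and the step sizes below are assumptions for illustration; the actual conditional density of ρ in the sampler is not shown on this page.

```python
import numpy as np


def log_target(rho):
    """Toy log-density for rho, restricted to (-1, 1)."""
    if not -1.0 < rho < 1.0:
        return -np.inf
    return -0.5 * ((rho - 0.3) / 0.1) ** 2


def grad_log_target(rho):
    return -(rho - 0.3) / 0.1 ** 2


def rw_step(rho, sd, rng):
    """Random-walk Metropolis-Hastings: symmetric Gaussian proposal."""
    prop = rho + sd * rng.normal()
    log_accept = log_target(prop) - log_target(rho)
    return prop if np.log(rng.uniform()) < log_accept else rho


def mala_step(rho, eps, rng):
    """MALA: proposal drifts along the gradient; the asymmetry is corrected."""
    mean_fwd = rho + 0.5 * eps ** 2 * grad_log_target(rho)
    prop = mean_fwd + eps * rng.normal()
    if not -1.0 < prop < 1.0:
        return rho
    mean_bwd = prop + 0.5 * eps ** 2 * grad_log_target(prop)
    log_q_fwd = -0.5 * ((prop - mean_fwd) / eps) ** 2
    log_q_bwd = -0.5 * ((rho - mean_bwd) / eps) ** 2
    log_accept = log_target(prop) - log_target(rho) + log_q_bwd - log_q_fwd
    return prop if np.log(rng.uniform()) < log_accept else rho


def run_chain(step, n_steps, scale, seed):
    rng = np.random.default_rng(seed)
    rho, out = 0.0, np.empty(n_steps)
    for t in range(n_steps):
        rho = step(rho, scale, rng)
        out[t] = rho
    return out


rw_chain = run_chain(rw_step, 5000, 0.05, seed=1)
mala_chain = run_chain(mala_step, 5000, 0.05, seed=2)
```

Both chains target the same density. MALA uses gradient information to bias proposals toward high-density regions, at the cost of one gradient evaluation per proposal.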

Returns:

InferenceData with posterior, log_likelihood, and observed_data groups.

Return type:

az.InferenceData

Raises:

TypeError – If NUTS-specific kwargs (nuts_sampler, target_accept) are passed.

fitted_values()

Return fitted values at posterior mean parameters.

Returns: Posterior-mean fitted values.
Return type: np.ndarray

property inference_data : arviz.data.inference_data.InferenceData | None

Return the ArviZ InferenceData from the most recent fit.

Returns: The inference data object, or None if the model has not been fit yet.
Return type: arviz.InferenceData or None

property pymc_model : pymc.model.core.Model | None

Return the PyMC model object built for the most recent fit.

For Gibbs-fitted models the PyMC model is not constructed during sampling; it is built lazily on first access so that downstream consumers (e.g. bridge sampling for marginal likelihoods) can evaluate logp and the prior under the same model definition used by the NUTS path.

Returns: The model object used by fit(), or None if the instance has not been fit yet.
Return type: pymc.Model or None
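The lazy-construction behaviour described for pymc_model follows a common cached-property pattern. The class below is a hypothetical stand-in, not bayespecon code: it returns None before fitting and calls an expensive builder only once, on first access after a fit.

```python
class LazyModelHolder:
    """Hypothetical holder that builds its model object on first access."""

    def __init__(self, builder):
        self._builder = builder   # callable that constructs the model
        self._fitted = False
        self._model = None

    def mark_fitted(self):
        self._fitted = True

    @property
    def model(self):
        if not self._fitted:
            return None           # nothing to expose before fit()
        if self._model is None:
            self._model = self._builder()   # built lazily, then cached
        return self._model


calls = []
holder = LazyModelHolder(lambda: calls.append("built") or {"name": "model"})
before_fit = holder.model
holder.mark_fitted()
first, second = holder.model, holder.model
```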

residuals()

Return residuals on the observed scale.

Returns: Residual vector y - fitted_values.
Return type: np.ndarray

spatial_diagnostics()

Run Bayesian LM specification tests and return a summary table.

Looks up the diagnostic suite registered for this model class and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.

Requires the model to have been fit (.fit() called) and a spatial weights matrix W to have been supplied at construction time.

Returns:

DataFrame indexed by test name with columns:

Column	Description
statistic	Posterior mean of the LM statistic
median	Posterior median of the LM statistic
df	Degrees of freedom for the \(\chi^2\) reference
p_value	Bayesian p-value: `1 - chi2.cdf(mean, df)`
ci_lower	Lower bound of 95% credible interval (2.5%)
ci_upper	Upper bound of 95% credible interval (97.5%)

The DataFrame has attrs["model_type"] (class name) and attrs["n_draws"] (total posterior draws) metadata.
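The columns described above can be reproduced from a vector of posterior draws of an LM statistic. The helper below is a hypothetical sketch that follows the column definitions in the table; it uses `chi2.sf`, which equals `1 - chi2.cdf`, and is not the library's internal code.

```python
import numpy as np
from scipy.stats import chi2


def summarise_lm(draws, df):
    """Summarise posterior draws of one LM statistic into the documented columns."""
    draws = np.asarray(draws, dtype=float)
    mean = float(draws.mean())
    return {
        "statistic": mean,
        "median": float(np.median(draws)),
        "df": df,
        "p_value": float(chi2.sf(mean, df)),          # 1 - chi2.cdf(mean, df)
        "ci_lower": float(np.quantile(draws, 0.025)),
        "ci_upper": float(np.quantile(draws, 0.975)),
    }


# Degenerate draws pinned at the posterior means shown in the example table.
lm_lag = summarise_lm(np.full(100, 3.21), df=1)
lm_sdm_joint = summarise_lm(np.full(100, 7.89), df=4)
```

Note that the p-value is evaluated at the posterior mean of the statistic, so it is a single number per test rather than a distribution over draws.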

Return type:

pandas.DataFrame

Raises:

RuntimeError – If the model has not been fit yet.
ValueError – If no spatial weights matrix W was supplied.

See also

spatial_diagnostics_decision: Model-selection decision based on the test results.
spatial_effects: Posterior inference for direct/indirect/total impacts.

Examples

>>> ols = OLS(formula="price ~ income + crime", data=df, W=w)
>>> ols.fit()
>>> ols.spatial_diagnostics()
                 statistic  median  df  p_value  ci_lower  ci_upper
LM-Lag                3.21    2.98   1    0.073      0.12      8.54
LM-Error              5.67    5.34   1    0.017      0.34     12.10
LM-SDM-Joint          7.89    7.12   4    0.096      1.23     18.32
LM-SLX-Error-Joint    6.45    5.98   4    0.168      0.89     15.67

spatial_diagnostics_decision(alpha=0.05, format='graphviz')

Return a model-selection decision from Bayesian LM test results.

Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical stge_kb procedure in Anselin et al. [1996]). The decision logic depends on the current model type and the pattern of significant tests:

From OLS (6-test decision tree):

If only LM-Lag is significant → SAR.
If only LM-Error is significant → SEM.
If both are significant → use the Anselin–Florax / Koley–Bera robust pair: Robust-LM-Lag → SAR, Robust-LM-Error → SEM, both → SARAR. If neither robust test is significant, fall back to the lower raw p-value.
If neither naive test is significant → OLS.
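The OLS branch above can be written as a plain function over a mapping from test name to p-value. This is a sketch of the listed rules, not the library's implementation: the function name is hypothetical, and treating "significant" as p < alpha (strict) is an assumption.

```python
def decide_from_ols(p_values, alpha=0.05):
    """Apply the OLS decision rules to a dict of test name -> p-value."""
    lag = p_values["LM-Lag"] < alpha
    err = p_values["LM-Error"] < alpha

    if lag and not err:
        return "SAR"
    if err and not lag:
        return "SEM"
    if not lag and not err:
        return "OLS"

    # Both naive tests are significant: consult the robust pair.
    r_lag = p_values["Robust-LM-Lag"] < alpha
    r_err = p_values["Robust-LM-Error"] < alpha
    if r_lag and r_err:
        return "SARAR"
    if r_lag:
        return "SAR"
    if r_err:
        return "SEM"
    # Neither robust test is significant: fall back to the lower raw p-value.
    return "SAR" if p_values["LM-Lag"] <= p_values["LM-Error"] else "SEM"
```

For instance, p-values of 0.073 for LM-Lag and 0.017 for LM-Error at alpha = 0.05 make only LM-Error significant, so the function returns "SEM".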

From SAR (3-test decision tree):

LM-Error significant → SARAR; LM-WX significant → SDM; Robust-LM-WX significant → SDM.

From SEM (2-test decision tree):

LM-Lag significant → SARAR; LM-WX significant → SDEM.

From SLX (4-test decision tree):

Robust-LM-Lag-SDM significant → SDM; Robust-LM-Error-SDEM significant → SDEM; both → MANSAR; neither → SLX.

From SDM: LM-Error-SDM significant → MANSAR; else SDM.

From SDEM: LM-Lag-SDEM significant → MANSAR; else SDEM.
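The SLX, SDM, and SDEM branches are fully specified above and translate directly into code. As before, the function names are hypothetical and "significant" is assumed to mean p < alpha. The SAR and SEM branches are omitted because the text does not state which test takes precedence when several are significant.

```python
def decide_from_slx(p_values, alpha=0.05):
    """SLX branch: two robust tests select SDM, SDEM, MANSAR, or SLX."""
    lag = p_values["Robust-LM-Lag-SDM"] < alpha
    err = p_values["Robust-LM-Error-SDEM"] < alpha
    if lag and err:
        return "MANSAR"
    if lag:
        return "SDM"
    if err:
        return "SDEM"
    return "SLX"


def decide_from_sdm(p_values, alpha=0.05):
    """SDM branch: a significant LM-Error-SDM test points to MANSAR."""
    return "MANSAR" if p_values["LM-Error-SDM"] < alpha else "SDM"


def decide_from_sdem(p_values, alpha=0.05):
    """SDEM branch: a significant LM-Lag-SDEM test points to MANSAR."""
    return "MANSAR" if p_values["LM-Lag-SDEM"] < alpha else "SDEM"
```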

Parameters:

alpha : float, default 0.05: Significance level for the Bayesian p-values.
format : {"graphviz", "ascii", "model"}, default "graphviz": Output format. "model" returns the recommended-model name string. "ascii" returns an indented box-drawing rendering of the full decision tree with the chosen path highlighted. "graphviz" returns a graphviz.Digraph object that renders inline in Jupyter; if the optional graphviz package is not installed a UserWarning is issued and the ASCII rendering is returned instead.

Returns:

Recommended model name when format="model", an ASCII tree string when format="ascii", or a graphviz.Digraph when format="graphviz" (falling back to the ASCII string if the graphviz package is not installed).

Return type:

str or graphviz.Digraph
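The three output formats and the documented fallback can be sketched with an optional import. The function `render_decision` and its simple indented text layout are hypothetical; only the format names, the UserWarning, and the ASCII fallback come from the description above.

```python
import warnings


def render_decision(path, fmt="graphviz"):
    """Render a decision path (list of node names ending in the chosen model)."""
    if fmt not in ("graphviz", "ascii", "model"):
        raise ValueError(f"unknown format: {fmt!r}")
    if fmt == "model":
        return path[-1]

    # Simplified text rendering: one node per line, indented by depth.
    ascii_tree = "\n".join("  " * depth + node for depth, node in enumerate(path))
    if fmt == "ascii":
        return ascii_tree

    try:
        import graphviz  # optional dependency
    except ImportError:
        warnings.warn(
            "graphviz is not installed; returning the ASCII rendering",
            UserWarning,
        )
        return ascii_tree

    dot = graphviz.Digraph()
    for parent, child in zip(path, path[1:]):
        dot.edge(parent, child)
    return dot


example_path = ["OLS", "LM-Error significant", "SEM"]
model_name = render_decision(example_path, "model")
ascii_out = render_decision(example_path, "ascii")
```

Because the return type depends on both the format argument and the installed packages, callers that need a string should request "ascii" or "model" explicitly.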