bayespecon.models.flow.SARNegBinFlowLatent

class bayespecon.models.flow.SARNegBinFlowLatent(y, G, X, **kwargs)

Bayesian structural-form SAR-NB flow model with Pólya–Gamma Gibbs sampler.

The structural form parameterises the latent log-mean as

\[\eta = \rho_d W_d \eta + \rho_o W_o \eta + \rho_w W_w \eta + X\beta + \nu, \quad \nu \sim N(0, \sigma^2 I_N)\]

where \(N = n^2\) and \(W_d = I_n \otimes W\), \(W_o = W \otimes I_n\), \(W_w = W \otimes W\).
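The Kronecker definitions above can be sketched with SciPy sparse matrices. This is a minimal illustration, not the library's internal code: the helper name `flow_weight_matrices` and the 3-region toy `W` are assumptions made for the example.

```python
import numpy as np
import scipy.sparse as sp


def flow_weight_matrices(W):
    """Build the three N x N flow weight matrices (N = n**2) from an n x n W.

    Hypothetical helper for illustration; it mirrors the definitions
    W_d = I_n (x) W, W_o = W (x) I_n, W_w = W (x) W.
    """
    W = sp.csr_matrix(W)
    n = W.shape[0]
    I_n = sp.identity(n, format="csr")
    W_d = sp.kron(I_n, W, format="csr")
    W_o = sp.kron(W, I_n, format="csr")
    W_w = sp.kron(W, W, format="csr")
    return W_d, W_o, W_w


# Toy row-standardised weights for n = 3 regions on a line (made-up data).
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
W_d, W_o, W_w = flow_weight_matrices(W)

# Each matrix is N x N with N = n**2 and stays row-standardised.
row_sums = np.asarray(W_w.sum(axis=1)).ravel()
```

Because \(W_o W_d = (W \otimes I_n)(I_n \otimes W) = W \otimes W = W_w\), the product filter \((I_N - \rho_d W_d)(I_N - \rho_o W_o)\) expands to \(I_N - \rho_d W_d - \rho_o W_o + \rho_d \rho_o W_w\), which is where the separability constraint \(\rho_w = -\rho_d \rho_o\) comes from.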

The three free \(\rho\) parameters are each updated by a collapsed 1-D slice sampler (one \(\rho\) at a time, with the other two held fixed). The \(\eta\) draw uses the general sparse \(N \times N\) precision matrix with a Chebyshev polynomial approximation.
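For readers unfamiliar with slice sampling, the following sketch shows one generic 1-D slice update on a bounded interval using the shrinkage procedure. It is an illustration under assumptions, not the library's sampler: the function name `slice_sample_bounded` is hypothetical, and the Gaussian-shaped `logp` merely stands in for the collapsed conditional log-density of a single \(\rho\).

```python
import numpy as np


def slice_sample_bounded(logp, x0, lower, upper, rng):
    """One slice-sampling update for a scalar restricted to [lower, upper]."""
    # Draw the auxiliary slice height on the log scale.
    log_height = logp(x0) - rng.exponential(1.0)
    lo, hi = lower, upper
    while True:
        # Propose uniformly inside the current bracket.
        x1 = rng.uniform(lo, hi)
        if logp(x1) >= log_height:
            return x1
        # Rejected: shrink the bracket toward the current point x0.
        if x1 < x0:
            lo = x1
        else:
            hi = x1


rng = np.random.default_rng(0)


def logp(r):
    # Toy stand-in for a conditional log-density of rho (an assumption).
    return -0.5 * ((r - 0.3) / 0.1) ** 2


rho = 0.0
draws = np.empty(2000)
for i in range(draws.size):
    rho = slice_sample_bounded(logp, rho, -0.999, 0.999, rng)
    draws[i] = rho
```

Using the full interval between `rho_lower` and `rho_upper` as the initial bracket avoids a stepping-out phase, which is a natural choice when the support is bounded.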

Use this model when:

- The separability constraint \(\rho_w = -\rho_d \rho_o\) is too restrictive for the data.
- You need to test whether \(\rho_w\) is significantly different from \(-\rho_d \rho_o\).

Use SARNegBinFlowSeparableLatent when:

- The separability constraint is plausible (most flow applications).
- You want the faster \(O(n^3)\) Kronecker-structured sampler.
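One informal way to examine the separability constraint after fitting is to look at the posterior of the gap \(\delta = \rho_w + \rho_d \rho_o\), which is zero under separability. The sketch below uses synthetic draws as stand-ins; the helper name `separability_gap` is hypothetical, and the names under which the three \(\rho\) parameters are stored in the InferenceData are not documented here.

```python
import numpy as np


def separability_gap(rho_d, rho_o, rho_w, ci=0.95):
    """Posterior mean and equal-tailed interval of delta = rho_w + rho_d * rho_o."""
    delta = np.asarray(rho_w) + np.asarray(rho_d) * np.asarray(rho_o)
    tail = (1.0 - ci) / 2.0
    lo, hi = np.quantile(delta, [tail, 1.0 - tail])
    return delta.mean(), lo, hi


rng = np.random.default_rng(1)
# Synthetic stand-ins for flattened posterior draws (not real model output).
rho_d = rng.normal(0.4, 0.02, size=4000)
rho_o = rng.normal(0.3, 0.02, size=4000)
rho_w = rng.normal(-0.12, 0.02, size=4000)  # centred on -0.4 * 0.3

mean_gap, lo, hi = separability_gap(rho_d, rho_o, rho_w)
# An interval that comfortably contains 0 gives no evidence against separability.
contains_zero = lo < 0.0 < hi
```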

Parameters:

y : array-like of int, shape (n, n) or (N,)

Observed non-negative integer flow counts.

G : libpysal.graph.Graph

Row-standardised spatial graph on n units.

X : np.ndarray or pandas.DataFrame, shape (N, p)

Full origin-destination design matrix with \(N = n^2\) rows.

col_names : list of str, optional

Column labels for X.

k : int, optional

Number of regional attribute columns.

logdet_method : str, default "traces"

Log-determinant method for the \(N \times N\) flow log-determinant. Only "traces" is supported because the 3-\(\rho\) logdet \(\log|I_N - \rho_d W_d - \rho_o W_o - \rho_w W_w|\) cannot be decomposed into \(n \times n\) eigenvalues.
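The idea behind a trace-based log-determinant can be illustrated with the power-series identity \(\log|I_N - A| = -\sum_{k \ge 1} \operatorname{tr}(A^k)/k\), valid when the spectral radius of \(A\) is below 1. The sketch below checks a truncated series against an exact dense computation on a toy problem. It is only an illustration: the function name is hypothetical, and how the library's "traces" method computes or caches the traces is not documented here.

```python
import numpy as np


def logdet_trace_series(A, order=50):
    """Approximate log|I - A| by the truncated series -sum_k tr(A^k) / k."""
    total = 0.0
    A_power = np.eye(A.shape[0])
    for k in range(1, order + 1):
        A_power = A_power @ A          # A^k, dense for illustration only
        total -= np.trace(A_power) / k
    return total


# Toy row-standardised W for n = 3 regions (made-up data).
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
I_n = np.eye(3)
W_d, W_o, W_w = np.kron(I_n, W), np.kron(W, I_n), np.kron(W, W)

# Non-separable example values: rho_w differs from -rho_d * rho_o.
rho_d, rho_o, rho_w = 0.3, 0.2, 0.1
A = rho_d * W_d + rho_o * W_o + rho_w * W_w

approx = logdet_trace_series(A)
sign, exact = np.linalg.slogdet(np.eye(9) - A)
```

Dense matrix powers cost far too much for realistic \(N = n^2\); the point of the series is that only traces are needed, and those can be obtained without forming \(A^k\) explicitly.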

priors : dict, optional

Override default priors. Supported keys:

beta_mu : float, default 0.0 — Normal prior mean for beta.
beta_sigma : float, default 1e6 — Normal prior std for beta.
sigma_sigma : float, default 10.0 — HalfNormal prior std for sigma.
alpha_sigma : float, default 10.0 — HalfNormal prior std for alpha.
rho_lower : float, default -0.999 — Lower bound for each \(\rho\).
rho_upper : float, default 0.999 — Upper bound for each \(\rho\).
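As a small illustration, a priors override can be written as a plain dict using the keys listed above. The `merge_priors` helper below is hypothetical and only demonstrates how overrides relate to the documented defaults; passing the dict as `priors=` follows the parameter list above, and the constructor call is left as a comment because it needs real data.

```python
# Documented prior keys and their defaults, copied from the list above.
DEFAULT_PRIORS = {
    "beta_mu": 0.0,
    "beta_sigma": 1e6,
    "sigma_sigma": 10.0,
    "alpha_sigma": 10.0,
    "rho_lower": -0.999,
    "rho_upper": 0.999,
}


def merge_priors(overrides=None):
    """Hypothetical helper: validate override keys and fill in defaults."""
    overrides = dict(overrides or {})
    unknown = set(overrides) - set(DEFAULT_PRIORS)
    if unknown:
        raise KeyError(f"Unsupported prior keys: {sorted(unknown)}")
    return {**DEFAULT_PRIORS, **overrides}


# Tighter coefficient prior and non-negative spatial dependence.
priors = merge_priors({"beta_sigma": 10.0, "rho_lower": 0.0})

# Assumed usage (requires y, G, X to be defined):
# model = SARNegBinFlowLatent(y, G, X, priors=priors)
```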

Notes

The sampler bypasses PyMC’s NUTS entirely. It produces an arviz.InferenceData object compatible with all downstream diagnostics (spatial_diagnostics(), spatial_effects(), summary()).

The fit() method does not accept nuts_sampler or target_accept kwargs — these are NUTS-specific and will raise TypeError if passed.

\(\alpha\) (NB dispersion) mixing can be slower than \(\rho\) or \(\beta\). Monitor ESS for \(\alpha\) specifically and use longer runs if needed.
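To make the ESS advice concrete, the sketch below computes a rough single-chain effective sample size from summed autocorrelations and compares an independent chain with a slowly mixing one. It is a simplified estimator written for illustration (the function name `ess_1d` is hypothetical); in practice ArviZ's `az.ess` applied to the returned InferenceData would be the natural tool, and the posterior variable name for the dispersion is not documented here.

```python
import numpy as np


def ess_1d(x):
    """Rough effective sample size of one chain via summed autocorrelations."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centred = x - x.mean()
    # Autocovariance via zero-padded FFT.
    f = np.fft.rfft(centred, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n] / n
    rho = acov / acov[0]
    # Sum autocorrelations until the first non-positive lag.
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0.0:
            break
        tau += 2.0 * rho[k]
    return n / tau


rng = np.random.default_rng(2)
n = 4000
iid_chain = rng.normal(size=n)            # well-mixing stand-in

slow_chain = np.empty(n)                  # AR(1) stand-in for a sticky alpha chain
slow_chain[0] = 0.0
noise = rng.normal(size=n)
for t in range(1, n):
    slow_chain[t] = 0.9 * slow_chain[t - 1] + noise[t]

ess_iid = ess_1d(iid_chain)
ess_slow = ess_1d(slow_chain)
```

A chain with lag-1 autocorrelation near 0.9 carries roughly one effective draw per twenty stored draws, which is the kind of gap that motivates a longer run.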

__init__(y, G, X, **kwargs)

Methods

`__init__`(y, G, X, **kwargs)
`fit`([draws, tune, chains, random_seed, ...])	Sample posterior via Pólya–Gamma block Gibbs.
`fit_approx`([draws, n, method, random_seed, ...])	Fit a variational approximation and return posterior draws.
`posterior_predictive`([n_draws, random_seed])	Draw posterior-predictive samples `y_rep`.
`spatial_diagnostics`()	Run Bayesian LM specification tests for flow models.
`spatial_diagnostics_decision`([alpha, format])	Return a model-selection decision from Bayesian LM test results.
`spatial_effects`([draws, ...])	Summarise posterior origin/destination/intra/network/total effects.
`summary`([var_names])	Return posterior summary table via ArviZ.

Attributes

`approximation`	Return the most recent PyMC variational approximation, if any.
`inference_data`	Return ArviZ InferenceData from the most recent fit, or None.
`pymc_model`	Return the PyMC model used for the most recent fit, or None.

property approximation

Return the most recent PyMC variational approximation, if any.

fit(draws=2000, tune=1000, chains=4, random_seed=None, thin=1, return_eta=False, n_jobs=-1, progressbar=True, chebyshev_degree=30, **kwargs)

Sample posterior via Pólya–Gamma block Gibbs.

Parameters:

draws : int

Number of post-warmup draws per chain.

tune : int

Number of warmup (burn-in) draws per chain.

chains : int

Number of independent chains.

random_seed : int or None

Seed for reproducibility.

thin : int

Keep every thin-th draw. Default 1.

return_eta : bool

If True, store the full latent field η. Default False.

n_jobs : int

Number of parallel chains. -1 = all CPUs.

progressbar : bool

Show per-chain progress bars.

chebyshev_degree : int, default 30

Chebyshev polynomial degree for η draw.

Return type:

arviz.InferenceData
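The effect of `chebyshev_degree` can be pictured with a scalar Chebyshev interpolation: higher degree gives a smaller approximation error on a fixed interval. The sketch below approximates the inverse square root, which is an assumption chosen because matrix functions of that kind arise when drawing Gaussians from a precision matrix; the function the library actually approximates, and its interval, are not documented here.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def cheb_max_error(func, degree, domain, n_grid=2001):
    """Max absolute error of a degree-`degree` Chebyshev interpolant on `domain`."""
    approx = Chebyshev.interpolate(func, degree, domain=domain)
    x = np.linspace(domain[0], domain[1], n_grid)
    return np.max(np.abs(approx(x) - func(x)))


def inv_sqrt(x):
    # Illustrative target function (an assumption, see the text above).
    return 1.0 / np.sqrt(x)


# Hypothetical spectral interval for the illustration.
domain = [0.2, 2.0]
errors = {deg: cheb_max_error(inv_sqrt, deg, domain) for deg in (5, 10, 30)}
for deg, err in errors.items():
    print(f"degree {deg:2d}: max abs error {err:.2e}")
```

The error shrinks geometrically with the degree, at a rate that worsens as the lower end of the interval approaches zero, so a degree that works for moderate spatial dependence may need raising when the \(\rho\) values sit near their bounds.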

fit_approx(draws=2000, n=10000, method='advi', random_seed=None, store_lambda=False, compute_log_likelihood=True, **fit_kwargs)

Fit a variational approximation and return posterior draws.

Parameters:

draws : int, default 2000

Number of samples to draw from the fitted approximation.

n : int, default 10000

Number of optimisation iterations for pm.fit.

method : {"advi", "fullrank_advi"}, default "advi"

Variational inference family to fit.

random_seed : int, optional

Seed for optimisation and posterior sampling.

store_lambda : bool, default False

If True, keep the high-dimensional fitted mean lambda in the posterior draws.

compute_log_likelihood : bool, default True

If True, compute pointwise log-likelihood after sampling and attach to the InferenceData (with Jacobian correction for SAR flow variants), enabling az.loo / az.waic.

**fit_kwargs

Additional keyword arguments forwarded to pm.fit.

property inference_data : arviz.data.inference_data.InferenceData | None

Return ArviZ InferenceData from the most recent fit, or None.

posterior_predictive(n_draws=None, random_seed=None)

Draw posterior-predictive samples y_rep.

For each (subsampled) posterior draw, simulates a new flow vector y_rep from the implied data-generating process by solving the sparse system A(rho) y_rep = X β + ε (Gaussian) or y_rep ~ Poisson(exp(A^{-1} X β)) (Poisson variants).

Parameters:

n_draws : int, optional

Number of posterior draws to use. Defaults to all available.

random_seed : int, optional

Seed for the noise/Poisson sampler.

Returns:

Array of shape (n_draws, N) with posterior-predictive flows.

Return type:

np.ndarray
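The sparse solve mentioned in the description can be sketched for the Gaussian case with a single parameter draw. Everything below is illustrative: `flow_filter` is a hypothetical helper, the data and parameter values are made up, and the extra count-sampling step for negative binomial models is not shown because it is not described here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def flow_filter(W, rho_d, rho_o, rho_w):
    """Return sparse A(rho) = I_N - rho_d W_d - rho_o W_o - rho_w W_w."""
    W = sp.csr_matrix(W)
    n = W.shape[0]
    I_n = sp.identity(n, format="csr")
    A = (sp.identity(n * n, format="csr")
         - rho_d * sp.kron(I_n, W, format="csr")
         - rho_o * sp.kron(W, I_n, format="csr")
         - rho_w * sp.kron(W, W, format="csr"))
    return A.tocsc()  # CSC is the format spsolve works with most efficiently


# Toy inputs (made-up data): n = 3 regions, N = 9 flows, p = 2 columns.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
rng = np.random.default_rng(3)
N = W.shape[0] ** 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, 0.5])
sigma = 0.1

# One simulated replicate for one (hypothetical) posterior draw.
A = flow_filter(W, rho_d=0.3, rho_o=0.2, rho_w=0.1)
rhs = X @ beta + sigma * rng.normal(size=N)
y_rep = spsolve(A, rhs)  # solves A y_rep = X beta + eps without forming A^{-1}
```

Repeating the last three lines over posterior draws and stacking the results would give an array of shape (n_draws, N), matching the documented return shape.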

property pymc_model : pymc.model.core.Model | None

Return the PyMC model used for the most recent fit, or None.

spatial_diagnostics()

Run Bayesian LM specification tests for flow models.

Looks up the diagnostic suite registered for this model class and returns a tidy DataFrame with one row per test. See bayespecon.models.base.SpatialModel.spatial_diagnostics() for the column schema.

Raises:

RuntimeError – If the model has not been fit yet.

spatial_diagnostics_decision(alpha=0.05, format='graphviz')

Return a model-selection decision from Bayesian LM test results.

Walks the flow decision tree using Bayesian p-values from spatial_diagnostics() and recommends either OLSFlow (no spatial dependence detected) or SARFlow (at least one direction is significant).

Parameters:

alpha : float, default 0.05

Significance level for the Bayesian p-values.

format : {"graphviz", "ascii", "model"}, default "graphviz"

Output format. "model" returns the recommended model name string. "ascii" returns an indented box-drawing tree. "graphviz" returns a graphviz.Digraph (with ASCII fallback if graphviz is not installed).

Return type:

str or graphviz.Digraph

spatial_effects(draws=None, return_posterior_samples=False, ci=0.95, mode='auto')

Summarise posterior origin/destination/intra/network/total effects.

Wraps _compute_spatial_effects_posterior() to produce a tidy DataFrame indexed by predictor with posterior means, credible-interval bounds, and Bayesian p-values for each effect type (origin, destination, intra, network, total). Following Thomas-Agnan & LeSage (2014, §83.5.2), when destination and origin design blocks differ the decomposition is reported separately for shocks applied to each side.

Parameters:

draws : int, optional

Maximum number of posterior draws to use. Defaults to all.

return_posterior_samples : bool, default False

If True, also return the underlying posterior-draw arrays.

ci : float, default 0.95

Credible-interval coverage.

mode : {"auto", "combined", "separate"}, default "auto"

Controls whether destination- and origin-side effects are summed or reported separately. "auto" collapses to combined when the destination and origin design blocks are identical (self._symmetric_xo_xd) and reports both sides otherwise. "combined" always sums; "separate" always reports both.

Returns:

Long-format summary indexed by (predictor, side, effect) where side is one of "combined", "dest", "orig".

Return type:

pandas.DataFrame, or (DataFrame, dict)

summary(var_names=None, **kwargs)

Return posterior summary table via ArviZ.

Parameters:

var_names : list, optional

Variable names to include. Defaults to all parameters.

**kwargs

Additional keyword arguments forwarded to az.summary.

Return type:

pandas.DataFrame