bayespecon.models.flow.SEMFlow
- class bayespecon.models.flow.SEMFlow(y, G, X, **kwargs)
Bayesian spatial-error flow model with three free spatial parameters.
\[y = X\beta + u, \qquad B u = \varepsilon, \qquad B = I_N - \lambda_d W_d - \lambda_o W_o - \lambda_w W_w, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_N)\]
where \(W_d = I_n \otimes W\), \(W_o = W \otimes I_n\), \(W_w = W \otimes W\). The Kronecker spatial structure is identical to that of SARFlow, but the spatial filter acts on the disturbance rather than on the dependent variable. Equivalently, the model implies a Gaussian likelihood with covariance \(\sigma^2 (B^\top B)^{-1}\).
The marginal mean is \(\mathbb{E}[y] = X\beta\), so there are no \(X\)-mediated spatial spillovers: the LeSage / Thomas-Agnan decomposition reduces to the closed-form expressions used by OLSFlow (direct effect equals \(\beta\), network effect equals zero). Use SARFlow if spillovers from observed covariates are of interest.
- Parameters:
- y : array-like, shape (n, n) or (N,)
Observed origin-destination flow matrix or its vec-form.
- G : libpysal.graph.Graph
Row-standardised spatial graph on n units.
- X : np.ndarray or pandas.DataFrame, shape (N, p)
Full origin-destination design matrix with \(N = n^2\) rows. DataFrame columns are preserved as feature names.
- col_names : list of str, optional
Column labels for X. Inferred from a DataFrame if omitted; otherwise defaults to ["x0", "x1", ...].
- k : int, optional
Number of regional attribute columns (destination/origin variable pairs). Inferred from dest_*/orig_* column names when the standard LeSage layout is used.
- logdet_method : str, default "traces"
Log-determinant method. Only "traces" is supported here.
- restrict_positive : bool, default True
If True, use pm.Dirichlet("lam_simplex", a=ones(4)) to enforce \(\lambda_d, \lambda_o, \lambda_w \geq 0\) and \(\lambda_d + \lambda_o + \lambda_w \leq 1\). If False, three independent pm.Uniform(lam_lower, lam_upper) priors are used with a differentiable quadratic-wall stability potential.
- miter : int, default 30
Trace polynomial order for the log-determinant.
- titer : int, default 800
Geometric tail cutoff for the log-determinant series.
- trace_riter : int, default 50
Number of Monte Carlo probes for trace estimation.
- trace_seed : int, optional
Random seed for trace estimation reproducibility.
- symmetric_xo_xd : bool, optional
If None (default), origin and destination design blocks are compared and symmetry is auto-detected.
- priors : dict, optional
Override default priors. Supported keys:
  - beta_mu: float, default 0.0. Normal prior mean for beta.
  - beta_sigma: float, default 1e6. Normal prior std for beta.
  - sigma_sigma: float, default 10.0. HalfNormal prior std for sigma.
  - lam_lower: float, default -1.0. Lower bound of the Uniform prior on each λ (only when restrict_positive=False).
  - lam_upper: float, default 1.0. Upper bound of the Uniform prior on each λ (only when restrict_positive=False).
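The model structure and the role of miter can be illustrated with a small NumPy sketch. It builds the three Kronecker weight matrices from a toy row-standardised W, forms the filter B, and compares a truncated trace series for \(\log|B|\) (using the identity \(\log|I - A| = -\sum_{k \ge 1} \operatorname{tr}(A^k)/k\), valid when the spectral radius of \(A\) is below one) against a dense determinant. This is not the library's implementation: the ring-shaped W, the λ values, and the helper logdet_series are illustrative assumptions, and exact traces are used here where the class documents Monte Carlo probes (trace_riter).

```python
import numpy as np

n = 4                      # number of regions (toy size)
N = n * n                  # number of origin-destination flows

# Row-standardised ring contiguity matrix with zero diagonal (illustrative).
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i - 1) % n] = 1.0
    adj[i, (i + 1) % n] = 1.0
W = adj / adj.sum(axis=1, keepdims=True)

# Kronecker weights as defined in the class description.
I_n = np.eye(n)
W_d = np.kron(I_n, W)      # destination dependence
W_o = np.kron(W, I_n)      # origin dependence
W_w = np.kron(W, W)        # origin-to-destination dependence

# Analogue of restrict_positive=True: the first three components of a
# 4-dimensional simplex draw are non-negative and sum to at most one.
rng = np.random.default_rng(0)
lam_draw = rng.dirichlet(np.ones(4))[:3]

# Fixed values (assumed) so the series comparison below is deterministic.
lam_d, lam_o, lam_w = 0.3, 0.2, 0.1
A = lam_d * W_d + lam_o * W_o + lam_w * W_w
B = np.eye(N) - A


def logdet_series(A, order=30):
    """Truncated series -sum_{k=1}^{order} tr(A^k) / k for log|I - A|."""
    total = 0.0
    A_k = np.eye(A.shape[0])
    for k in range(1, order + 1):
        A_k = A_k @ A
        total -= np.trace(A_k) / k
    return total


sign, exact = np.linalg.slogdet(B)
approx = logdet_series(A, order=30)
print(f"exact log|B| = {exact:.6f}, order-30 series = {approx:.6f}")
```

With row-standardised weights and non-negative λ summing to less than one, every row of A sums to less than one, so the series converges and B is invertible; the closer the λ sum is to one, the more terms are needed.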
Notes
Implementation: the PyMC model body uses precomputed spatial lags of both y and X (self._Wd, self._Wo, self._Ww applied to self._X_design), so that the residual \(B u = B y - B X \beta\) is expressible as a linear combination of fixed quantities and no symbolic sparse mat-vec is required. The Jacobian \(\log|B|\) reuses the same trace-based polynomial as SARFlow.
Methods
- __init__(y, G, X, **kwargs)
- fit([draws, tune, chains, target_accept, ...]): Draw samples from the posterior.
- fit_approx([draws, n, method, random_seed, ...]): Fit a variational approximation and return posterior draws.
- posterior_predictive([n_draws, random_seed, ...]): Draw posterior-predictive samples y_rep.
- spatial_diagnostics(): Run Bayesian LM specification tests for flow models.
- spatial_diagnostics_decision([alpha, ...]): Return a model-selection decision from Bayesian LM test results.
- spatial_effects([draws, ...]): Summarise posterior origin/destination/intra/network/total effects.
- summary([var_names]): Return posterior summary table via ArviZ.
Attributes
Return the most recent PyMC variational approximation, if any.
inference_data: Return ArviZ InferenceData from the most recent fit, or None.
pymc_model: Return the PyMC model used for the most recent fit, or None.
- fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, store_lambda=False, idata_kwargs=None, **sample_kwargs)
Draw samples from the posterior.
- Parameters:
- draws : int, default 2000
Number of posterior samples per chain (after tuning).
- tune : int, default 1000
Number of tuning (warm-up) steps per chain.
- chains : int, default 4
Number of parallel chains.
- target_accept : float, default 0.9
Target acceptance rate for NUTS.
- random_seed : int, optional
Seed for reproducibility.
- store_lambda : bool, default False
If True, include the high-dimensional fitted mean lambda in the stored posterior. Leaving this False reduces memory and conversion overhead for Poisson flow models.
- idata_kwargs : dict, optional
Forwarded to pm.sample. Defaults to {"log_likelihood": True} so that az.loo / az.waic / az.compare work out of the box; for SAR flow variants the captured Gaussian log-likelihood is post-processed to add the Jacobian contribution from log|I_N - rho_d W_d - rho_o W_o - rho_w W_w|.
- **sample_kwargs
Additional keyword arguments forwarded to pm.sample.
- Return type:
arviz.InferenceData
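The Jacobian contribution mentioned under idata_kwargs can be checked numerically. The sketch below assumes the analogous correction for the spatial-error filter, namely that the SEM log-likelihood equals the Gaussian log-density of the filtered residual \(B(y - X\beta)\) plus \(\log|B|\), and verifies it against a multivariate normal with covariance \(\sigma^2 (B^\top B)^{-1}\). The toy W, parameter values, and variable names are assumptions for illustration and are not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
N = n * n

# Toy row-standardised W with zero diagonal, and the SEM filter B.
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
I_n = np.eye(n)
B = (
    np.eye(N)
    - 0.3 * np.kron(I_n, W)
    - 0.2 * np.kron(W, I_n)
    - 0.1 * np.kron(W, W)
)

# Simulate y = X beta + u with B u = eps.
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta = np.array([1.0, -0.5])
sigma = 0.7
y = X @ beta + np.linalg.solve(B, sigma * rng.normal(size=N))

# Gaussian log-density of the filtered residual, without the Jacobian.
eps = B @ (y - X @ beta)
gauss = -0.5 * N * np.log(2.0 * np.pi * sigma**2) - 0.5 * (eps @ eps) / sigma**2

# Adding log|B| gives the log-likelihood of y itself.
_, logdet_B = np.linalg.slogdet(B)
loglik = gauss + logdet_B

# Reference: multivariate normal with covariance sigma^2 (B^T B)^{-1}.
cov = sigma**2 * np.linalg.inv(B.T @ B)
resid = y - X @ beta
_, logdet_cov = np.linalg.slogdet(cov)
reference = -0.5 * (
    N * np.log(2.0 * np.pi) + logdet_cov + resid @ np.linalg.solve(cov, resid)
)

print(f"filtered + Jacobian: {loglik:.8f}")
print(f"multivariate normal: {reference:.8f}")
```

Omitting the \(\log|B|\) term would leave the two values differing by exactly that amount, which is why a log-likelihood captured from the filtered residual alone needs post-processing before model comparison.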
- fit_approx(draws=2000, n=10000, method='advi', random_seed=None, store_lambda=False, compute_log_likelihood=True, **fit_kwargs)
Fit a variational approximation and return posterior draws.
- Parameters:
- draws : int, default 2000
Number of samples to draw from the fitted approximation.
- n : int, default 10000
Number of optimisation iterations for pm.fit.
- method : {"advi", "fullrank_advi"}, default "advi"
Variational inference family to fit.
- random_seed : int, optional
Seed for optimisation and posterior sampling.
- store_lambda : bool, default False
If True, keep the high-dimensional fitted mean lambda in the posterior draws.
- compute_log_likelihood : bool, default True
If True, compute the pointwise log-likelihood after sampling and attach it to the InferenceData (with Jacobian correction for SAR flow variants), enabling az.loo / az.waic.
- **fit_kwargs
Additional keyword arguments forwarded to pm.fit.
- property inference_data : arviz.data.inference_data.InferenceData | None
Return ArviZ InferenceData from the most recent fit, or None.
- posterior_predictive(n_draws=None, random_seed=None, parallel=-1)
Draw posterior-predictive samples y_rep.
For each (subsampled) posterior draw, simulates a new flow vector y_rep from the implied data-generating process by solving the sparse system A(rho) y_rep = X β + ε (Gaussian) or y_rep ~ Poisson(exp(A^{-1} X β)) (Poisson variants).
- Parameters:
- n_draws : int, optional
Number of posterior draws to use. Defaults to all available.
- random_seed : int, optional
Seed for the noise/Poisson sampler.
- parallel : int or None, default -1
Number of worker threads for the per-draw loop. -1 uses os.cpu_count(); None/0/1 forces sequential execution. Reproducibility under a fixed random_seed is preserved across worker counts via SeedSequence.spawn.
- Returns:
Array of shape (n_draws, N) with posterior-predictive flows.
- Return type:
np.ndarray
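A minimal sketch of the per-draw simulation, written for the spatial-error case under the assumption that a replicate is generated as \(y_{rep} = X\beta + B^{-1}\varepsilon\), is shown below. It also illustrates why SeedSequence.spawn keeps results independent of the number of workers: each draw receives its own child seed, so the order in which draws are processed does not matter. The helpers sem_filter and simulate_y_rep, the dense solve, and the toy "posterior draws" are all hypothetical and do not reflect the library's internals.

```python
import numpy as np


def sem_filter(W, lam_d, lam_o, lam_w):
    """Dense B = I_N - lam_d W_d - lam_o W_o - lam_w W_w for an n x n matrix W."""
    n = W.shape[0]
    I_n = np.eye(n)
    return (
        np.eye(n * n)
        - lam_d * np.kron(I_n, W)
        - lam_o * np.kron(W, I_n)
        - lam_w * np.kron(W, W)
    )


def simulate_y_rep(X, W, beta_draws, sigma_draws, lam_draws, random_seed=None):
    """Return an (n_draws, N) array with one simulated flow vector per draw."""
    n_draws = len(beta_draws)
    # One independent child seed per draw, regardless of execution order.
    children = np.random.SeedSequence(random_seed).spawn(n_draws)
    y_rep = np.empty((n_draws, X.shape[0]))
    for s, child in enumerate(children):
        rng = np.random.default_rng(child)
        eps = sigma_draws[s] * rng.standard_normal(X.shape[0])
        B = sem_filter(W, *lam_draws[s])
        # Solve B u = eps instead of forming the inverse of B.
        y_rep[s] = X @ beta_draws[s] + np.linalg.solve(B, eps)
    return y_rep


# Toy inputs standing in for posterior draws (not real model output).
rng = np.random.default_rng(0)
n, n_draws = 3, 5
N = n * n
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_draws = rng.normal(size=(n_draws, 2))
sigma_draws = np.abs(rng.normal(size=n_draws)) + 0.1
lam_draws = rng.dirichlet(np.ones(4), size=n_draws)[:, :3]

y_rep = simulate_y_rep(X, W, beta_draws, sigma_draws, lam_draws, random_seed=42)
print(y_rep.shape)
```

Because the child seed for draw s depends only on random_seed and s, the loop body could be dispatched to a thread pool without changing the output.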
- property pymc_model : pymc.model.core.Model | None
Return the PyMC model used for the most recent fit, or None.
- spatial_diagnostics()
Run Bayesian LM specification tests for flow models.
Iterates over the class-level _spatial_diagnostics_tests registry and returns a tidy DataFrame with one row per test. See bayespecon.models.base.SpatialModel.spatial_diagnostics() for the column schema.
- Raises:
RuntimeError: If the model has not been fit yet.
- spatial_diagnostics_decision(alpha=0.05, format='graphviz', theme='default')
Return a model-selection decision from Bayesian LM test results.
Walks the flow decision tree using Bayesian p-values from spatial_diagnostics() and recommends either OLSFlow (no spatial dependence detected) or SARFlow (at least one direction is significant).
- Parameters:
- alpha : float, default 0.05
Significance level for the Bayesian p-values.
- format : {"graphviz", "ascii", "model"}, default "graphviz"
Output format. "model" returns the recommended model name string. "ascii" returns an indented box-drawing tree. "graphviz" returns a graphviz.Digraph (with ASCII fallback if graphviz is not installed).
- Return type:
str or graphviz.Digraph
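The two documented outcomes of the decision can be summarised in a few lines. The sketch below collapses the tree to the rule stated above (any directional test significant at level alpha gives SARFlow, otherwise OLSFlow); the function name recommend_flow_model and the test labels "dest", "orig", "network" are hypothetical, and the actual tree may involve intermediate steps not shown here.

```python
def recommend_flow_model(p_values, alpha=0.05):
    """Return "SARFlow" if any test rejects at level alpha, else "OLSFlow".

    p_values maps a (hypothetical) test label to its Bayesian p-value.
    """
    significant = [name for name, p in p_values.items() if p < alpha]
    return "SARFlow" if significant else "OLSFlow"


# One direction is significant at the 5% level.
choice_a = recommend_flow_model({"dest": 0.40, "orig": 0.02, "network": 0.75})
# No direction is significant.
choice_b = recommend_flow_model({"dest": 0.40, "orig": 0.31, "network": 0.75})
print(choice_a)  # SARFlow
print(choice_b)  # OLSFlow
```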
- spatial_effects(draws=None, return_posterior_samples=False, ci=0.95, mode='auto', parallel=-1)
Summarise posterior origin/destination/intra/network/total effects.
Wraps _compute_spatial_effects_posterior() to produce a tidy DataFrame indexed by predictor with posterior means, credible-interval bounds, and Bayesian p-values for each effect type (origin, destination, intra, network, total). Following Thomas-Agnan & LeSage (2014, §83.5.2), when destination and origin design blocks differ the decomposition is reported separately for shocks applied to each side.
- Parameters:
- draws : int, optional
Maximum number of posterior draws to use. Defaults to all.
- return_posterior_samples : bool, default False
If True, also return the underlying posterior-draw arrays.
- ci : float, default 0.95
Credible-interval coverage.
- mode : {"auto", "combined", "separate"}, default "auto"
Controls whether destination- and origin-side effects are summed or reported separately. "auto" collapses to combined when the destination and origin design blocks are identical (self._symmetric_xo_xd) and reports both sides otherwise. "combined" always sums; "separate" always reports both.
- parallel : int or None, default -1
Number of worker threads for the per-draw effects loop. -1 uses os.cpu_count(); None/0/1 forces sequential execution. Ignored by closed-form (OLSFlow, SEMFlow) variants.
- Returns:
Long-format summary indexed by (predictor, side, effect) where side is one of "combined", "dest", "orig".
- Return type:
pandas.DataFrame, or (DataFrame, dict)
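For a closed-form variant such as SEMFlow, the per-predictor summary reduces to summarising the posterior draws of \(\beta\), since the direct effect equals \(\beta\) and the network effect is zero. The sketch below shows one way such a row could be computed from draws. The helper summarise_effect, its output keys, and in particular the two-sided tail definition of the Bayesian p-value are assumptions made for illustration; the library's own definition may differ.

```python
import numpy as np


def summarise_effect(draws, ci=0.95):
    """Posterior mean, equal-tailed credible interval, and a two-sided
    tail-probability p-value (assumed definition) for one effect."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - ci) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    p_nonneg = float(np.mean(draws >= 0.0))
    p_nonpos = float(np.mean(draws <= 0.0))
    return {
        "mean": float(draws.mean()),
        "ci_lower": float(lower),
        "ci_upper": float(upper),
        "bayes_pvalue": min(1.0, 2.0 * min(p_nonneg, p_nonpos)),
    }


# Simulated draws standing in for the posterior of one coefficient.
rng = np.random.default_rng(3)
beta_draws = rng.normal(loc=0.8, scale=0.1, size=4000)

direct = summarise_effect(beta_draws)                  # direct effect = beta
network = summarise_effect(np.zeros_like(beta_draws))  # identically zero
total = summarise_effect(beta_draws + 0.0)             # direct + network

for name, row in [("direct", direct), ("network", network), ("total", total)]:
    print(name, row)
```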