bayespecon.models.SEMTobit

class bayespecon.models.SEMTobit(*args, censoring=0.0, **kwargs)

Bayesian spatial error Tobit model.

\[y^* = X\beta + u,\quad u = \lambda W u + \varepsilon, \quad \varepsilon \sim N(0,\sigma^2 I)\]

with observed outcome \(y = \max(c, y^*)\), where \(c\) is the censoring point set by the censoring argument (default 0.0).
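As a minimal sketch of the data-generating process above, the following simulates a small censored sample with NumPy. The ring-shaped weights matrix W, the parameter values, and all variable names are illustrative assumptions and are not part of bayespecon.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, sigma, c = 6, 0.5, 1.0, 0.0  # illustrative values only

# Hypothetical row-standardised ring weights: each unit has two neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.0])

# eps ~ N(0, sigma^2 I)
eps = rng.normal(scale=sigma, size=n)

# u = lam * W u + eps  is equivalent to  (I - lam * W) u = eps
u = np.linalg.solve(np.eye(n) - lam * W, eps)

y_star = X @ beta + u      # latent outcome
y = np.maximum(c, y_star)  # observed outcome, censored from below at c
```

Solving the linear system avoids forming the inverse of \(I - \lambda W\) explicitly, which is usually preferable numerically.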

Robust regression

When robust=True, the spatially-filtered error distribution is changed from Normal to Student-t. For uncensored observations:

\[f(y^*_i \mid \mu_i, \sigma, \nu) = \frac{1}{\sigma} \, t_\nu\!\left(\frac{y^*_i - \mu_i}{\sigma}\right)\]

and for censored observations:

\[P(y^*_i \le c) = T_\nu\!\left(\frac{c - \mu_i}{\sigma}\right)\]

where \(T_\nu\) is the Student-t CDF and \(\nu \sim \mathrm{TruncExp}(\lambda_\nu, \mathrm{lower}=2)\) with rate nu_lam (default 1/30).
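The two robust contributions above can be evaluated with SciPy's Student-t distribution. This is an illustrative sketch only, not the library's implementation: the function name robust_tobit_loglik is hypothetical, \(\mu_i\) is taken as given, and flagging observations with y <= c as censored is an assumption.

```python
import numpy as np
from scipy import stats


def robust_tobit_loglik(y, mu, sigma, nu, c=0.0):
    """Pointwise Tobit log-likelihood with Student-t errors (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    censored = y <= c  # assumption: values at the censoring point are censored

    # Uncensored: log[(1 / sigma) * t_nu((y - mu) / sigma)]
    ll_obs = stats.t.logpdf(y, df=nu, loc=mu, scale=sigma)
    # Censored: log T_nu((c - mu) / sigma)
    ll_cens = stats.t.logcdf(c, df=nu, loc=mu, scale=sigma)

    return np.where(censored, ll_cens, ll_obs)
```

Passing loc and scale to scipy.stats.t applies the \(1/\sigma\) factor of the density automatically.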

__init__(*args, censoring=0.0, **kwargs)

Methods

__init__(*args[, censoring])
fit([draws, tune, chains, target_accept, ...])   Sample posterior and attach pointwise log-likelihood for IC metrics.
fitted_values()                                  Return fitted values at posterior mean parameters.
residuals()                                      Return residuals on the observed scale.
spatial_diagnostics()                            Run Bayesian LM specification tests and return a summary table.
spatial_diagnostics_decision([alpha])            Return a model-selection decision from Bayesian LM test results.
spatial_effects([return_posterior_samples])      Compute Bayesian inference for direct, indirect, and total impacts.
summary([var_names])                             Return posterior summary table.

Attributes

inference_data   Return the ArviZ InferenceData from the most recent fit.
pymc_model       Return the PyMC model object built for the most recent fit.

fit(draws=2000, tune=1000, chains=4, target_accept=0.9, random_seed=None, idata_kwargs=None, **sample_kwargs)

Sample posterior and attach pointwise log-likelihood for IC metrics.

The SEM Tobit model uses pm.Potential for both the error log-likelihood and the Jacobian, so PyMC does not capture a pointwise log-likelihood automatically. The complete pointwise log-likelihood is instead computed manually after sampling, using the Tobit censoring formula:

  • Uncensored: log N(y | mu, sigma^2)

  • Censored: log Phi((c - mu) / sigma)

where mu = X @ beta and the spatial filtering is absorbed into the Jacobian.
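For a single posterior draw, the two cases above can be sketched with the standard library alone. The helper name tobit_pointwise_loglik is hypothetical, and flagging observations with y <= c as censored is an assumption; this illustrates the formula and is not the code that fit() runs.

```python
import math
from statistics import NormalDist


def tobit_pointwise_loglik(y, mu, sigma, c=0.0):
    """Pointwise Normal Tobit log-likelihood for one draw (hypothetical helper)."""
    std_normal = NormalDist()  # standard Normal: mean 0, sd 1
    out = []
    for y_i, mu_i in zip(y, mu):
        if y_i <= c:
            # Censored: log Phi((c - mu) / sigma)
            out.append(math.log(std_normal.cdf((c - mu_i) / sigma)))
        else:
            # Uncensored: log N(y | mu, sigma^2)
            z = (y_i - mu_i) / sigma
            out.append(math.log(std_normal.pdf(z)) - math.log(sigma))
    return out
```

Repeating this for every draw would give a draws-by-observations array, which is the layout that information criteria such as WAIC and LOO typically expect.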

fitted_values()

Return fitted values at posterior mean parameters.

Returns:

Posterior-mean fitted values.

Return type:

np.ndarray

property inference_data : arviz.data.inference_data.InferenceData | None

Return the ArviZ InferenceData from the most recent fit.

Returns:

The inference data object, or None if the model has not been fit yet.

Return type:

arviz.InferenceData or None

property pymc_model : pymc.model.core.Model | None

Return the PyMC model object built for the most recent fit.

Returns:

The model object used by fit(), or None if the instance has not been fit yet.

Return type:

pymc.Model or None

residuals()

Return residuals on the observed scale.

Returns:

Residual vector y - fitted_values.

Return type:

np.ndarray

spatial_diagnostics()

Run Bayesian LM specification tests and return a summary table.

Iterates over the class-level _spatial_diagnostics_tests registry and calls each test function on this fitted model, collecting the results into a tidy DataFrame. The set of tests depends on the model type — for example, an OLS model runs LM-Lag, LM-Error, LM-SDM-Joint, and LM-SLX-Error-Joint, while an SAR model runs LM-Error, LM-WX, and Robust-LM-WX.

Requires the model to have been fit (.fit() called) and a spatial weights matrix W to have been supplied at construction time.

Returns:

DataFrame indexed by test name with columns:

Column     Description
statistic  Posterior mean of the LM statistic
median     Posterior median of the LM statistic
df         Degrees of freedom for the \(\chi^2\) reference
p_value    Bayesian p-value: 1 - chi2.cdf(mean, df)
ci_lower   Lower bound of 95% credible interval (2.5%)
ci_upper   Upper bound of 95% credible interval (97.5%)

The DataFrame has attrs["model_type"] (class name) and attrs["n_draws"] (total posterior draws) metadata.
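To make the column definitions concrete, one row of this table can be reproduced from a vector of posterior draws of an LM statistic. The sketch below assumes such draws are already available; the function name summarise_lm_draws is hypothetical and this is not the library's internal code.

```python
import numpy as np
from scipy.stats import chi2


def summarise_lm_draws(lm_draws, df):
    """Summarise posterior draws of one LM statistic (hypothetical helper)."""
    lm_draws = np.asarray(lm_draws, dtype=float)
    mean = lm_draws.mean()
    lower, upper = np.percentile(lm_draws, [2.5, 97.5])
    return {
        "statistic": mean,                    # posterior mean
        "median": np.median(lm_draws),        # posterior median
        "df": df,                             # chi-square reference df
        "p_value": 1.0 - chi2.cdf(mean, df),  # evaluated at the posterior mean
        "ci_lower": lower,                    # 2.5% quantile
        "ci_upper": upper,                    # 97.5% quantile
    }
```

With a posterior mean of 3.21 and df = 1, this formula gives a p-value of about 0.073.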

Return type:

pandas.DataFrame

Raises:
  • RuntimeError – If the model has not been fit yet.

  • ValueError – If no spatial weights matrix W was supplied.

See also

spatial_diagnostics_decision

Model-selection decision based on the test results.

spatial_effects

Posterior inference for direct/indirect/total impacts.

Examples

>>> ols = OLS(formula="price ~ income + crime", data=df, W=w)
>>> ols.fit()
>>> ols.spatial_diagnostics()
                 statistic  median  df  p_value  ci_lower  ci_upper
LM-Lag                3.21    2.98   1    0.073      0.12      8.54
LM-Error              5.67    5.34   1    0.017      0.34     12.10
LM-SDM-Joint          7.89    7.12   4    0.096      1.23     18.32
LM-SLX-Error-Joint    6.45    5.98   4    0.168      0.89     15.67

spatial_diagnostics_decision(alpha=0.05)

Return a model-selection decision from Bayesian LM test results.

Implements the decision tree from Koley and Bera [2024] (the Bayesian analogue of the classical stge_kb procedure in Anselin et al. [1996]). The decision logic depends on the current model type and the pattern of significant tests:

From OLS (4-test decision tree):

  1. If LM-SDM-Joint is significant → test Robust-LM-Lag-SDM and Robust-LM-Error-SDEM (requires re-fitting SLX first). If neither robust test is significant → OLS.

  2. If LM-Lag is significant and LM-Error is not → SAR.

  3. If LM-Error is significant and LM-Lag is not → SEM.

  4. If both are significant → test Robust-Lag and Robust-Error. If Robust-Lag is significant → SAR; if Robust-Error → SEM; if neither → SARAR (both lag and error).

From SAR (3-test decision tree):

  • LM-Error significant → SARAR; LM-WX significant → SDM; Robust-LM-WX significant → SDM.

From SEM (2-test decision tree):

  • LM-Lag significant → SARAR; LM-WX significant → SDEM.

From SLX (4-test decision tree):

  • Robust-LM-Lag-SDM significant → SDM; Robust-LM-Error-SDEM significant → SDEM; both → MANSAR; neither → SLX.

From SDM: LM-Error significant → MANSAR; else SDM.

From SDEM: LM-Lag significant → MANSAR; else SDEM.
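The SLX branch above depends only on the two robust tests it names, so it can be sketched in a few lines. The function name decide_from_slx, the dict of p-values keyed by test name, and reading "significant" as p < alpha are all assumptions made for illustration; the library's own implementation may differ.

```python
def decide_from_slx(p_values, alpha=0.05):
    """Recommend a model from the SLX branch of the decision tree (hypothetical helper)."""
    lag_sig = p_values["Robust-LM-Lag-SDM"] < alpha     # assumption: strict inequality
    err_sig = p_values["Robust-LM-Error-SDEM"] < alpha

    if lag_sig and err_sig:
        return "MANSAR"  # both robust tests significant
    if lag_sig:
        return "SDM"     # only the robust lag test is significant
    if err_sig:
        return "SDEM"    # only the robust error test is significant
    return "SLX"         # neither is significant


print(decide_from_slx({"Robust-LM-Lag-SDM": 0.01, "Robust-LM-Error-SDEM": 0.40}))
```

Checking the "both significant" case first keeps the single-test branches from shadowing the MANSAR outcome.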

Parameters:
alpha : float, default 0.05

Significance level for the Bayesian p-values.

Returns:

Recommended model name (e.g. "SAR", "SDM", "OLS").

Return type:

str

See also

spatial_diagnostics

Compute the Bayesian LM test statistics.

References

Koley and Bera [2024], Anselin et al. [1996]

spatial_effects(return_posterior_samples=False)

Compute Bayesian inference for direct, indirect, and total impacts.

Computes impact measures for each posterior draw, then summarises the posterior distribution with means, 95% credible intervals, and Bayesian p-values. This is the fully Bayesian analog of the simulation-based approach in LeSage and Pace [2009] and the asymptotic variance formulas in Arbia et al. [2020].

Models without a spatial lag on y do not exhibit global feedback propagation through \((I-\rho W)^{-1}\). However, models with spatially lagged covariates (SLX, SDEM) can still have non-zero neighbour spillovers captured in the indirect term.

Parameters:
return_posterior_samples : bool, optional

If True, return a (DataFrame, dict) tuple where the dict contains the full posterior draws under keys "direct", "indirect", and "total". Default False.

Returns:

If return_posterior_samples is False (default), returns a DataFrame indexed by feature names with columns for posterior means, credible-interval bounds, and Bayesian p-values.

If return_posterior_samples is True, returns (DataFrame, dict) where the dict has keys "direct", "indirect", "total", each mapping to a (G, k) array of posterior draws.

Return type:

pd.DataFrame or tuple of (pd.DataFrame, dict)
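The summarisation step described for spatial_effects, reducing a (G, k) array of per-draw impacts to means, 95% credible intervals, and p-values, can be sketched as follows. The helper name summarise_impacts is hypothetical. The documentation does not define the Bayesian p-value, so the two-sided tail probability used here is an assumption rather than the library's definition.

```python
import numpy as np


def summarise_impacts(draws, names):
    """Summarise a (G, k) array of posterior impact draws (hypothetical helper)."""
    draws = np.asarray(draws, dtype=float)
    mean = draws.mean(axis=0)
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)

    # Assumed p-value: twice the smaller posterior tail mass on either side of zero.
    frac_pos = (draws > 0).mean(axis=0)
    p_value = 2.0 * np.minimum(frac_pos, 1.0 - frac_pos)

    return {
        name: {
            "mean": mean[j],
            "ci_lower": lower[j],
            "ci_upper": upper[j],
            "p_value": p_value[j],
        }
        for j, name in enumerate(names)
    }
```

The same function could be applied separately to the "direct", "indirect", and "total" arrays returned when return_posterior_samples is True.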

summary(var_names=None, **kwargs)

Return posterior summary table.

Parameters:
var_names : list, optional

Variable names to include in the summary.

**kwargs

Additional arguments passed to arviz.summary().

Returns:

Posterior summary statistics.

Return type:

pandas.DataFrame