bayespecon.diagnostics.bayesfactor.bayes_factor_compare_models

bayespecon.diagnostics.bayesfactor.bayes_factor_compare_models(models, model_labels=None, method='bridge', prior_note=None, return_diagnostics=False, **kwargs)[source]

Compute all pairwise Bayes factors for a set of Bayesian models.

Following bayestestR’s bayesfactor_models() ([Makowski et al., 2019]), this function estimates marginal likelihoods for each model and then computes Bayes factors as ratios of marginal likelihoods:

\[BF_{ij} = ML_i / ML_j = \exp(\log ML_i - \log ML_j)\]

The transitivity property of Bayes factors allows all pairwise comparisons from a single set of marginal likelihood estimates:

\[BF_{AB} = BF_{AC} / BF_{BC} = ML_A / ML_B\]
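These two identities can be sketched directly in NumPy (illustrative log marginal likelihoods, not output of this function):

```python
import numpy as np

# Illustrative log marginal likelihoods for three models A, B, C.
log_ml = np.array([-1520.3, -1524.8, -1519.1])

# All pairwise Bayes factors from a single set of estimates:
# BF[i, j] = exp(log ML_i - log ML_j); the diagonal is exactly 1.
bf = np.exp(log_ml[:, None] - log_ml[None, :])

# Transitivity: BF_AB equals BF_AC / BF_BC (up to floating point).
```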

Parameters:
models : list or dict

Fitted model objects or InferenceData objects to compare.

  • Fitted model objects (recommended): Each object must have inference_data and pymc_model attributes (e.g., a fitted SAR or SEM model). For bridge sampling, the log-posterior is automatically compiled from the PyMC model.

  • Dict of {str: model_object}: Keys are used as model labels (unless model_labels is also provided), matching the convention of arviz.compare().

  • List of InferenceData: For method='bic', InferenceData objects can be passed directly. For method='bridge', fitted model objects are required so the log-posterior can be compiled automatically.

model_labels : list of str, optional

Labels for each model (for DataFrame index/columns). If None, models are labeled by index (or by dict keys when models is a dict).

method : str, default 'bridge'

Marginal likelihood estimation method. Supported:

  • 'bridge': Bridge sampling ([Meng and Wong, 1996]). Uses the iterative scheme with a multivariate normal proposal, ESS weighting, two-phase convergence, and MCSE diagnostics — following the R bridgesampling package ([Gronau et al., 2020]). Requires fitted model objects so the log-posterior can be compiled automatically.

  • 'bic': BIC approximation ([Wagenmakers, 2007]). Computes \(\log(ML) \approx -BIC/2\). Works with either fitted model objects or InferenceData.

prior_note : str, optional

Optional string describing the priors used (for reporting).

return_diagnostics : bool, default False

If True, also return a dict of diagnostics for each model.

**kwargs

Additional keyword arguments forwarded to the marginal-likelihood estimator. For method='bridge', the following are accepted:

  • maxiter (int, default 1000): Maximum bridge iterations per phase.

  • tol1 (float, default 1e-10): Convergence tolerance for Phase 1 (criterion = "r").

  • tol2 (float, default 1e-4): Convergence tolerance for Phase 2 (criterion = "logml"), used if Phase 1 does not converge.

  • use_neff (bool, default True): Use effective sample size (ESS) instead of nominal sample size in the bridge-function weights, matching the R package’s use_neff argument.

  • repetitions (int, default 1): Number of times to repeat bridge sampling with different proposal draws. If > 1, the median logml across repetitions is returned.

  • random_state (int or None): Random seed for reproducibility.

Returns:

  • bayes_factors (pandas.DataFrame) – DataFrame of Bayes factors (BF[i, j] = BF for model i vs model j; >1 favors i over j). The diagonal is 1 (each model compared to itself).

  • diagnostics (dict) – If return_diagnostics=True, a dict of diagnostics for each model (keyed by model label) is returned as a second element. Each value is a dict containing:

    • logml: Estimated log marginal likelihood.

    • iterations: Number of bridge iterations.

    • mcse_logml: Monte Carlo standard error of logml ([Micaletto and Vehtari, 2025]).

    • converged: Whether the iterative scheme converged.

    • neff: Effective sample size used for weighting (if use_neff=True).

    • N1, N2: Number of posterior and proposal samples.

    • method: "bridge" or "bic".

    • repetitions: Number of repetitions (if > 1, also includes logml_reps with per-repetition estimates).

Notes

Bridge sampling algorithm. The 'bridge' method implements the iterative bridge sampling estimator of Meng and Wong [1996], eq. 4.1, with the “optimal” bridge function. The procedure is:

  1. Split posterior samples in half: first half fits a multivariate normal proposal, second half drives the iterative scheme.

  2. Evaluate the log unnormalized posterior at both posterior and proposal samples (via log_posterior).

  3. Run the iterative scheme in two phases:

    • Phase 1: criterion = "r" (relative change in the ratio), tolerance tol1.

    • Phase 2 (if Phase 1 fails): restart with the geometric mean of the last two ratios, criterion = "logml", tolerance tol2.

  4. Compute MCSE following Micaletto & Vehtari (2025).
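A minimal, numerically simplified sketch of steps 2–3 (Phase 1 only, nominal sample-size weights, no ESS weighting or MCSE; the function name and the stabilizing median shift follow the R bridgesampling convention but are not part of the bayespecon API):

```python
import numpy as np

def bridge_logml(l1, l2, tol=1e-10, maxiter=1000):
    """Iterative bridge estimator (Meng & Wong 1996, eq. 4.1), simplified.

    l1 : log q(theta) - log g(theta) at posterior samples
         (q = unnormalized posterior, g = proposal density)
    l2 : the same quantity evaluated at proposal samples
    """
    n1, n2 = len(l1), len(l2)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)   # nominal weights (no ESS here)
    lstar = np.median(l1)                      # shift for numerical stability
    r = 1.0                                    # shifted ratio estimate
    for _ in range(maxiter):
        num = np.mean(np.exp(l2 - lstar) / (s1 * np.exp(l2 - lstar) + s2 * r))
        den = np.mean(1.0 / (s1 * np.exp(l1 - lstar) + s2 * r))
        r_new = num / den
        if abs((r_new - r) / r_new) < tol:     # criterion = "r" (Phase 1)
            r = r_new
            break
        r = r_new
    return np.log(r) + lstar                   # estimated log ML
```

On a toy problem with a known normalizing constant (e.g., an unnormalized Gaussian posterior and a wider Gaussian proposal), this recovers log ML to Monte Carlo accuracy.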

ESS weighting. By default (use_neff=True), the bridge function uses \(s_1 = n_{\mathrm{eff}} / (n_{\mathrm{eff}} + N_2)\) instead of \(s_1 = N_1 / (N_1 + N_2)\), matching the R bridgesampling package. This down-weights autocorrelated samples and generally improves accuracy.
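The weight substitution itself is a one-line change (illustrative values; in practice n_eff would come from an ESS estimate such as arviz.ess):

```python
# ESS-based bridge weights: replace the nominal posterior sample size N1
# by the effective sample size n_eff (illustrative numbers).
n_eff, n2 = 2500.0, 5000.0
s1 = n_eff / (n_eff + n2)   # instead of N1 / (N1 + N2)
s2 = n2 / (n_eff + n2)
```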

Interpreting Bayes factors. The Bayes factor quantifies the relative evidence for two competing models given the observed data.

  • BF > 1 favors the row model; BF < 1 favors the column model.

  • Conventional thresholds (Jeffreys, 1961; Kass & Raftery, 1995): 1–3 (anecdotal), 3–10 (moderate), 10–30 (strong), 30–100 (very strong), >100 (extreme).
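These thresholds can be encoded in a small helper (an illustrative function, not part of the bayespecon API):

```python
def bf_evidence_label(bf):
    """Map a Bayes factor to a conventional evidence label
    (Jeffreys 1961 / Kass & Raftery 1995 style thresholds).

    Evidence strength is symmetric: BF < 1 is inverted first,
    so the label describes the favored model either way.
    """
    if bf < 1:
        bf = 1.0 / bf
    if bf <= 3:
        return "anecdotal"
    if bf <= 10:
        return "moderate"
    if bf <= 30:
        return "strong"
    if bf <= 100:
        return "very strong"
    return "extreme"
```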

Sample size. For bridge sampling, at least 40,000 posterior samples are recommended for precise estimates ([Gronau et al., 2020]). A warning is emitted when fewer samples are detected.

BIC vs. bridge sampling. The 'bic' method approximates \(\log(ML) \approx \hat{\ell}_{\max} - \frac{k}{2}\log(n)\), which assumes unit-information priors (priors containing as much information as a single observation). When the actual priors are wider — as is typical for spatial models — bridge sampling will penalize complex models more heavily than BIC. This is not a bug: it reflects the fact that wide priors on unnecessary parameters reduce the marginal likelihood (Bayesian Occam’s razor). For models with Normal(0, 100) priors on WX coefficients, the bridge-sampling penalty per coefficient can be 5–10× larger than the BIC penalty. When the two methods disagree, bridge sampling is generally more trustworthy because it accounts for the actual prior specification.
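The BIC approximation itself is simple arithmetic; a sketch (helper name is illustrative, not the bayespecon API):

```python
import numpy as np

def logml_bic(loglik_max, k, n):
    """BIC approximation to the log marginal likelihood:
    log ML ~= log L_max - (k/2) log n, i.e. -BIC/2 (Wagenmakers, 2007).

    loglik_max : maximized log-likelihood
    k          : number of free parameters
    n          : number of observations
    """
    return loglik_max - 0.5 * k * np.log(n)
```

Note that the implied penalty per extra parameter is a fixed 0.5 * log(n), regardless of how diffuse the priors actually are; that fixed penalty is exactly what bridge sampling relaxes.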

Posterior model probabilities. After computing Bayes factors, posterior model probabilities can be obtained via post_prob():

from bayespecon import post_prob
probs = post_prob(logml_list, model_names=model_labels)
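Under equal prior odds, these probabilities are a softmax of the log marginal likelihoods; a numerically stable sketch of the underlying arithmetic (illustrative helper, not the bayespecon implementation):

```python
import numpy as np

def post_prob_sketch(logml):
    """Posterior model probabilities from log marginal likelihoods,
    assuming equal prior model probabilities (log-sum-exp for stability)."""
    logml = np.asarray(logml, dtype=float)
    logw = logml - logml.max()   # shift so the largest term is exp(0) = 1
    w = np.exp(logw)
    return w / w.sum()
```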

Examples

Compare fitted spatial models using bridge sampling (recommended):

from bayespecon import bayes_factor_compare_models

# Pass fitted model objects directly — log-posterior is compiled
# automatically from each model's PyMC model.
bf = bayes_factor_compare_models(
    {"SAR": sar, "SEM": sem, "SDM": sdm},
    method="bridge",
)

Quick comparison using the BIC approximation (no log-posterior needed):

bf = bayes_factor_compare_models(
    {"SAR": sar, "SEM": sem},
    method="bic",
)

With diagnostics and repetitions:

bf, diag = bayes_factor_compare_models(
    {"SAR": sar, "SEM": sem},
    method="bridge",
    return_diagnostics=True,
    repetitions=3,
    random_state=42,
)