Panel Flow Models with fit_approx

This notebook demonstrates the panel flow model classes and focuses on fit_approx, the variational inference entry point for fast approximate posterior inference.

The examples use very small synthetic panels so the notebook executes quickly during the docs build while still exercising the full workflow for:

  • SARFlowPanel

  • SARFlowSeparablePanel

  • PoissonSARFlowPanel

  • PoissonSARFlowSeparablePanel

import warnings

import numpy as np
import pandas as pd

from bayespecon import (
    PoissonSARFlowPanel,
    PoissonSARFlowSeparablePanel,
    SARFlowPanel,
    SARFlowSeparablePanel,
)
from bayespecon.dgp.flows import (
    generate_panel_flow_data,
    generate_panel_flow_data_separable,
    generate_panel_poisson_flow_data,
    generate_panel_poisson_flow_data_separable,
)

warnings.filterwarnings('ignore', category=FutureWarning)
np.set_printoptions(precision=3, suppress=True)
pd.options.display.float_format = '{:.3f}'.format
# Each panel DGP shares the auto-G + default `log_distance` behaviour of
# the cross-sectional flow DGPs: omit `G`/`gdf` and a synthetic point grid +
# KNN graph (`knn_k=4`) is built on the fly. The fitted graph is returned
# in the result dict so the model classes can re-use it.
T = 2
FIT_APPROX_KW = dict(draws=80, n=150, progressbar=False, random_seed=0)

gaussian_data = generate_panel_flow_data(
    n=4,
    T=T,
    rho_d=0.20,
    rho_o=0.15,
    rho_w=0.05,
    beta_d=[1.0],
    beta_o=[0.4],
    sigma=0.5,
    sigma_alpha=0.2,
    seed=10,
)
gaussian_sep_data = generate_panel_flow_data_separable(
    n=4,
    T=T,
    rho_d=0.25,
    rho_o=0.15,
    beta_d=[1.0],
    beta_o=[0.4],
    sigma=0.5,
    sigma_alpha=0.2,
    seed=11,
)
poisson_data = generate_panel_poisson_flow_data(
    n=4,
    T=T,
    rho_d=0.20,
    rho_o=0.10,
    rho_w=0.05,
    beta_d=[0.6],
    beta_o=[0.2],
    seed=12,
    k=1,
)
poisson_sep_data = generate_panel_poisson_flow_data_separable(
    n=4,
    T=T,
    rho_d=0.20,
    rho_o=0.10,
    beta_d=[0.6],
    beta_o=[0.2],
    seed=13,
    k=1,
)

# All four DGPs synthesise the same geometry from `n=4`, so any of the
# returned graphs is fine to share with the model constructors below.
G = gaussian_data['G']

pd.DataFrame(
    [
        {'dataset': 'Gaussian unrestricted', 'y_shape': gaussian_data['y'].shape, 'X_shape': gaussian_data['X'].shape, 'features': gaussian_data['col_names']},
        {'dataset': 'Gaussian separable', 'y_shape': gaussian_sep_data['y'].shape, 'X_shape': gaussian_sep_data['X'].shape, 'features': gaussian_sep_data['col_names']},
        {'dataset': 'Poisson unrestricted', 'y_shape': poisson_data['y'].shape, 'X_shape': poisson_data['X'].shape, 'features': poisson_data['col_names']},
        {'dataset': 'Poisson separable', 'y_shape': poisson_sep_data['y'].shape, 'X_shape': poisson_sep_data['X'].shape, 'features': poisson_sep_data['col_names']},
    ]
)
dataset y_shape X_shape features
0 Gaussian unrestricted (32,) (32, 6) [intercept, intra_indicator, dest_x0, orig_x0,...
1 Gaussian separable (32,) (32, 6) [intercept, intra_indicator, dest_x0, orig_x0,...
2 Poisson unrestricted (32,) (32, 6) [intercept, intra_indicator, dest_x0, orig_x0,...
3 Poisson separable (32,) (32, 6) [intercept, intra_indicator, dest_x0, orig_x0,...

Asymmetric attributes: generate_panel_flow_data and generate_panel_poisson_flow_data now support beta_d and beta_o with different lengths (k_d k_o). The design matrix layout adjusts automatically. For example:

data = generate_panel_flow_data(n=5, T=3, beta_d=[1.0, -0.5], beta_o=[0.4], ...)

Note — lognormal default (since v0.2). generate_panel_flow_data now returns strictly-positive flows by default, drawn from \(y_t = \exp(\eta_t)\) where \(\eta_t = A^{-1}(X_t\beta + \alpha + \varepsilon_t)\) is the latent SAR-filtered linear predictor (also exposed as data['eta']). The Gaussian-likelihood panel models (SARFlowPanel, SARFlowSeparablePanel) operate on the latent scale, so we fit on np.log(data['y']). Pass distribution="normal" to recover the legacy Gaussian-on-y behaviour.
The Poisson-likelihood models are unchanged (their DGPs always emit non-negative counts).

fit_approx pattern

Each model uses the same basic workflow:

  1. Construct the panel model.

  2. Call fit_approx(...) with either method='advi' or method='fullrank_advi'.

  3. Inspect model.summary(...) or model.inference_data.

In these examples, n is the number of optimization iterations used by pm.fit, while draws is the number of posterior samples drawn from the fitted variational approximation.

gaussian_model = SARFlowPanel(
    y=np.log(gaussian_data['y']),  # latent SAR scale (lognormal default)
    G=G,
    X=gaussian_data['X'],
    T=T,
    col_names=gaussian_data['col_names'],
    model=0,
    miter=5,
    titer=50,
    trace_seed=0,
)
gaussian_idata = gaussian_model.fit_approx(method='advi', **FIT_APPROX_KW)

print(type(gaussian_model.approximation).__name__)
print(sorted(gaussian_idata.posterior.data_vars))
gaussian_model.summary(var_names=['rho_d', 'rho_o', 'rho_w', 'sigma'], round_to=3)
MeanField
['beta', 'rho_d', 'rho_o', 'rho_simplex', 'rho_w', 'sigma']
Finished [100%]: Average Loss = 188.42
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho_d 0.241 0.157 0.031 0.556 0.017 0.013 71.161 92.072 NaN
rho_o 0.216 0.124 0.014 0.445 0.019 0.011 46.559 71.660 NaN
rho_w 0.257 0.159 0.017 0.552 0.022 0.011 31.136 33.037 NaN
sigma 13.429 11.120 1.372 38.170 1.203 1.573 78.438 95.145 NaN
gaussian_sep_model = SARFlowSeparablePanel(
    y=np.log(gaussian_sep_data['y']),  # latent SAR scale (lognormal default)
    G=G,
    X=gaussian_sep_data['X'],
    T=T,
    col_names=gaussian_sep_data['col_names'],
    model=0,
    trace_seed=0,
)
gaussian_sep_idata = gaussian_sep_model.fit_approx(method='fullrank_advi', **FIT_APPROX_KW)

sep_identity_error = np.max(
    np.abs(
        gaussian_sep_idata.posterior['rho_w'].values
        + gaussian_sep_idata.posterior['rho_d'].values * gaussian_sep_idata.posterior['rho_o'].values
    )
)
print(sorted(gaussian_sep_idata.posterior.data_vars))
print(f'max |rho_w + rho_d * rho_o| = {sep_identity_error:.3e}')
gaussian_sep_model.summary(var_names=['rho_d', 'rho_o', 'rho_w', 'sigma'], round_to=3)
['beta', 'rho_d', 'rho_o', 'rho_w', 'sigma']
max |rho_w + rho_d * rho_o| = 0.000e+00
Finished [100%]: Average Loss = 212.52
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho_d 0.143 0.515 -0.754 0.942 0.062 0.027 65.607 57.622 NaN
rho_o 0.034 0.476 -0.809 0.676 0.066 0.021 52.964 70.031 NaN
rho_w -0.016 0.233 -0.420 0.429 0.038 0.014 39.354 73.158 NaN
sigma 22.456 42.706 0.247 73.518 4.657 11.030 62.517 92.072 NaN

Poisson panel flow models

The Poisson panel variants are currently pooled-only (model=0). By default, fit_approx omits the large deterministic lambda array from the stored posterior samples, which keeps the inference data compact.

poisson_model = PoissonSARFlowPanel(
    y=poisson_data['y'],
    G=G,
    X=poisson_data['X'],
    T=T,
    col_names=poisson_data['col_names'],
    model=0,
    miter=5,
    titer=50,
    trace_seed=0,
)
poisson_idata = poisson_model.fit_approx(method='advi', **FIT_APPROX_KW)

print(sorted(poisson_idata.posterior.data_vars))
print('lambda' in poisson_idata.posterior.data_vars)
poisson_model.summary(var_names=['rho_d', 'rho_o', 'rho_w'], round_to=3)
['beta', 'rho_d', 'rho_o', 'rho_simplex', 'rho_w']
False
Finished [100%]: Average Loss = 2.7173e+16
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
rho_d 0.285 0.170 0.019 0.550 0.020 0.013 67.322 90.717 NaN
rho_o 0.237 0.141 0.053 0.502 0.019 0.010 52.821 70.031 NaN
rho_w 0.236 0.142 0.026 0.511 0.015 0.010 60.230 92.072 NaN
poisson_sep_model = PoissonSARFlowSeparablePanel(
    y=poisson_sep_data['y'],
    G=G,
    X=poisson_sep_data['X'],
    T=T,
    col_names=poisson_sep_data['col_names'],
    model=0,
    trace_seed=0,
)
poisson_sep_idata = poisson_sep_model.fit_approx(method='fullrank_advi', **FIT_APPROX_KW)

comparison = pd.DataFrame(
    [
        {
            'model': 'SARFlowPanel',
            'method': 'advi',
            'posterior_vars': ', '.join(sorted(gaussian_idata.posterior.data_vars)),
        },
        {
            'model': 'SARFlowSeparablePanel',
            'method': 'fullrank_advi',
            'posterior_vars': ', '.join(sorted(gaussian_sep_idata.posterior.data_vars)),
        },
        {
            'model': 'PoissonSARFlowPanel',
            'method': 'advi',
            'posterior_vars': ', '.join(sorted(poisson_idata.posterior.data_vars)),
        },
        {
            'model': 'PoissonSARFlowSeparablePanel',
            'method': 'fullrank_advi',
            'posterior_vars': ', '.join(sorted(poisson_sep_idata.posterior.data_vars)),
        },
    ]
)

print(sorted(poisson_sep_idata.posterior.data_vars))
poisson_sep_model.summary(var_names=['rho_d', 'rho_o', 'rho_w'], round_to=3)

comparison
['beta', 'rho_d', 'rho_o', 'rho_w']
Finished [100%]: Average Loss = 2.6614e+19
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
model method posterior_vars
0 SARFlowPanel advi beta, rho_d, rho_o, rho_simplex, rho_w, sigma
1 SARFlowSeparablePanel fullrank_advi beta, rho_d, rho_o, rho_w, sigma
2 PoissonSARFlowPanel advi beta, rho_d, rho_o, rho_simplex, rho_w
3 PoissonSARFlowSeparablePanel fullrank_advi beta, rho_d, rho_o, rho_w