Panel Flow Models with fit_approx¶
This notebook demonstrates the panel flow model classes and focuses on fit_approx, the variational inference entry point for fast approximate posterior inference.
The examples use very small synthetic panels so the notebook executes quickly during the docs build while still exercising the full workflow for:
SARFlowPanelSARFlowSeparablePanelPoissonSARFlowPanelPoissonSARFlowSeparablePanel
import warnings
import numpy as np
import pandas as pd
from bayespecon import (
PoissonSARFlowPanel,
PoissonSARFlowSeparablePanel,
SARFlowPanel,
SARFlowSeparablePanel,
)
from bayespecon.dgp.flows import (
generate_panel_flow_data,
generate_panel_flow_data_separable,
generate_panel_poisson_flow_data,
generate_panel_poisson_flow_data_separable,
)
warnings.filterwarnings('ignore', category=FutureWarning)
np.set_printoptions(precision=3, suppress=True)
pd.options.display.float_format = '{:.3f}'.format
# Each panel DGP shares the auto-G + default `log_distance` behaviour of
# the cross-sectional flow DGPs: omit `G`/`gdf` and a synthetic point grid +
# KNN graph (`knn_k=4`) is built on the fly. The fitted graph is returned
# in the result dict so the model classes can re-use it.
T = 2
FIT_APPROX_KW = dict(draws=80, n=150, progressbar=False, random_seed=0)
gaussian_data = generate_panel_flow_data(
n=4,
T=T,
rho_d=0.20,
rho_o=0.15,
rho_w=0.05,
beta_d=[1.0],
beta_o=[0.4],
sigma=0.5,
sigma_alpha=0.2,
seed=10,
)
gaussian_sep_data = generate_panel_flow_data_separable(
n=4,
T=T,
rho_d=0.25,
rho_o=0.15,
beta_d=[1.0],
beta_o=[0.4],
sigma=0.5,
sigma_alpha=0.2,
seed=11,
)
poisson_data = generate_panel_poisson_flow_data(
n=4,
T=T,
rho_d=0.20,
rho_o=0.10,
rho_w=0.05,
beta_d=[0.6],
beta_o=[0.2],
seed=12,
k=1,
)
poisson_sep_data = generate_panel_poisson_flow_data_separable(
n=4,
T=T,
rho_d=0.20,
rho_o=0.10,
beta_d=[0.6],
beta_o=[0.2],
seed=13,
k=1,
)
# All four DGPs synthesise the same geometry from `n=4`, so any of the
# returned graphs is fine to share with the model constructors below.
G = gaussian_data['G']
pd.DataFrame(
[
{'dataset': 'Gaussian unrestricted', 'y_shape': gaussian_data['y'].shape, 'X_shape': gaussian_data['X'].shape, 'features': gaussian_data['col_names']},
{'dataset': 'Gaussian separable', 'y_shape': gaussian_sep_data['y'].shape, 'X_shape': gaussian_sep_data['X'].shape, 'features': gaussian_sep_data['col_names']},
{'dataset': 'Poisson unrestricted', 'y_shape': poisson_data['y'].shape, 'X_shape': poisson_data['X'].shape, 'features': poisson_data['col_names']},
{'dataset': 'Poisson separable', 'y_shape': poisson_sep_data['y'].shape, 'X_shape': poisson_sep_data['X'].shape, 'features': poisson_sep_data['col_names']},
]
)
| dataset | y_shape | X_shape | features | |
|---|---|---|---|---|
| 0 | Gaussian unrestricted | (32,) | (32, 6) | [intercept, intra_indicator, dest_x0, orig_x0,... |
| 1 | Gaussian separable | (32,) | (32, 6) | [intercept, intra_indicator, dest_x0, orig_x0,... |
| 2 | Poisson unrestricted | (32,) | (32, 6) | [intercept, intra_indicator, dest_x0, orig_x0,... |
| 3 | Poisson separable | (32,) | (32, 6) | [intercept, intra_indicator, dest_x0, orig_x0,... |
Asymmetric attributes:
generate_panel_flow_dataandgenerate_panel_poisson_flow_datanow supportbeta_dandbeta_owith different lengths (k_d ≠ k_o). The design matrix layout adjusts automatically. For example:data = generate_panel_flow_data(n=5, T=3, beta_d=[1.0, -0.5], beta_o=[0.4], ...)
Note — lognormal default (since v0.2).
generate_panel_flow_datanow returns strictly-positive flows by default, drawn from \(y_t = \exp(\eta_t)\) where \(\eta_t = A^{-1}(X_t\beta + \alpha + \varepsilon_t)\) is the latent SAR-filtered linear predictor (also exposed asdata['eta']). The Gaussian-likelihood panel models (SARFlowPanel,SARFlowSeparablePanel) operate on the latent scale, so we fit onnp.log(data['y']). Passdistribution="normal"to recover the legacy Gaussian-on-y behaviour.
The Poisson-likelihood models are unchanged (their DGPs always emit non-negative counts).
fit_approx pattern¶
Each model uses the same basic workflow:
Construct the panel model.
Call
fit_approx(...)with eithermethod='advi'ormethod='fullrank_advi'.Inspect
model.summary(...)ormodel.inference_data.
In these examples, n is the number of optimization iterations used by pm.fit, while draws is the number of posterior samples drawn from the fitted variational approximation.
gaussian_model = SARFlowPanel(
y=np.log(gaussian_data['y']), # latent SAR scale (lognormal default)
G=G,
X=gaussian_data['X'],
T=T,
col_names=gaussian_data['col_names'],
model=0,
miter=5,
titer=50,
trace_seed=0,
)
gaussian_idata = gaussian_model.fit_approx(method='advi', **FIT_APPROX_KW)
print(type(gaussian_model.approximation).__name__)
print(sorted(gaussian_idata.posterior.data_vars))
gaussian_model.summary(var_names=['rho_d', 'rho_o', 'rho_w', 'sigma'], round_to=3)
MeanField
['beta', 'rho_d', 'rho_o', 'rho_simplex', 'rho_w', 'sigma']
Finished [100%]: Average Loss = 188.42
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| rho_d | 0.241 | 0.157 | 0.031 | 0.556 | 0.017 | 0.013 | 71.161 | 92.072 | NaN |
| rho_o | 0.216 | 0.124 | 0.014 | 0.445 | 0.019 | 0.011 | 46.559 | 71.660 | NaN |
| rho_w | 0.257 | 0.159 | 0.017 | 0.552 | 0.022 | 0.011 | 31.136 | 33.037 | NaN |
| sigma | 13.429 | 11.120 | 1.372 | 38.170 | 1.203 | 1.573 | 78.438 | 95.145 | NaN |
gaussian_sep_model = SARFlowSeparablePanel(
y=np.log(gaussian_sep_data['y']), # latent SAR scale (lognormal default)
G=G,
X=gaussian_sep_data['X'],
T=T,
col_names=gaussian_sep_data['col_names'],
model=0,
trace_seed=0,
)
gaussian_sep_idata = gaussian_sep_model.fit_approx(method='fullrank_advi', **FIT_APPROX_KW)
sep_identity_error = np.max(
np.abs(
gaussian_sep_idata.posterior['rho_w'].values
+ gaussian_sep_idata.posterior['rho_d'].values * gaussian_sep_idata.posterior['rho_o'].values
)
)
print(sorted(gaussian_sep_idata.posterior.data_vars))
print(f'max |rho_w + rho_d * rho_o| = {sep_identity_error:.3e}')
gaussian_sep_model.summary(var_names=['rho_d', 'rho_o', 'rho_w', 'sigma'], round_to=3)
['beta', 'rho_d', 'rho_o', 'rho_w', 'sigma']
max |rho_w + rho_d * rho_o| = 0.000e+00
Finished [100%]: Average Loss = 212.52
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| rho_d | 0.143 | 0.515 | -0.754 | 0.942 | 0.062 | 0.027 | 65.607 | 57.622 | NaN |
| rho_o | 0.034 | 0.476 | -0.809 | 0.676 | 0.066 | 0.021 | 52.964 | 70.031 | NaN |
| rho_w | -0.016 | 0.233 | -0.420 | 0.429 | 0.038 | 0.014 | 39.354 | 73.158 | NaN |
| sigma | 22.456 | 42.706 | 0.247 | 73.518 | 4.657 | 11.030 | 62.517 | 92.072 | NaN |
Poisson panel flow models¶
The Poisson panel variants are currently pooled-only (model=0). By default, fit_approx omits the large deterministic lambda array from the stored posterior samples, which keeps the inference data compact.
poisson_model = PoissonSARFlowPanel(
y=poisson_data['y'],
G=G,
X=poisson_data['X'],
T=T,
col_names=poisson_data['col_names'],
model=0,
miter=5,
titer=50,
trace_seed=0,
)
poisson_idata = poisson_model.fit_approx(method='advi', **FIT_APPROX_KW)
print(sorted(poisson_idata.posterior.data_vars))
print('lambda' in poisson_idata.posterior.data_vars)
poisson_model.summary(var_names=['rho_d', 'rho_o', 'rho_w'], round_to=3)
['beta', 'rho_d', 'rho_o', 'rho_simplex', 'rho_w']
False
Finished [100%]: Average Loss = 2.7173e+16
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
| mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
|---|---|---|---|---|---|---|---|---|---|
| rho_d | 0.285 | 0.170 | 0.019 | 0.550 | 0.020 | 0.013 | 67.322 | 90.717 | NaN |
| rho_o | 0.237 | 0.141 | 0.053 | 0.502 | 0.019 | 0.010 | 52.821 | 70.031 | NaN |
| rho_w | 0.236 | 0.142 | 0.026 | 0.511 | 0.015 | 0.010 | 60.230 | 92.072 | NaN |
poisson_sep_model = PoissonSARFlowSeparablePanel(
y=poisson_sep_data['y'],
G=G,
X=poisson_sep_data['X'],
T=T,
col_names=poisson_sep_data['col_names'],
model=0,
trace_seed=0,
)
poisson_sep_idata = poisson_sep_model.fit_approx(method='fullrank_advi', **FIT_APPROX_KW)
comparison = pd.DataFrame(
[
{
'model': 'SARFlowPanel',
'method': 'advi',
'posterior_vars': ', '.join(sorted(gaussian_idata.posterior.data_vars)),
},
{
'model': 'SARFlowSeparablePanel',
'method': 'fullrank_advi',
'posterior_vars': ', '.join(sorted(gaussian_sep_idata.posterior.data_vars)),
},
{
'model': 'PoissonSARFlowPanel',
'method': 'advi',
'posterior_vars': ', '.join(sorted(poisson_idata.posterior.data_vars)),
},
{
'model': 'PoissonSARFlowSeparablePanel',
'method': 'fullrank_advi',
'posterior_vars': ', '.join(sorted(poisson_sep_idata.posterior.data_vars)),
},
]
)
print(sorted(poisson_sep_idata.posterior.data_vars))
poisson_sep_model.summary(var_names=['rho_d', 'rho_o', 'rho_w'], round_to=3)
comparison
['beta', 'rho_d', 'rho_o', 'rho_w']
Finished [100%]: Average Loss = 2.6614e+19
arviz - WARNING - Shape validation failed: input_shape: (1, 80), minimum_shape: (chains=2, draws=4)
| model | method | posterior_vars | |
|---|---|---|---|
| 0 | SARFlowPanel | advi | beta, rho_d, rho_o, rho_simplex, rho_w, sigma |
| 1 | SARFlowSeparablePanel | fullrank_advi | beta, rho_d, rho_o, rho_w, sigma |
| 2 | PoissonSARFlowPanel | advi | beta, rho_d, rho_o, rho_simplex, rho_w |
| 3 | PoissonSARFlowSeparablePanel | fullrank_advi | beta, rho_d, rho_o, rho_w |