bayespecon.dgp.generate_panel_flow_data

bayespecon.dgp.generate_panel_flow_data(n=None, T=5, G=None, rho_d=0.3, rho_o=0.2, rho_w=0.1, beta_d=None, beta_o=None, sigma=1.0, sigma_alpha=0.5, gamma_dist=-0.5, seed=None, k=None, err_hetero=False, gdf=None, knn_k=4, distribution='lognormal')

Simulate panel flow data from a SAR flow model with unit effects.

For each period \(t = 1, \dots, T\), generates \(N = n^2\) latent flow observations from:

\[\eta_t = A^{-1}(X_t \beta + \alpha + \varepsilon_t), \quad A = I_N - \rho_d W_d - \rho_o W_o - \rho_w W_w, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma^2 I_N)\]

where \(\alpha \sim \mathcal{N}(0, \sigma_\alpha^2 I_N)\) are O-D-pair random effects drawn once and held fixed across periods. The observed flows are \(y_t = \exp(\eta_t)\) under the default distribution="lognormal" (strictly positive flows), or \(y_t = \eta_t\) under distribution="normal" (legacy Gaussian-on-y behaviour).
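The data-generating process above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the Kronecker construction of \(W_d\), \(W_o\) and \(W_w\) from a single \(n \times n\) weight matrix, the toy weight matrix W, the placeholder design matrices and the coefficient values are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 4                      # 3 regions -> N = 9 O-D pairs, 4 periods
N = n * n
rho_d, rho_o, rho_w = 0.3, 0.2, 0.1
sigma, sigma_alpha = 1.0, 0.5

# Hypothetical row-standardised weights: every other region is a neighbour.
W = (np.ones((n, n)) - np.eye(n)) / (n - 1)

# ASSUMPTION: flow weight matrices built as Kronecker products of W.
W_d = np.kron(np.eye(n), W)
W_o = np.kron(W, np.eye(n))
W_w = np.kron(W, W)
A = np.eye(N) - rho_d * W_d - rho_o * W_o - rho_w * W_w

beta = np.array([1.0, -0.5])                   # illustrative coefficients
alpha = rng.normal(0.0, sigma_alpha, size=N)   # drawn once, reused every period

eta = np.empty((T, N))
for t in range(T):
    X_t = rng.normal(size=(N, beta.size))      # placeholder design matrix
    eps_t = rng.normal(0.0, sigma, size=N)
    # Solve A eta_t = X_t beta + alpha + eps_t instead of forming A^{-1}.
    eta[t] = np.linalg.solve(A, X_t @ beta + alpha + eps_t)

y = np.exp(eta).ravel()          # lognormal scale, flattened period by period
```

Solving the linear system each period avoids computing an explicit inverse, which is usually the more stable choice when \(A\) is close to singular.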

Observations are stacked in time-first order: with zero-based indices, the observation at position \(t \cdot n^2 + k\) is O-D pair \(k\) in period \(t\). Here \(k\) indexes O-D pairs and is unrelated to the k argument.
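Time-first stacking means the flat vector can be viewed as a (T, n²) array with one row per period. The snippet below uses a stand-in vector rather than real output, and the names y_panel and pair are chosen for the example only.

```python
import numpy as np

n, T = 3, 4
N = n * n                                # number of O-D pairs

# Stand-in for a returned "y" vector; each entry equals its own flat index.
y = np.arange(T * N, dtype=float)

# Row t holds all N O-D pairs observed in period t.
y_panel = y.reshape(T, N)

t, pair = 2, 5                           # zero-based period and O-D pair
flat_index = t * N + pair
same_entry = y[flat_index] == y_panel[t, pair]
```

The same reshape applies to "eta", and to "X" with an extra trailing feature axis.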

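To recover the SAR parameters with SARFlowPanel or SARFlowSeparablePanel from lognormal output, fit the model on np.log(y), which equals the returned eta. Under distribution="normal", y is already on the latent scale and can be used directly.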
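A small helper can pick the right response scale from the returned dictionary. The sketch below uses a mock dictionary with the documented keys "y", "eta" and "distribution" instead of calling the generator; the model-fitting call is omitted because the estimator signatures are not shown on this page.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = rng.normal(size=12)

# Mock of the returned dict (only the keys used here), lognormal case.
out = {"y": np.exp(eta), "eta": eta, "distribution": "lognormal"}

def response_on_latent_scale(out):
    """Return the response on the scale of the SAR linear predictor."""
    if out["distribution"] == "lognormal":
        return np.log(out["y"])      # undo y = exp(eta)
    return out["y"]                  # "normal": y already equals eta

y_model = response_on_latent_scale(out)
```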

Parameters:
n : int

Number of spatial units. Must match the size of G.

T : int, default 5

Number of time periods.

G : libpysal.graph.Graph

Row-standardised spatial graph on n units.

rho_d : float, default 0.3

Destination spatial autoregressive parameter.

rho_o : float, default 0.2

Origin spatial autoregressive parameter.

rho_w : float, default 0.1

Network (origin-destination) spatial autoregressive parameter.

beta_d : array-like, shape (k_d,)

Destination-side regression coefficients.

beta_o : array-like, shape (k_o,)

Origin-side regression coefficients. When k_o != k_d, separate destination and origin attribute matrices are generated.

sigma : float, default 1.0

Standard deviation of the idiosyncratic error term.

sigma_alpha : float, default 0.5

Standard deviation of the O-D-pair random effect. Set to 0 for a pooled model (no unit effects).

seed : int, optional

Random seed for reproducibility.

k : int, optional

Number of regional attribute columns. If None, inferred from the length of beta_d.

err_hetero : bool, default False

Accepted for API parity with other DGP functions; currently ignored (homoskedastic errors are always generated).

gdf : object, optional

Accepted for API parity with other DGP functions; not used (pass G directly instead).

knn_k : int, default 4

Number of nearest neighbours used when synthesising a default graph from a synthetic point grid.

distribution : {"lognormal", "normal"}, default "lognormal"

Observation-scale family. "lognormal" returns y = exp(eta) (strictly positive flows, the default); "normal" returns y = eta (legacy Gaussian-on-y).

Returns:

Dictionary with keys:

  • "y" (n²T,): time-first stacked flow vector on the observation scale.

  • "eta" (n²T,): latent SAR-filtered linear predictor (equals log(y) under distribution="lognormal").

  • "distribution" str: the value of the distribution argument.

  • "X" (n²T, p): time-first stacked O-D design matrix.

  • "col_names" list[str]: feature names.

  • "G" libpysal.graph.Graph: spatial graph.

  • "rho_d", "rho_o", "rho_w", "sigma", "sigma_alpha": true parameters.

  • "beta_d", "beta_o": true coefficient vectors.

  • "params_true" dict: nested dict of all true parameters (including "distribution").

Return type:

dict

Raises:

ValueError – If the matrix \(A\) is singular, that is, if the combination of rho_d, rho_o, rho_w and the spatial graph does not yield an invertible filter.
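One way to screen parameter values before calling the generator is to build \(A\) and inspect its condition number. This is a sketch under assumptions, not the check the library performs: the helper name flow_filter_is_invertible, the cond_max threshold and the Kronecker construction of the flow weight matrices are all hypothetical. Under that construction, with non-negative row-standardised weights, \(|\rho_d| + |\rho_o| + |\rho_w| < 1\) is a sufficient condition for invertibility.

```python
import numpy as np

def flow_filter_is_invertible(rho_d, rho_o, rho_w, W, cond_max=1e12):
    """Hypothetical helper: True if A = I - rho_d*W_d - rho_o*W_o - rho_w*W_w
    is numerically well-conditioned. ASSUMES Kronecker-built flow weights."""
    n = W.shape[0]
    I_n = np.eye(n)
    A = (
        np.eye(n * n)
        - rho_d * np.kron(I_n, W)
        - rho_o * np.kron(W, I_n)
        - rho_w * np.kron(W, W)
    )
    # A singular matrix has an infinite (or numerically huge) condition number.
    return bool(np.linalg.cond(A) < cond_max)

# Toy row-standardised weights on 3 regions.
W = (np.ones((3, 3)) - np.eye(3)) / 2

ok = flow_filter_is_invertible(0.3, 0.2, 0.1, W)    # rhos sum to 0.6
bad = flow_filter_is_invertible(0.5, 0.3, 0.2, W)   # rhos sum to 1, so A @ 1 = 0
```

In the second call every row of \(A\) sums to zero, so the vector of ones lies in its null space and the filter cannot be inverted.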