bayespecon.dgp.generate_poisson_flow_data

bayespecon.dgp.generate_poisson_flow_data(n=None, k=2, k_d=None, k_o=None, rho_d=0.3, rho_o=0.2, rho_w=0.1, beta_d=None, beta_o=None, gamma_dist=-0.5, seed=42, G=None, err_hetero=False, gdf=None, knn_k=4)

Generate synthetic origin-destination flow count data for a Poisson spatial autoregressive flow model.

The data-generating process follows:

\[\eta = A(\rho_d, \rho_o, \rho_w)^{-1} X\beta, \qquad y_{ij} \sim \operatorname{Poisson}(\exp(\eta_{ij}))\]

where the system matrix is

\[A = I_N - \rho_d (I_n \otimes W) - \rho_o (W \otimes I_n) - \rho_w (W \otimes W), \quad N = n^2\]

and \(W\) is the row-standardised spatial weight matrix.
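The process above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `rook_grid_weights` and `system_matrix`, the toy design matrix `X`, and the coefficient values in `beta` are all made up for the example.

```python
import numpy as np


def rook_grid_weights(side):
    """Dense row-standardised rook-contiguity weights on a side x side grid.

    Hypothetical helper for this sketch only.
    """
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if r > 0:
                W[i, i - side] = 1.0  # neighbour above
            if r < side - 1:
                W[i, i + side] = 1.0  # neighbour below
            if c > 0:
                W[i, i - 1] = 1.0  # neighbour to the left
            if c < side - 1:
                W[i, i + 1] = 1.0  # neighbour to the right
    return W / W.sum(axis=1, keepdims=True)


def system_matrix(W, rho_d, rho_o, rho_w):
    """A = I_N - rho_d (I_n kron W) - rho_o (W kron I_n) - rho_w (W kron W)."""
    n = W.shape[0]
    I_n = np.eye(n)
    return (
        np.eye(n * n)
        - rho_d * np.kron(I_n, W)
        - rho_o * np.kron(W, I_n)
        - rho_w * np.kron(W, W)
    )


rng = np.random.default_rng(0)
W = rook_grid_weights(3)  # n = 9 spatial units
n = W.shape[0]
N = n * n  # 81 origin-destination pairs

# Toy design matrix and coefficients (assumptions, not the library's design).
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([0.5, 0.3, -0.2])

A = system_matrix(W, rho_d=0.3, rho_o=0.2, rho_w=0.1)
eta = np.linalg.solve(A, X @ beta)  # eta = A^{-1} X beta, without forming A^{-1}
lam = np.exp(eta)  # Poisson means
y = rng.poisson(lam)  # flow counts, one per O-D pair
y_mat = y.reshape(n, n)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of the N×N matrix, which is usually both cheaper and numerically safer.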

Parameters:

n : int, default 10: Approximate number of spatial units. When neither G nor gdf is provided, a rook-contiguity grid with round(sqrt(n)) units per side is created, yielding approximately n units. Total number of flows is N = n_actual^2. When G is provided, n must match the number of units in G.
k : int, default 2: Number of destination/origin attribute columns when k_d and k_o are not specified (excluding intercepts added internally). Ignored when k_d and/or k_o are provided, or when beta_d/beta_o are lists whose length determines k_d/k_o.
k_d : int or None, default None: Number of destination-side attribute columns. Overrides k for the destination side when provided.
k_o : int or None, default None: Number of origin-side attribute columns. Overrides k for the origin side when provided.
rho_d : float, default 0.3: Destination autocorrelation parameter.
rho_o : float, default 0.2: Origin autocorrelation parameter.
rho_w : float, default 0.1: Network autocorrelation parameter.
beta_d : float or list of float or None, default None: Destination-side coefficients for the k attributes. A scalar broadcasts to all columns. Defaults to 1.0 for all columns.
beta_o : float or list of float or None, default None: Origin-side coefficients. Defaults to 1.0 for all columns.
seed : int, default 42: Seed for numpy.random.default_rng.
G : libpysal.graph.Graph or None, default None: Row-standardised spatial graph on n units. If None, a rook-contiguity graph on a regular grid is constructed automatically via resolve_weights().
err_hetero : bool, default False: Accepted for API parity with other DGP functions; ignored for the Poisson model (the variance is determined by the mean).
gdf : GeoDataFrame or None, default None: Accepted for API parity; ignored (use G instead).
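
The sizing rule for n and the scalar-broadcasting rule for the coefficients can be illustrated with a short sketch. The helpers `grid_dimensions` and `broadcast_beta` are hypothetical names introduced here; they mirror the behaviour documented above and are not the library's internal code.

```python
import math

import numpy as np


def grid_dimensions(n):
    """Return (side, n_actual, N) for a requested number of units n.

    Hypothetical helper: side = round(sqrt(n)), n_actual = side^2,
    and N = n_actual^2 origin-destination flows.
    """
    side = round(math.sqrt(n))
    n_actual = side * side
    return side, n_actual, n_actual ** 2


def broadcast_beta(beta, k):
    """Expand a coefficient argument to a 1-D array.

    Hypothetical helper: None becomes 1.0 for all k columns, a scalar is
    repeated k times, and a list is returned as given (its length then
    determines the number of columns).
    """
    if beta is None:
        return np.ones(k)
    if np.isscalar(beta):
        return np.full(k, float(beta))
    return np.asarray(beta, dtype=float)


print(grid_dimensions(10))  # (3, 9, 81): a request for 10 units yields a 3 x 3 grid
print(grid_dimensions(9))   # (3, 9, 81)
print(broadcast_beta(None, 2))
print(broadcast_beta(0.5, 3))
print(broadcast_beta([1.0, -0.5], 2))
```

Because N grows with the fourth power of the grid side, even moderate values of n produce large dense system matrices, so small n is advisable when experimenting.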

Returns:

y_vec : np.ndarray, shape (N,), dtype int64: Flattened count observations.
y_mat : np.ndarray, shape (n, n), dtype int64: Count observations reshaped as an O×D matrix.
eta_vec : np.ndarray, shape (N,): Log-mean (spatially filtered linear predictor).
lambda_vec : np.ndarray, shape (N,): Poisson means (\(\exp(\eta_{ij})\)).
Xd : np.ndarray, shape (n, k): Destination-side regional attribute matrix.
Xo : np.ndarray, shape (n, k): Origin-side regional attribute matrix.
X : np.ndarray, shape (N, p): Full O-D design matrix (for model fitting).
design : FlowDesignMatrix: Full O-D design matrix (for downstream inspection).
W : np.ndarray, shape (n, n): Dense row-standardised weight matrix.
G : libpysal.graph.Graph: Spatial graph.
rho_d, rho_o, rho_w: True autocorrelation parameters.
beta_d, beta_o: True coefficient vectors.

Return type:

dict with keys

Raises:

np.linalg.LinAlgError – If the system matrix \(A\) is singular (usually because rho_d + rho_o + rho_w >= 1).
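
The role of the parameter sum can be seen from the row sums of \(A\). Because \(W\) is row-standardised, each Kronecker term has unit row sums, so \(A\mathbf{1} = (1 - \rho_d - \rho_o - \rho_w)\mathbf{1}\). When the three parameters sum to exactly 1, the vector of ones lies in the null space of \(A\). The sketch below checks this on a 4-unit ring; the ring weights and the `system_matrix` helper are assumptions for illustration, not part of the library.

```python
import numpy as np


def system_matrix(W, rho_d, rho_o, rho_w):
    """A = I_N - rho_d (I_n kron W) - rho_o (W kron I_n) - rho_w (W kron W)."""
    n = W.shape[0]
    I_n = np.eye(n)
    return (
        np.eye(n * n)
        - rho_d * np.kron(I_n, W)
        - rho_o * np.kron(W, I_n)
        - rho_w * np.kron(W, W)
    )


# Row-standardised weights on a ring of 4 units (each unit has two neighbours).
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

ones = np.ones(n * n)

# Default parameters: the sum is 0.6, so every row of A sums to 0.4.
A_ok = system_matrix(W, 0.3, 0.2, 0.1)
print(np.allclose(A_ok @ ones, 0.4 * ones))

# Parameters summing to 1: A maps the ones vector to zero, so A is rank deficient.
A_bad = system_matrix(W, 0.5, 0.3, 0.2)
print(np.allclose(A_bad @ ones, 0.0))
print(np.linalg.matrix_rank(A_bad) < n * n)
```

For a non-negative row-standardised \(W\), keeping \(|\rho_d| + |\rho_o| + |\rho_w| < 1\) is sufficient for \(A\) to be invertible, so checking that sum before calling the generator is a cheap safeguard.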

Examples

>>> from bayespecon.dgp import generate_poisson_flow_data
>>> data = generate_poisson_flow_data(n=9, seed=0)
>>> data["y_mat"].dtype
dtype('int64')
>>> data["lambda_vec"].shape
(81,)
>>> data["Xd"].shape
(9, 2)
>>> data["Xo"].shape
(9, 2)