bayespecon.dgp.generate_poisson_flow_data

bayespecon.dgp.generate_poisson_flow_data(n=None, k=2, k_d=None, k_o=None, rho_d=0.3, rho_o=0.2, rho_w=0.1, beta_d=None, beta_o=None, gamma_dist=-0.5, seed=42, G=None, err_hetero=False, gdf=None, knn_k=4)

Generate synthetic origin-destination flow count data for a Poisson spatial autoregressive flow model.

The data-generating process follows:

\[\eta = A(\rho_d, \rho_o, \rho_w)^{-1} X\beta, \qquad y_{ij} \sim \operatorname{Poisson}(\exp(\eta_{ij}))\]

where the system matrix is

\[A = I_N - \rho_d (I_n \otimes W) - \rho_o (W \otimes I_n) - \rho_w (W \otimes W), \quad N = n^2\]

and \(W\) is the row-standardised spatial weight matrix.
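The process above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the helper `rook_weights`, the random stand-in `Xb` for \(X\beta\), and the row-major origin-by-destination ordering of the flattened vector are all choices made for this example.

```python
import numpy as np


def rook_weights(side):
    """Row-standardised rook-contiguity weights for a side x side grid (hypothetical helper)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if r + 1 < side:  # neighbour in the next grid row
                W[i, i + side] = W[i + side, i] = 1.0
            if c + 1 < side:  # neighbour in the next grid column
                W[i, i + 1] = W[i + 1, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
side = 3
n = side * side          # number of spatial units
N = n * n                # number of origin-destination flows
W = rook_weights(side)
I_n = np.eye(n)

rho_d, rho_o, rho_w = 0.3, 0.2, 0.1

# System matrix A = I_N - rho_d (I_n kron W) - rho_o (W kron I_n) - rho_w (W kron W)
A = (np.eye(N)
     - rho_d * np.kron(I_n, W)
     - rho_o * np.kron(W, I_n)
     - rho_w * np.kron(W, W))

# Assumption: a random vector stands in for the linear predictor X @ beta
Xb = rng.normal(scale=0.3, size=N)

eta = np.linalg.solve(A, Xb)   # eta = A^{-1} X beta, without forming the inverse
lam = np.exp(eta)              # Poisson means
y_vec = rng.poisson(lam)       # flattened counts
y_mat = y_vec.reshape(n, n)    # assumed layout: rows are origins, columns are destinations
```

Solving the linear system instead of inverting \(A\) explicitly is a common choice for numerical stability; whether the library does the same is not stated here.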

Parameters:
n : int, default 10

Approximate number of spatial units. When neither G nor gdf is provided, a rook-contiguity grid with round(sqrt(n)) units per side is created, yielding approximately n units. Total number of flows is N = n_actual^2. When G is provided, n must match the number of units in G.

k : int, default 2

Number of destination/origin attribute columns when k_d and k_o are not specified (excluding intercepts added internally). Ignored when k_d and/or k_o are provided, or when beta_d/beta_o are lists whose length determines k_d/k_o.

k_d : int or None, default None

Number of destination-side attribute columns. Overrides k for the destination side when provided.

k_o : int or None, default None

Number of origin-side attribute columns. Overrides k for the origin side when provided.

rho_d : float, default 0.3

Destination autocorrelation parameter.

rho_o : float, default 0.2

Origin autocorrelation parameter.

rho_w : float, default 0.1

Network autocorrelation parameter.

beta_d : float or list of float or None, default None

Destination-side coefficients for the k attributes. A scalar broadcasts to all columns. Defaults to 1.0 for all columns.

beta_o : float or list of float or None, default None

Origin-side coefficients. Defaults to 1.0 for all columns.

seed : int, default 42

Seed for numpy.random.default_rng.

G : libpysal.graph.Graph or None, default None

Row-standardised spatial graph on n units. If None, a rook-contiguity graph on a regular grid is constructed automatically via resolve_weights().

err_hetero : bool, default False

Accepted for API parity with other DGP functions; ignored for the Poisson model (the variance is determined by the mean).

gdf : GeoDataFrame or None, default None

Accepted for API parity; ignored (use G instead).
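The grid-sizing rule described for n can be illustrated as follows. This sketch only reproduces the arithmetic stated above (round(sqrt(n)) units per side, N = n_actual^2); the function name `grid_size` is hypothetical and is not part of bayespecon.

```python
import math


def grid_size(n):
    """Return (units per side, actual number of units, number of flows) for a requested n."""
    side = round(math.sqrt(n))
    n_actual = side * side
    return side, n_actual, n_actual ** 2


print(grid_size(9))   # (3, 9, 81)
print(grid_size(10))  # (3, 9, 81)
print(grid_size(20))  # (4, 16, 256)
```

Because the side length is rounded, a request such as n=10 yields 9 units rather than 10, so the returned arrays are best sized from the data rather than from the requested n.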

Returns:

y_vec : np.ndarray, shape (N,), dtype int64

Flattened count observations.

y_mat : np.ndarray, shape (n, n), dtype int64

Count observations reshaped as an O×D matrix.

eta_vec : np.ndarray, shape (N,)

Log-mean (spatially filtered linear predictor).

lambda_vec : np.ndarray, shape (N,)

Poisson means (\(\exp(\eta_{ij})\)).

Xd : np.ndarray, shape (n, k)

Destination-side regional attribute matrix.

Xo : np.ndarray, shape (n, k)

Origin-side regional attribute matrix.

X : np.ndarray, shape (N, p)

Full O-D design matrix (for model fitting).

design : FlowDesignMatrix

Full O-D design matrix (for downstream inspection).

W : np.ndarray, shape (n, n)

Dense row-standardised weight matrix.

G : libpysal.graph.Graph

Spatial graph.

rho_d, rho_o, rho_w

True autocorrelation parameters.

beta_d, beta_o

True coefficient vectors.

Return type:

dict with keys

Raises:

np.linalg.LinAlgError – If the system matrix \(A\) is singular (usually because rho_d + rho_o + rho_w >= 1).
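The singularity condition can be checked with a small NumPy sketch. It assumes a non-negative, row-standardised W with a zero diagonal; the helper names `system_matrix` and `sufficient_for_invertibility` are hypothetical, and nothing here reflects how the library itself validates its inputs. Under that assumption each Kronecker term has unit row sums, so when the three parameters sum to exactly 1, \(A\) maps the all-ones vector to zero and is singular. Conversely, when |rho_d| + |rho_o| + |rho_w| < 1, \(A\) is strictly diagonally dominant and therefore invertible.

```python
import numpy as np


def system_matrix(W, rho_d, rho_o, rho_w):
    """Build A = I_N - rho_d (I_n kron W) - rho_o (W kron I_n) - rho_w (W kron W)."""
    n = W.shape[0]
    I_n = np.eye(n)
    return (np.eye(n * n)
            - rho_d * np.kron(I_n, W)
            - rho_o * np.kron(W, I_n)
            - rho_w * np.kron(W, W))


def sufficient_for_invertibility(rho_d, rho_o, rho_w):
    """Sufficient (not necessary) condition: strict diagonal dominance of A."""
    return abs(rho_d) + abs(rho_o) + abs(rho_w) < 1.0


# Row-standardised weights for three units on a line (illustrative data)
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])

# Parameters summing to 1: the all-ones vector lies in the null space of A
A_bad = system_matrix(W, 0.5, 0.3, 0.2)
null_residual = A_bad @ np.ones(9)

# Default parameters: the sufficient condition holds and A has full rank
A_ok = system_matrix(W, 0.3, 0.2, 0.1)
rank_ok = np.linalg.matrix_rank(A_ok)

print(sufficient_for_invertibility(0.5, 0.3, 0.2))  # False
print(sufficient_for_invertibility(0.3, 0.2, 0.1))  # True
```

A pre-check of this kind is useful because floating-point solvers do not always raise on a numerically singular matrix; they may instead return very large or meaningless values.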

Examples

>>> from bayespecon.dgp import generate_poisson_flow_data
>>> data = generate_poisson_flow_data(n=9, seed=0)
>>> data["y_mat"].dtype
dtype('int64')
>>> data["lambda_vec"].shape
(81,)
>>> data["Xd"].shape
(9, 2)
>>> data["Xo"].shape
(9, 2)