bayespecon.dgp.generate_flow_data
bayespecon.dgp.generate_flow_data(n=None, G=None, rho_d=0.3, rho_o=0.2, rho_w=0.1, beta_d=None, beta_o=None, sigma=1.0, X=None, col_names=None, dist=None, gamma_dist=-0.5, alpha=0.0, seed=None, gdf=None, err_hetero=False, knn_k=4, distribution='lognormal')
Simulate flow data from a SAR flow model.
Generates \(N = n^2\) flow observations. The latent SAR-filtered process is
\[\eta = A^{-1}(X\beta + \varepsilon), \quad A = I_N - \rho_d W_d - \rho_o W_o - \rho_w W_w, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_N)\]
and the observed flows are either
\[y = \exp(\eta) \quad \text{(default, } \texttt{distribution="lognormal"})\]
so that \(y > 0\) and, with homoskedastic errors, \(\mathbb{E}[y_i] = \exp(\mu_i + \Sigma_{ii}/2)\) with \(\mu = A^{-1}X\beta\) and \(\Sigma = \sigma^2 A^{-1}A^{-\top}\), or \(y = \eta\) when distribution="normal" (legacy Gaussian-on-y behaviour).
To recover the SAR parameters with the existing SARFlow/SARFlowSeparable, fit on np.log(y_vec) (which, under the default distribution="lognormal", equals eta_vec by construction).
- Parameters:
- n : int
Number of spatial units. Must match the size of G.
- G : libpysal.graph.Graph
Row-standardised spatial graph on n units.
- rho_d : float
Destination spatial autoregressive parameter.
- rho_o : float
Origin spatial autoregressive parameter.
- rho_w : float
Network (origin-destination) spatial autoregressive parameter.
- beta_d : array-like, shape (k_d,)
Destination-side regression coefficients.
- beta_o : array-like, shape (k_o,)
Origin-side regression coefficients. When k_o != k_d, separate destination and origin attribute matrices are generated or required.
- sigma : float, default 1.0
Standard deviation of the error term.
- X : np.ndarray, shape (n, k) or (n, k_d + k_o), optional
Regional attribute matrix. If None, draws X_d and X_o separately from N(0, 1). If a single matrix is provided with k_d == k_o, it is used for both destination and origin blocks. If it has k_d + k_o columns, the first k_d are used as destination attributes and the remaining k_o as origin attributes.
- col_names : list[str], optional
Names for the k columns of X.
- dist : np.ndarray, shape (n, n), optional
Distance / cost matrix. If None (default), one is computed automatically from gdf (or from a synthetic point grid when gdf is also None) and entered as log(1 + d) in the design matrix. Pass an array explicitly to override.
- gamma_dist : float, default -0.5
True coefficient on the (log-) distance column in the DGP. Defaults to -0.5 to mimic gravity-model distance decay; set to 0.0 to neutralize the effect.
- alpha : float, default 0.0
Intercept term added uniformly to all latent flow cells. Under distribution="lognormal" (default) this multiplies the observed flows by exp(alpha); under distribution="normal" it is an additive shift on y.
- seed : int, optional
Random seed for reproducibility.
- gdf : geopandas.GeoDataFrame, optional
Geometry source used to derive distance. If None and dist is also None, a synthetic point grid is built via synth_point_geodataframe().
- err_hetero : bool, default False
If True, generate heteroskedastic innovations: each flow cell \((i,j)\) has standard deviation \(\sigma \sqrt{1 + \|x_i\|^2 + \|x_j\|^2}\) where \(x_i\), \(x_j\) are the destination and origin attribute vectors for that cell.
- knn_k : int, default 4
Number of nearest neighbours used when synthesising a default graph from a synthetic point grid (see _resolve_flow_geometry()).
- distribution : {"lognormal", "normal"}, default "lognormal"
Observation-scale family. "lognormal" returns y = exp(eta) (strictly positive flows, the default). "normal" returns y = eta (legacy Gaussian-on-y behaviour). In both cases "eta_vec" / "eta_mat" is also exposed in the return dict.
- Returns:
Dictionary with keys:
- "y_vec" (N,): vectorised flows on the observation scale.
- "y_mat" (n, n): flow matrix form.
- "eta_vec" (N,): latent SAR-filtered linear predictor (equals log(y_vec) when distribution="lognormal").
- "eta_mat" (n, n): eta_vec reshaped.
- "distribution" str: the value of the distribution arg.
- "X" (N, p): full O-D design matrix (for model fitting).
- "X_regional" (n, k_d): destination-side regional attribute matrix.
- "X_regional_d" (n, k_d): destination-side regional attribute matrix.
- "X_regional_o" (n, k_o): origin-side regional attribute matrix.
- "design" FlowDesignMatrix: full design.
- "W" scipy.sparse.csr_matrix: n×n weight matrix.
- "G" libpysal.graph.Graph: spatial graph.
- "rho_d", "rho_o", "rho_w", "sigma": true parameters.
- "beta_d", "beta_o": true coefficient vectors.
- Return type:
dict
- Raises:
ValueError – If the A matrix is singular (invalid parameter combination).
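
The latent process and the singular-A failure mode can be sketched in dense NumPy for a small n. This is an illustration under stated assumptions, not the library's code: simulate_sar_flows is a hypothetical helper, the Kronecker forms W_d = I_n ⊗ W, W_o = W ⊗ I_n, W_w = W ⊗ W follow the common spatial-interaction convention and may not match the vectorisation order used by bayespecon, and the design matrix is random rather than built from regional attributes.

```python
import numpy as np

def simulate_sar_flows(W, X, beta, rho_d, rho_o, rho_w,
                       sigma=1.0, distribution="lognormal", seed=None):
    """Hypothetical helper (not part of bayespecon): dense SAR flow DGP."""
    n = W.shape[0]
    N = n * n
    I_n = np.eye(n)
    # Assumed Kronecker structure of the three flow weight matrices.
    W_d = np.kron(I_n, W)
    W_o = np.kron(W, I_n)
    W_w = np.kron(W, W)
    A = np.eye(N) - rho_d * W_d - rho_o * W_o - rho_w * W_w
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=sigma, size=N)
    try:
        # eta = A^{-1} (X beta + eps), solved without forming A^{-1}.
        eta = np.linalg.solve(A, X @ beta + eps)
    except np.linalg.LinAlgError as exc:
        # Only exactly singular systems are caught here; nearly
        # singular ones can pass and give unstable results.
        raise ValueError("A is singular for these rho values") from exc
    y = np.exp(eta) if distribution == "lognormal" else eta
    return {"y_vec": y, "y_mat": y.reshape(n, n),
            "eta_vec": eta, "eta_mat": eta.reshape(n, n)}

# Row-standardised ring graph on 4 units (illustrative).
n = 4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

X = np.random.default_rng(1).normal(size=(n * n, 2))
beta = np.array([0.5, -0.3])
out = simulate_sar_flows(W, X, beta, 0.3, 0.2, 0.1, seed=42)
out_normal = simulate_sar_flows(W, X, beta, 0.3, 0.2, 0.1,
                                distribution="normal", seed=42)
```

For a row-standardised W with no isolated units, |rho_d| + |rho_o| + |rho_w| < 1 is a sufficient (not necessary) condition for A to be invertible, since each of the three weight matrices then has unit row sums.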