bayespecon.graph.flow_design_matrix
bayespecon.graph.flow_design_matrix(X, col_names=None, dist=None, log_distance=True)
Build a flow regression design matrix from regional attribute data.
Constructs the standard LeSage-Fischer O-D design matrix with separate destination, origin, and intra-zonal blocks, plus an optional distance column, following LeSage and Pace [2008] (Section 4.2).
- Parameters:
- X : np.ndarray, shape (n, k)
Array of k regional attributes for n spatial units. Should not include an intercept column.
- col_names : list[str], optional
Names for the k columns of X. Defaults to ["x0", "x1", ...].
- dist : np.ndarray, shape (n, n), optional
Distance or cost matrix. If provided, vec(dist), transformed as specified by log_distance, is appended as the last column of combined.
- log_distance : bool, default True
If True and dist is provided, the appended column is log(1 + dist).ravel() and is named "log_distance". If False, the raw distance vector is appended and named "dist". Using log(1 + d) matches the gravity-model convention while keeping zero-distance entries, such as the diagonal, at zero.
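The effect of log_distance can be illustrated with plain NumPy. The snippet below is a minimal sketch of the two transformations described above, not a call into bayespecon; the 3-region distance matrix and the variable names are hypothetical. Because the matrix is symmetric, row-major ravel() and column-major vec() give the same vector here.

```python
import numpy as np

# Hypothetical 3-region distance matrix with a zero diagonal
# (intra-zonal distance taken to be zero).
dist = np.array([[0.0, 10.0, 5.0],
                 [10.0, 0.0, 8.0],
                 [5.0, 8.0, 0.0]])
n = dist.shape[0]

# Column described for log_distance=True.
log_col = np.log(1.0 + dist).ravel()  # same values as np.log1p(dist).ravel()

# Column described for log_distance=False: the raw vectorised distances.
raw_col = dist.ravel()

# Intra-zonal pairs sit at positions i * (n + 1) of the flattened vector.
# log(1 + 0) = 0, so they stay exactly zero after the transform, whereas
# a plain log(dist) would produce -inf there.
diag_positions = np.arange(n) * (n + 1)
intra_values = log_col[diag_positions]
```

A plain log(d) transform would be undefined on the diagonal, which is why the log(1 + d) form is convenient for intra-zonal pairs.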
- Returns:
Dataclass with all sub-matrices and a combined design matrix.
- Return type:
Notes
The full beta vector is structured as:
\[\beta = [\alpha,\; \alpha_i,\; \beta_d^1, \ldots, \beta_d^k,\; \beta_o^1, \ldots, \beta_o^k,\; \beta_i^1, \ldots, \beta_i^k\; [,\; \gamma]]\]
matching f2_sarfm.m from the LeSage spatial flows toolbox. Here \(\alpha\) is the intercept, \(\alpha_i\) the coefficient on the intra-zonal indicator, the subscripts d, o, and i mark the destination, origin, and intra-zonal blocks, and the optional \(\gamma\) is the distance coefficient.
The leading intercept column is always added because flow models are typically estimated on log-flow outcomes whose grand mean is informative; omitting it would force the destination, origin, and intra-zonal blocks to absorb the global level and would complicate the effects decomposition. Users wishing to suppress the intercept should drop the first column of combined and the corresponding row and column of any prior covariance.
Examples
Build a flow design from a 3-region attribute matrix with population and income:
>>> import numpy as np
>>> from bayespecon.graph import flow_design_matrix
>>> X = np.array([[100.0, 50.0],    # region 0: pop=100, inc=50
...               [200.0, 75.0],    # region 1: pop=200, inc=75
...               [150.0, 60.0]])   # region 2: pop=150, inc=60
>>> design = flow_design_matrix(X, col_names=["pop", "inc"])
>>> design.combined.shape  # 3*3 = 9 OD pairs, 1+1+2+2+2 = 8 cols
(9, 8)
>>> design.feature_names[:4]
['intercept', 'intra_indicator', 'dest_pop', 'dest_inc']

Optionally append a vectorised distance matrix as a final column. With the default log_distance=True, the column holds log(1 + dist) and is named "log_distance":
>>> dist = np.array([[0.0, 10.0, 5.0],
...                  [10.0, 0.0, 8.0],
...                  [5.0, 8.0, 0.0]])
>>> design_d = flow_design_matrix(X, col_names=["pop", "inc"], dist=dist)
>>> design_d.feature_names[-1]
'log_distance'
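
The column layout described in the Notes can be sketched with Kronecker products. This is an illustration under stated assumptions, not the bayespecon implementation: sketch_flow_design is a hypothetical helper, the O-D pair ordering (origin index varying slowest) is assumed, the "orig_" and "intra_" name prefixes are guesses patterned on the documented "dest_" prefix, and the destination and origin blocks are left unmodified on intra-zonal rows. The last lines show one way to suppress the intercept together with a placeholder prior covariance.

```python
import numpy as np


def sketch_flow_design(X, col_names):
    """Hypothetical helper: assemble [intercept, intra, dest, orig, intra-X]."""
    n, k = X.shape
    ones = np.ones((n, 1))

    intercept = np.ones((n * n, 1))
    # Indicator equal to 1 where origin == destination, 0 elsewhere.
    intra = np.eye(n).reshape(-1, 1)

    # Assumed ordering: row index = o * n + d (origin varies slowest).
    X_dest = np.kron(ones, X)   # row o*n + d holds X[d]
    X_orig = np.kron(X, ones)   # row o*n + d holds X[o]
    X_intra = intra * X_dest    # attributes kept only on intra-zonal rows

    combined = np.hstack([intercept, intra, X_dest, X_orig, X_intra])
    names = (
        ["intercept", "intra_indicator"]
        + [f"dest_{c}" for c in col_names]
        + [f"orig_{c}" for c in col_names]    # prefix is an assumption
        + [f"intra_{c}" for c in col_names]   # prefix is an assumption
    )
    return combined, names


X = np.array([[100.0, 50.0],
              [200.0, 75.0],
              [150.0, 60.0]])
combined, names = sketch_flow_design(X, ["pop", "inc"])

# Suppressing the intercept: drop the first column of the design and the
# matching row and column of a prior covariance (identity used here purely
# as a placeholder).
prior_cov = np.eye(combined.shape[1])
combined_no_intercept = combined[:, 1:]
prior_cov_no_intercept = prior_cov[1:, 1:]
names_no_intercept = names[1:]
```

The sketch reproduces the documented shape of 9 rows by 8 columns for three regions and two attributes; whether its row ordering agrees with the library's should be checked against design.combined before relying on it.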