bayespecon.graph.flow_design_matrix

bayespecon.graph.flow_design_matrix(X, col_names=None, dist=None, log_distance=True)

Build a flow regression design matrix from regional attribute data.

Constructs the standard LeSage-Fischer O-D design matrix with separate destination, origin, and intra-zonal blocks, plus an optional distance column, following LeSage and Pace [2008] (Section 4.2).
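The block layout can be sketched with Kronecker products in plain NumPy. This is a minimal illustration, not the library's implementation. It assumes that the row for origin o and destination d sits at index o * n + d, which this page does not state. It also leaves the destination and origin blocks untouched on intra-zonal rows, which may differ from what the function does.

```python
import numpy as np

X = np.array([[100.0, 50.0],
              [200.0, 75.0],
              [150.0, 60.0]])
n, k = X.shape

# ASSUMPTION: the pair (origin o, destination d) is stored at row o * n + d.
ones = np.ones((n, 1))
intercept = np.ones((n * n, 1))
intra = np.eye(n).reshape(-1, 1)   # 1 where origin == destination, else 0
X_dest = np.kron(ones, X)          # row p holds X[p % n]
X_orig = np.kron(X, ones)          # row p holds X[p // n]
X_intra = intra * X_dest           # attributes kept only on intra-zonal rows

# Column order: intercept, intra indicator, destination, origin, intra blocks.
combined = np.hstack([intercept, intra, X_dest, X_orig, X_intra])
print(combined.shape)  # (9, 8)
```

Under the opposite ordering assumption (destination varying slowest), the two Kronecker products simply swap roles.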

Parameters:
X : np.ndarray, shape (n, k)

Array of k regional attributes for n spatial units. Should not include an intercept column.

col_names : list[str], optional

Names for the k columns of X. Defaults to ["x0", "x1", ...].

dist : np.ndarray, shape (n, n), optional

Distance or cost matrix. If provided, the vectorised distances are appended as the last column of combined, log-transformed by default (see log_distance).

log_distance : bool, default True

If True and dist is provided, the appended column is log(1 + dist).ravel() and is named "log_distance". If False, the raw distance vector is appended and named "dist". Using log(1 + d) matches the gravity-model convention while keeping the diagonal at zero.
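The effect of the log(1 + d) transform on the diagonal can be checked directly. The sketch below uses np.log1p as a numerically equivalent stand-in for log(1 + dist); whether the library itself calls log1p is an assumption.

```python
import numpy as np

dist = np.array([[0.0, 10.0, 5.0],
                 [10.0, 0.0, 8.0],
                 [5.0, 8.0, 0.0]])

log_col = np.log1p(dist).ravel()   # log(1 + d), flattened row by row
raw_col = dist.ravel()             # what log_distance=False would append

# Diagonal entries of a 3x3 matrix sit at flat indices 0, 4, 8.
print(log_col[[0, 4, 8]])  # [0. 0. 0.]
```

A plain log(d) would instead produce -inf for every zero on the diagonal, which is why the shifted form is convenient when intra-zonal pairs are kept in the sample.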

Returns:

Dataclass with all sub-matrices and a combined design matrix.

Return type:

FlowDesignMatrix

Notes

The full beta vector is structured as:

\[\beta = [\alpha,\; \alpha_i,\; \beta_d^1 \ldots \beta_d^k,\; \beta_o^1 \ldots \beta_o^k,\; \beta_i^1 \ldots \beta_i^k [,\; \gamma]]\]

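matching f2_sarfm.m from the LeSage spatial flows toolbox. Here \(\alpha\) is the intercept, \(\alpha_i\) the coefficient on the intra-zonal indicator, \(\beta_d\), \(\beta_o\) and \(\beta_i\) the destination, origin and intra-zonal coefficient blocks (k entries each), and \(\gamma\) the distance coefficient, present only when dist is supplied.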
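Given this ordering, a fitted coefficient vector can be split back into its blocks by position. The helper below, split_beta, is hypothetical and not part of bayespecon; it only encodes the layout stated above.

```python
import numpy as np

def split_beta(beta, k, has_distance=False):
    """Hypothetical helper: split a coefficient vector into named blocks."""
    beta = np.asarray(beta)
    expected = 2 + 3 * k + int(has_distance)
    if beta.shape[0] != expected:
        raise ValueError(f"expected {expected} coefficients, got {beta.shape[0]}")
    blocks = {
        "intercept": beta[0],
        "intra_indicator": beta[1],
        "dest": beta[2:2 + k],
        "orig": beta[2 + k:2 + 2 * k],
        "intra": beta[2 + 2 * k:2 + 3 * k],
    }
    if has_distance:
        blocks["gamma"] = beta[-1]  # distance coefficient is always last
    return blocks

# k = 2 attributes plus a distance column: 2 + 3*2 + 1 = 9 coefficients.
blocks = split_beta(np.arange(9.0), k=2, has_distance=True)
print(blocks["dest"], blocks["orig"], blocks["intra"], blocks["gamma"])
# [2. 3.] [4. 5.] [6. 7.] 8.0
```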

The leading intercept column is always added because flow models are typically estimated on log-flow outcomes whose grand mean is informative. Omitting it would force the destination, origin, and intra-zonal blocks to absorb the global level and would complicate the effects decomposition. Users wishing to suppress the intercept should drop the first column of combined and the corresponding row and column of any prior covariance.
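Suppressing the intercept amounts to two matching slices. In this sketch, combined is a random stand-in for design.combined, and prior_mean and prior_cov are hypothetical prior objects supplied by the user; none of these names come from the library beyond combined itself.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8                                  # 1 + 1 + 3*k columns with k = 2
combined = rng.normal(size=(9, p))     # stand-in for design.combined
combined[:, 0] = 1.0                   # leading intercept column
prior_mean = np.zeros(p)               # hypothetical prior mean
prior_cov = 10.0 * np.eye(p)           # hypothetical prior covariance

# Drop the first design column and the matching prior entries together,
# so the design and the prior stay aligned column for column.
combined_no_int = combined[:, 1:]
prior_mean_no_int = prior_mean[1:]
prior_cov_no_int = prior_cov[1:, 1:]

print(combined_no_int.shape, prior_cov_no_int.shape)  # (9, 7) (7, 7)
```

Slicing both objects with the same index keeps later columns in the same relative order, so the remaining coefficients retain their interpretation.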

Examples

Build a flow design from a 3-region attribute matrix with population and income:

>>> import numpy as np
>>> from bayespecon.graph import flow_design_matrix
>>> X = np.array([[100.0, 50.0],     # region 0: pop=100, inc=50
...               [200.0, 75.0],     # region 1: pop=200, inc=75
...               [150.0, 60.0]])    # region 2: pop=150, inc=60
>>> design = flow_design_matrix(X, col_names=["pop", "inc"])
>>> design.combined.shape  # 3*3 = 9 OD pairs, 1+1+2+2+2 = 8 cols
(9, 8)
>>> design.feature_names[:4]
['intercept', 'intra_indicator', 'dest_pop', 'dest_inc']

Optionally append a vectorised distance matrix as a final column:

>>> dist = np.array([[0.0, 10.0, 5.0],
...                  [10.0, 0.0, 8.0],
...                  [5.0, 8.0, 0.0]])
>>> design_d = flow_design_matrix(X, col_names=["pop", "inc"], dist=dist)
>>> design_d.feature_names[-1]  # log_distance=True by default
'log_distance'