bayespecon._ops.KroneckerFlowSolveOp¶
- class bayespecon._ops.KroneckerFlowSolveOp(W, n)[source]¶
Differentiable Kronecker-factored solve for separable Poisson flow models.
Computes \(\eta = A(\rho_d, \rho_o)^{-1} b\) where the system matrix exploits the separability constraint \(\rho_w = -\rho_d \rho_o\):
\[A(\rho_d, \rho_o) = I_N - \rho_d (I_n \otimes W) - \rho_o (W \otimes I_n) + \rho_d \rho_o (W \otimes W) = (I_n - \rho_o W) \otimes (I_n - \rho_d W) = L_o \otimes L_d\]where \(N = n^2\), \(L_k = I_n - \rho_k W\), \(W_d = I_n \otimes W\), and \(W_o = W \otimes I_n\). Note the order: the left Kronecker factor \(L_o\) is associated with \(\rho_o\) (origin effect) and the right factor \(L_d\) with \(\rho_d\) (destination effect).
Algorithm¶
Via the vec-permutation identity \((A \otimes B)\operatorname{vec}(X) = \operatorname{vec}(B X A^\top)\), the solve \((L_o \otimes L_d)\eta = b\) is equivalent to \(L_d H L_o^\top = B\) where \(B = \operatorname{mat}(b) \in \mathbb{R}^{n \times n}\) uses column-major (Fortran) ordering. This is solved in two steps:
\(H' = L_d^{-1} B\) — sparse solve with \(n\) RHS columns.
\(Z = L_o^{-\top} H'^{\,\top}\) — second sparse solve (\(Z = H_\eta^\top\)).
\(\eta = \operatorname{vec}(Z^\top)\).
Complexity¶
Each gradient evaluation requires 4 \(n \times n\) sparse factorisations (2 forward + 2 adjoint) plus 2 dense \(n \times n\) matrix products — all \(O(n^3)\).
Compare to
SparseFlowSolveOpwhich requires 2 \(N \times N\) (\(N = n^2\)) factorisations — \(O(n^6)\).Speedup at representative sizes:
n
N = n²
Approx. speedup
10
100
×100
50
2 500
×15 000
100
10 000
×250 000
Gradient derivation¶
For a scalar loss \(L\), implicit differentiation of \((L_o \otimes L_d)\eta = b\) and the formula \(dL/d\rho_k = -v^\top (\partial A/\partial \rho_k) \eta\) give:
\[\frac{\partial A}{\partial \rho_d} = L_o \otimes (-W), \qquad \frac{\partial A}{\partial \rho_o} = (-W) \otimes L_d\]where \(v = A^{-\top} g\) is the adjoint solution and \(g = \partial L / \partial \eta\). Using the vec-permutation identity with \(H_x = \operatorname{mat}(x)\) (column-major reshape):
\[\frac{\partial L}{\partial \rho_d} = \operatorname{tr}\!\left(H_v^\top W H_\eta L_o^\top\right) = \sum_{ij}(H_v)_{ij}\,(W H_\eta L_o^\top)_{ij}\]\[\frac{\partial L}{\partial \rho_o} = \operatorname{tr}\!\left(H_v^\top L_d H_\eta W^\top\right) = \sum_{ij}(H_v)_{ij}\,(L_d H_\eta W^\top)_{ij}\]See
_KroneckerFlowVJPOpfor the implementation.- param W:
Row-standardised spatial weight matrix on the n spatial units. Only this \(n \times n\) matrix is stored; the \(N \times N\) Kronecker matrices are never allocated.
- type W:
scipy.sparse.csr_matrix, shape (n, n)
- param n:
Number of spatial units.
- type n:
int
Notes
rho_wis not an input to this Op. The caller declaresrho_w = pm.Deterministic("rho_w", -rho_d * rho_o)for trace reporting; the Op implicitly uses the factorised form.Methods
L_op(inputs, outputs, output_grads)Compute VJPs via the Kronecker adjoint method.
R_op(inputs, eval_points)Construct a graph for the R-operator.
__init__(W, n)add_tag_trace(thing[, user_line])Add tag.trace to a node or variable.
do_constant_folding(fgraph, node)Determine whether or not constant folding should be performed for the given node.
grad(inputs, output_grads)Construct a graph for the gradient with respect to each input variable.
infer_shape(fgraph, node, input_shapes)inplace_on_inputs(allowed_inplace_inputs)Try to return a version of self that tries to inplace in as many as allowed_inplace_inputs.
make_node(rho_d, rho_o, b)Construct an Apply node that represent the application of this operation to the given inputs.
make_py_thunk(node, storage_map, ...[, debug])Make a Python thunk.
make_thunk(node, storage_map, compute_map, ...)Create a thunk.
perform(node, inputs, outputs)Calculate the function on the inputs and put the variables in the output storage.
prepare_node(node, storage_map, compute_map, ...)Make any special modifications that the Op needs before doing
Op.make_thunk().Attributes
An
intthat specifies which outputOp.__call__()should return.A
dictthat maps output indices to the input indices upon which they operate in-place.A
dictthat maps output indices to the input indices of which they are a view.- L_op(inputs, outputs, output_grads)[source]¶
Compute VJPs via the Kronecker adjoint method.
Delegates to
_KroneckerFlowVJPOp.
- R_op(inputs, eval_points)[source]¶
Construct a graph for the R-operator.
This method is primarily used by Rop.
-
static add_tag_trace(thing, user_line=
None)[source]¶ Add tag.trace to a node or variable.
The argument is returned after being affected (inplace).
- Parameters:¶
Notes
We also use config.traceback__limit for the maximum number of stack level we look.
-
default_output =
None[source]¶ An
intthat specifies which outputOp.__call__()should return. IfNone, then all outputs are returned.A subclass should not change this class variable, but instead override it with a subclass variable or an instance variable.
-
destroy_map =
{}[source]¶ A
dictthat maps output indices to the input indices upon which they operate in-place.Examples
destroy_map = {0: [1]} # first output operates in-place on second input destroy_map = {1: [0]} # second output operates in-place on first input
- do_constant_folding(fgraph, node)[source]¶
Determine whether or not constant folding should be performed for the given node.
This allows each Op to determine if it wants to be constant folded when all its inputs are constant. This allows it to choose where it puts its memory/speed trade-off. Also, it could make things faster as constants can’t be used for in-place operations (see
*IncSubtensor).
- grad(inputs, output_grads)[source]¶
Construct a graph for the gradient with respect to each input variable.
Each returned Variable represents the gradient with respect to that input computed based on the symbolic gradients with respect to each output. If the output is not differentiable with respect to an input, then this method should return an instance of type NullType for that input.
Using the reverse-mode AD characterization given in [1], for a \(C = f(A, B)\) representing the function implemented by the Op and its two arguments \(A\) and \(B\), given by the Variables in inputs, the values returned by Op.grad represent the quantities \(\bar{A} \equiv \frac{\partial S_O}{A}\) and \(\bar{B}\), for some scalar output term \(S_O\) of \(C\) in
\[\operatorname{Tr}\left(\bar{C}^\top dC\right) = \operatorname{Tr}\left(\bar{A}^\top dA\right) + \operatorname{Tr}\left(\bar{B}^\top dB\right)\]References
- infer_shape(fgraph, node, input_shapes)[source]¶
- inplace_on_inputs(allowed_inplace_inputs)[source]¶
Try to return a version of self that tries to inplace in as many as allowed_inplace_inputs.
- make_node(rho_d, rho_o, b)[source]¶
Construct an Apply node that represent the application of this operation to the given inputs.
This must be implemented by sub-classes.
-
make_py_thunk(node, storage_map, compute_map, no_recycling, debug=
False)[source]¶ Make a Python thunk.
Like
Op.make_thunk()but only makes Python thunks.
-
make_thunk(node, storage_map, compute_map, no_recycling, impl=
None)[source]¶ Create a thunk.
This function must return a thunk, that is a zero-arguments function that encapsulates the computation to be performed by this op on the arguments of the node.
- Parameters:¶
- node¶
Something previously returned by
Op.make_node().- storage_map¶
A
dictmapping Variables to single-element lists where a computed value for each Variable may be found.- compute_map¶
A
dictmapping Variables to single-element lists where a boolean value can be found. The boolean indicates whether the Variable’s storage_map container contains a valid value (i.e.True) or whether it has not been computed yet (i.e.False).- no_recycling¶
List of Variables for which it is forbidden to reuse memory allocated by a previous call.
- impl : str¶
Description for the type of node created (e.g.
"c","py", etc.)
Notes
If the thunk consults the storage_map on every call, it is safe for it to ignore the no_recycling argument, because elements of the no_recycling list will have a value of
Nonein the storage_map. If the thunk can potentially cache return values (like CLinker does), then it must not do so for variables in the no_recycling list.Op.prepare_node()is always called. If it tries'c'and it fails, then it tries'py', andOp.prepare_node()will be called twice.
- perform(node, inputs, outputs)[source]¶
Calculate the function on the inputs and put the variables in the output storage.
- Parameters:¶
- node¶
The symbolic Apply node that represents this computation.
- inputs¶
Immutable sequence of non-symbolic/numeric inputs. These are the values of each Variable in
node.inputs.- output_storage
List of mutable single-element lists (do not change the length of these lists). Each sub-list corresponds to value of each Variable in
node.outputs. The primary purpose of this method is to set the values of these sub-lists.
Notes
The output_storage list might contain data. If an element of output_storage is not
None, it has to be of the right type, for instance, for a TensorVariable, it has to be a NumPyndarraywith the right number of dimensions and the correct dtype. Its shape and stride pattern can be arbitrary. It is not guaranteed that such pre-set values were produced by a previous call to thisOp.perform(); they could’ve been allocated by another Op’s perform method. An Op is free to reuse output_storage as it sees fit, or to discard it and allocate new memory.
- prepare_node(node, storage_map, compute_map, impl)[source]¶
Make any special modifications that the Op needs before doing
Op.make_thunk().This can modify the node inplace and should return nothing.
It can be called multiple time with different impl values.
Warning
It is the Op’s responsibility to not re-prepare the node when it isn’t good to do so.