the upcoming release of the PySAL metapackage is the first to include the new Graph class, our next-gen (ugh) data structure for encoding spatial relationships
plotting a Graph object is straightforward, and the interactive version works nicely for reasonably-sized graphs.
but large graphs, e.g. a Graph representing commute flows are stressful for the browser, especially considering that a good O-D travel visualization needs to be (at least) regional in scale.
lonboard works great for large graphs (though I haven’t quite perfected it). To show the whole graph, remove your pointer from the map window entirely. Sometimes it is easier to start zoomed into a node and carefully zoom out
See a much more complex example of daily commute flows in the Washington DC region here
(yes, I always use DC as my first example. It’s the place I lived the longest; easist to ground-truth because i have a reasonable handle on the economic and political geography)
this isnt anywhere close to production-ready, but given the flexibility of Graph, it’s still very fast to go from zero to viz:
Plotting the Graph
The new libpysal Graph class has a builtin method for interactively visualizing spatial interaction graphs. This works great, even for relatively large graphs–as long as the contiguity structure is sparse.
from libpysal.graph import Graphdf = gpd.read_file(geodatasets.get_path("geoda cincinnati")).to_crs(4326)
g = Graph.build_contiguity(df)g.explore(df, tiles='CartoDB Positron')
Make this Notebook Trusted to load map: File -> Trust Notebook
When each observation has many connections, the visualization gets messy really quickly (see, for example, the three-kilometer graph here). In these cases, it can be useful to leverage another dimension (like arcing the edges connecting observations in 3D), or by using another interactivity method, like mousing over an observation to reveal its neighbors (as opposed to revealing the entire graph all at once). Both of those things are tough or impossible to do with traditional web mapping tools (thank you leaflet!(!) …you’re just getting old, like the rest of us 😕).
So what if we wanted to visualize a much larger and more complicated Graph (without crashing the browser)? For example a Graph that visualizes transport flows, but at a regional scale?
Using DeckGL as a backend can solve some of the issues above (if you’re an R person, check out rdeck or Kyle Walker’s webgl package). In the past I have done that with PyDeck, deck.gl’s own python API. But the PyDeck interface to DeckGL is not actually all that scalable (when working with geodataframes anyway) because it resorts to storing data in GeoJSON which is wickedly inefficient. Lately I’ve been really impressed with lonboard, which uses DeckGL as a backend but passes all of its data via arrow, which results in a nice-to-use API and lovely scalable maps rendered by the GPU (sans GeoJSON bloat). Exploring their docs yesterday, the county-level migration example struck me as extremely similar to my example with LODES–and thus should be pretty easy to replicate with a Graph.
So I decided to kick the tires with lonboard a bit yesterday, and this was pleasantly easier than I anticipated. The Graph class works great for this and props to Kyle Barron for lonboard. Very impressive.
Proof of Concept
adapting from lonboard’s county migration example, it should be straightforward to use the arclayer as a 3D vizualization for the Graph
First create a Graph indexed by geometries so we can visualize point to point
g = Graph.build_contiguity(df.assign(idx=df.representative_point()).set_index("idx"))
Here’s a neat trick with a Graph… it will use the geodataframe index to identify observations. So if we set the geometry column as the index, we get a point-to-point adjacency list.
Two aligned arrays of XY coords, origin and destination:
import pyarrow as patable = pa.table({"value": g.adjacency.values})
Then we create an arc layer (which basically just creates edges between our two sets of aligned nodes), and add the brushing extension so only the subgraph for the observation being moused-over shows up
Turn source and target points into their own layers (this is following the example from lonboard, we might just be able to pass the original set of centroids once. The scatterplot layers dont care about order or linking, they just draw the OD points)
# the map has trouble rendering inside a jupyter widget when published via quarto#gmap.as_html()
That works!
See the full-sized map herenote: take you cursor off the map entirely to show all the links
(option + click to change the orientation).
You have to mouseover the centroids not the polygons, so it can be hard to find the node you want. Also not sure if we need layers for both source and target if its ultimately the same point set. Brushing scale factor is hard to set blindly
Flow Graph
That works, so lets do something more intense
from geosnap import DataStorefrom geosnap import io as gio
datasets = DataStore()
defs = datasets.msa_definitions()
First lets get tract boundaries and LODES data for the DC region. That’s a pain because LODES is provided in CSV format by state and the DC region spans four states. It’s provided at the block level too, which is very high resolution, so this is a lot of data to read in once then cut down
# need the main and aux tables. Main includes only intra-state flows.# aux includes flows into the state from other states.ods = []for state in ["md", "dc", "va", "wv"]: ods.append( pd.read_csv(f"https://lehd.ces.census.gov/data/lodes/LODES7/{state}/od/{state}_od_main_JT00_2019.csv.gz" ) ) ods.append( pd.read_csv(f"https://lehd.ces.census.gov/data/lodes/LODES7/{state}/od/{state}_od_aux_JT00_2019.csv.gz" ) )
trim this down to tracts for ease of presentation. I think lonboard can handle the size of the block-level OD matrix, but there are so many arcs the map becomes indecipherable
# S000 is now flows at the tract leveldfs = dfs.groupby(["w_tract", "h_tract"])["S000"].sum().reset_index()
Now we keep only flows that begin and end inside the DC region (alternatively we could add a single node that represents “external flows” or something), then pivot that into a sparse matrix and instantiate a Graph (indexed by the tract IDs)
# only keep flows inside the regiondfs = dfs[ (dfs.w_tract.astype(str).str[:5].isin(stcos))& (dfs.h_tract.astype(str).str[:5].isin(stcos))]
<Graph of 1361 nodes and 498263 nonzero edges indexed by
['11001000100', '11001000201', '11001000202', '11001000300', '1100100040...]>
flow g represents the daily commutes between blocks. The weight holds the count of commuters. Now we need to replace the focal and neighbor IDs with their geometries
adj = flow_g.adjacency.reset_index()
Now we swap the tract geoids for their geometries and recreate the Graph object (getting us back to the point-to-point matrix representation)
---------------------------------------------------------------------------NameError Traceback (most recent call last)
Cell In[1], line 1----> 1flow_g.adjacency.sum()
NameError: name 'flow_g' is not defined
flow_g.pct_nonzero
26.899495281865292
27% nonzero is a very dense graph
flow_g.adjacency
focal neighbor
POINT (-77.05916 38.90522) POINT (-77.05916 38.90522) 131.0
POINT (-77.07334 38.90902) 2.0
POINT (-77.06668 38.90913) 49.0
POINT (-77.07558 38.91751) 85.0
POINT (-77.06624 38.92443) 1.0
...
POINT (-77.72566 38.30944) POINT (-77.61238 38.25595) 2.0
POINT (-77.87730 38.45920) 2.0
POINT (-77.72566 38.30944) 6.0
POINT (-76.46551 38.51178) POINT (-76.46551 38.51178) 0.0
POINT (-77.29484 38.43915) POINT (-77.29484 38.43915) 0.0
Name: weight, Length: 498265, dtype: float64
Now we just follow the same steps above to create a vizualization of the Graph
The difference here is we have weighted edges in this Graph, so we will color the arcs proportional to the level of flow between O and D (or i and j if you prefer)
/Users/knaaptime/Dropbox/projects/mapclassify/mapclassify/classifiers.py:1653: UserWarning: Not enough unique values in array to form 8 classes. Setting k to 6.
self.bins = quantile(y, k=k)
# color and width should scale to reflect volume of the flowarc_layer = ArcLayer( table=table, get_source_position=origins, get_target_position=destinations, get_source_color=edge_colors, get_target_color=edge_colors, get_width=table["value"], opacity=0.2, pickable=True, extensions=[brushing_extension], brushing_radius=brushing_radius,)
/Users/knaaptime/mambaforge/envs/geosnap/lib/python3.11/site-packages/lonboard/_geoarrow/ops/reproject.py:97: UserWarning: Input being reprojected to EPSG:4326 CRS
warnings.warn("Input being reprojected to EPSG:4326 CRS")
/Users/knaaptime/mambaforge/envs/geosnap/lib/python3.11/site-packages/lonboard/_geoarrow/ops/reproject.py:97: UserWarning: Input being reprojected to EPSG:4326 CRS
warnings.warn("Input being reprojected to EPSG:4326 CRS")
/Users/knaaptime/mambaforge/envs/geosnap/lib/python3.11/site-packages/lonboard/_geoarrow/ops/reproject.py:97: UserWarning: Input being reprojected to EPSG:4326 CRS
warnings.warn("Input being reprojected to EPSG:4326 CRS")
/Users/knaaptime/mambaforge/envs/geosnap/lib/python3.11/site-packages/lonboard/_geoarrow/ops/reproject.py:97: UserWarning: Input being reprojected to EPSG:4326 CRS
warnings.warn("Input being reprojected to EPSG:4326 CRS")
This map shows flows into each tract (though, like, this commute goes both directions each day. This shows the morning commute). …wait, that’s not the right way to say that either… My point is the mouseover is keyed on workplace, so we’re looking at imports to job centers, not exodus from population centers.
The dot shows where you need to mouseover to show the arcs for that observation. It can be hard to nail down the exact spot
# again, the map has trouble loading inside a cell when published#gmap
There we go.
See the full sized map here (again, remove your cursor from the entire map to show all the links)
Tract-level flows in the DC region with color and size scaled by commute volume (…assuming every paid of O-D generates a single trip…)
The map is a lot to take in. Here are a few screenshots of places:
Fredericksburg, VA
Fredericksburg is an old and pretty self-contained place. It gets large flows from nearby, and a bunch of little flows from relatively close. Basically nothing from Maryland except a smattering from inner Prince George’s County
Alexandria, VA
Alexandria, interestingly, draws lots of little trips from all over the place (hence all the little yellow lines and no thick ones), though they’re all fairly local
College Park, MD
College Park (go Terps) pulls really heavily from MD, especially nearby in College Park, but has enough gravity to pull a lot from Frederick and even a decent amount from VA
Capitol Hill
The largest employment center in the region, downtown DC pulls large flows from everywhere (this is that node in the harbor)
Frederick, MD
Similar to Fredericksburg on the other side of the metro, downtown Frederick Maryland is pretty self contained but also gets lots of small trips from both VA and northern Montgomery and PG counties
Bethesda, MD
Home to NIH and several other large employers, Bethesda pulls from all over the region as well, but particularly dstrongly from the I-270 corridor (unsurprisingly)
(…i have no idea if this is the tract with NIH in it–probably not. But that’s part of the agglomeration economy that makes Bethesda pop out as an employment center)
Conclusion
There’s still a lot of work to do if we want to encapsulate this into something like a re-useable method (ArcLayer isn’t even stable over in lonboard)–but the bones are there! More importantly, this was really easy to do with the Graph. Although we designed it for exactly this kind of flexibility, it’s great to see that intention pan out in practice. I hope this also helps demonstrate how useful the spatial Graph is, as a general data structure. This used to be a \(W\), that was not used terribly widely, outside of spatial econometrics (or the odd Moran’s I computation), so as part of developing the new Graph class, we’ve tried to demonstrate the utility of a first-class object that encodes observed or hypothesized spatial interactions e.g. to study things like segregation measurement, economic development, neighborhood change, or urban morphology among many others.
And now, visualizing O-D travel matrices in three dimensions :) (obviously you dont need a Graph to do that, but I sure found it an easy and efficient way to get the data into the proper structure)
All of the data collection and computation in this notebook is pretty quick, all things told. It would be pretty simple to wrap this up into an interface usign something like streamlit so you could look at flows in any metro in the country by selecting from a dropdown…