%load_ext watermark%watermark -a 'eli knaap'-v -d -u -p geopandas,geosnap
The watermark extension is already loaded. To reload it, use:
%reload_ext watermark
Author: eli knaap
Last updated: 2025-01-24
Python implementation: CPython
Python version : 3.12.8
IPython version : 8.31.0
geopandas: 1.0.1
geosnap : 0.14.1.dev14+g0443e2a.d20250103
11.1 The ‘Egohood’ and the 20-Minute Neighborhood
As a package focused on “neighborhoods”, much of the functionality in geosnap is organized around the concept of ‘endogenous’ neighborhoods. That is, it takes a classical perspective on neighborhood formation: a ‘neighborhood’ is defined loosely by its social composition, and the dynamics of residential mobility mean that these neighborhoods can grow, shrink, or transition entirely.
But two alternative concepts of “neighborhood” are also worth considering. The first, posited by social scientists, is that each person or household can be conceptialized as residing in its own neighborhood which extends outward from the person’s household until some threshold distance (Hipp & Boessen, 2013; Kim & Hipp, 2019). This neighborhood represents the boundary inside which we might expect some social interaction to occur with one’s ‘neighbors’ (Grannis, 1998, 2005). The second is a normative concept advocated by architects, urban designers, and planners (arguably still the goal for new urbanists): that a neighborhood is a discrete pocket of infrastructure organized as a small, self-contained economic unit (Perry, 1929). A common shorthand today is the 20 minute neighborhood.
The difference between these two perspectives is really what defines the origin of the neighborhood (the ‘town center’ or the person’s home), and whether ‘neighborhood’ is universal to all neighbors or unique for each resident. Both of them rely abstractly on the concept of an isochrone: the set of destinations accessible within a specified travel budget. Defining an isochrone is a network routing problem, but its spatial representation is usually given as a polygon that covers the set of locations. That polygon is also sometimes called a walkshed (or *travel shed, depending on the particular mode). For the person at the center of the isochrone, whose “neighborhood” the isochrone represents, the polygon is sometimes conceptualized as an “egohood” or “bespoke neighborhood”.
The trouble with generating isochrones is twofold. First, they are computationally intensive to create. You need to know the shortest path from an origin to all other possible destinations inside the travel threshold, and depending on the density of the network, this can scale masssively. Second, they are not straightforward to present cartographically/visually. The set of destinations reachable within the threshold is technically a set of points. If you’re constrained to a network, then we can usually assume that you also have access to potential locations between discrete destinations. For example if you’re walking along the sidewalk, you can stop at any house along the block until you reach your threshold distance. But sometimes that’s not the case. If you walk to a subway station, you can stop anywhere along the walk–until you get into the subway, then you are stuck traveling in one direction until you reach another station, then you get freedom of mobility again.
The latter case is particularly difficult to represent because it does not create a “valid” polygon… there’s a single polygon in the pre-transit zone, then the “accessible zone” collapses to a line (or nothing at all?) before opening back up into a polygon again. Like a barbell. If you take the simple route and just draw a convex hull around the accessible destinations, it will fail to collapse along the line, giving the false impression of much more ‘access’ than is realistic.
But then again, sometimes these off-network spaces are actually traversable. If two network nodes inside the travel threshold are separated by a park, then you can just walk through the park and that should be included in the visualization. If they are separated by a harbor or a mountain, you definitely can’t walk through (ok, maybe you could get through, but not without sacrificing a bit of speed at the very least).
To handle these issues, the isochrones implemented in geosnap take a straightforward approach, achieving a good balance between accuracy and speed. Specifically, we tackle the problem in two stages: first, we use pandana to generate the set nodes accessible from [a set of] destination[s]. Then we wrap an alpha shape around those destinations to create a tightly-fitted polygon. The alpha shape algorithm implemented in libpysal is also blazing fast, so the approach has worked quite well in all of our applications
Code
from geosnap.analyze import isochrones_from_id, isochrones_from_gdf, pdna_to_adjfrom geosnap.io import get_acsfrom geosnap import DataStoreimport pandana as pdnaimport geopandas as gpdimport pandas as pdimport osmnx as oxfrom geosnap import DataStorefrom geosnap.io import get_acs, get_network_from_gdffrom geosnap import analyze as gazfrom geosnap import io as giofrom routingpy import Valhallafrom shapely.geometry import Polygondatasets = DataStore()
To generate a routable network, use the geosnap get_network_from_gdf function. Alternatively, you can also use pandana or urbanaccess directly. In a pinch, you can also download one of the metropolitan-scale pedestrian networks for the U.S. from geosnap’s quilt bucket, however these networks are several years old and intended primaryly for pedagogical purposes. The files are named for each CBSA fips code and extend 8km beyond the metro region’s borders to help mitigate edge effects. Here, we’ll use the quilt version from the San Diego region.
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/constructors.py:215: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
warn(
Code
import osifnot os.path.exists('41740.h5'):import quilt3 as q3 b = q3.Bucket("s3://spatial-ucr") b.fetch("osm/metro_networks_8k/41740.h5", "./41740.h5")sd_network = pdna.Network.from_hdf5("41740.h5")
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 332554
Setting CH edge vector of size 522484
Range graph removed 143094 edges of 1044968
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
To generate a travel isochrone, we have to specify an origin node on the network. For demonstration purposes, we can randomly select an origin from the network’s nodes_df dataframe (or choose a pre-selected example). To get the nodes for a specific set of origins, you can use pandana’s get_node_ids function
Nike, er… this study says that the average person walks about a mile in 20 minutes, so we can define the 20-minute neighborhood for a given household as the 1-mile walkshed from that house. To simplify the computation a little, we say that each house “exists” at it’s nearest intersection
(this is the abstraction pandana typically uses to simplify the problem when using data from OpenStreetMap. There’s nothing prohibiting you from creating an even more realistic version with nodes for each residence, as long as you’re comfortable creating the Network)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:70: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
node_ids = network.get_node_ids(origins.centroid.x, origins.centroid.y).astype(int)
origin
destination
cost
0
060730001001
060730001001
0.000000
136
060730001001
060730001002
987.565979
197
060730001001
060730002011
1221.943970
240
060730001001
060730002022
1465.890991
280
060730001002
060730001002
0.000000
...
...
...
...
1036301
060730166072
060730166071
1596.958008
1036304
060730166073
060730166073
0.000000
1036489
060730166073
060730166071
932.747009
1036512
060730166073
060730166072
982.960022
1036608
060730166073
060730166101
1229.456055
11391 rows × 3 columns
Code
iso = isochrones_from_id(example_origin, sd_network, threshold=1600 ) # network is expressed in meters
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:70: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
node_ids = network.get_node_ids(origins.centroid.x, origins.centroid.y).astype(int)
Code
iso.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
We can also look at how the isochrone or bespoke neighborhood changes size and shape as we consider alternative travel thresholds. Because of the underlying network configuration, changing the threshold often results in some areas of the “neighborhood” changing more than others
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:70: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
node_ids = network.get_node_ids(origins.centroid.x, origins.centroid.y).astype(int)
Make this Notebook Trusted to load map: File -> Trust Notebook
In this example it’s easy to see how the road network topology makes it easier to travel in some directions more than others. Greenspace squeezes the western portion of the 1600m (20 min) isochrone into a horizontal pattern along Calle del Cerro, but the 2400m (30 minute) isochrone opens north-south tavel again along Avienda Pico, providing access to two other pockets of development, including San Clemente High School
We can also compare the network-based isochrone to the as-the-crow-flies approximation given by a euclidean buffer. If we didn’t have access to network data, this would be our best estimate of the shape and extent of the 20-minute neighborhood.
Make this Notebook Trusted to load map: File -> Trust Notebook
Obviously from this depiction, network-constrained travel is very different from a euclidean approximation. That’s especially true in places with irregular networks or topography considerations (like much of California)
11.2 Isochrones for Specified Locations
the isochrones function calculates several isochrones simultaneously, given a set of input destinations. For example we could look at the 20-minute neighborhood for schools in San Diego county.
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_20730/2995649757.py:2: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
sd_schools = schools[schools.to_crs(sd.crs).intersects(sd.unary_union)]
NCESSCH
NAME
OPSTFIPS
LSTREE
LCITY
LSTATE
LZIP
LZIP4
STFIP15
CNTY15
...
CBSATYPE15
CSA15
NMCSA15
NECTA15
NMNECTA15
CD15
SLDL15
SLDU15
geometry
year
5895
060004205341
Warner Junior/Senior High
06
30951 Highway 79
Warner Springs
CA
92086
M
06
06073
...
1
N
N
N
N
0650
071
038
POINT (-116.64292 33.27525)
1516
5896
060004206527
San Jose Valley Continuation Hig
06
30951 Highway 79
Warner Springs
CA
92086
M
06
06073
...
1
N
N
N
N
0650
071
038
POINT (-116.64292 33.27525)
1516
5897
060004206844
Warner Elementary
06
30951 Highway 79
Warner Springs
CA
92086
0008
06
06073
...
1
N
N
N
N
0650
071
038
POINT (-116.64292 33.27525)
1516
5898
060004210387
All Tribes Charter
06
34320 Valley Center Rd.
Valley Center
CA
92082
6046
06
06073
...
1
N
N
N
N
0650
075
038
POINT (-116.95367 33.27796)
1516
5899
060004212735
All Tribes Elementary Charter
06
34320 Valley Center Rd.
Valley Center
CA
92082
6046
06
06073
...
1
N
N
N
N
0650
075
038
POINT (-116.95367 33.27796)
1516
5 rows × 26 columns
Code
sd_schools.plot()
Code
# randomly sample 25 schools and compute their walkshedsschool_neighborhoods = isochrones_from_gdf(origins=sd_schools.sample(25), network=sd_network, threshold=1600,)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:248: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
node_ids = network.get_node_ids(origins.centroid.x, origins.centroid.y).astype(int)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:70: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
node_ids = network.get_node_ids(origins.centroid.x, origins.centroid.y).astype(int)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/network.py:266: UserWarning: use_edges is True, but edge geometries are not available in the Network object. Try recreating with geosnap.io.get_network_from_gdf
warn(
Code
school_neighborhoods.head()
geometry
distance
origin
13773
POLYGON ((-117.07517 32.7054, -117.07539 32.70...
1600
13729
POLYGON ((-117.10674 32.70053, -117.1061 32.70...
1600
15232
POLYGON ((-117.27103 33.17591, -117.27284 33.1...
1600
7627
POLYGON ((-116.98394 32.63043, -116.98464 32.6...
1600
7347
POLYGON ((-117.32011 33.15461, -117.31876 33.1...
1600
Code
school_neighborhoods.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook
If we adopt the “neighborhood unit” perspective, it might be reasonable to put a school at the center of the neighborhood, as it would provide equitable access to all residents. In that case, these are your neighborhoods
11.3 Service Areas for Social Facilities
One way of thinking about isochrones is considering them as service areas. That is, given some travel budget (in time, distance, transit fare, etc), the isochrone represents the service area accessible within that budget.
Imagine we were interested in the service areas around social facilities in Syracuse
Code
datasets = DataStore()# collect tracts in countytracts = get_acs(datasets, msa_fips="45060", years=[2018], level="tract")# use OSMNx to get social facilities in these tractsfacilities = ox.features.features_from_polygon( tracts.unary_union, {"amenity": "social_facility"})# keep only point geometries for illustrationfacilities = facilities[facilities.geometry.type=="Point"]# convert to a UTM projectiontracts = tracts.to_crs(tracts.estimate_utm_crs())facilities = facilities.to_crs(tracts.crs)facilities.explore(style_kwds=dict(radius=4))
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_20730/1197830926.py:7: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
tracts.unary_union, {"amenity": "social_facility"}
Make this Notebook Trusted to load map: File -> Trust Notebook
11.3.1 Service Areas
To download a routable network from scratch using osmnx, you can use the get_network_from_gdf function from geosnap (which will get a walk network by default). These networks are more accurate for generating isochrones than the pedagogical networks stored in quilt because they include the edge geometries (e.g. the roads) that help the hull algorithm fit better.
Code
walk_net = get_network_from_gdf(tracts)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/networkio.py:71: UserWarning: GeoDataFrame is stored in coordinate system EPSG:32618 so the pandana.Network will also be stored in this system
warn(
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 88536
Setting CH edge vector of size 233238
Range graph removed 242540 edges of 466476
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
11.3.1.1 Comparing Hull algorithms
To create the service area, we need to bound the set of reachable intersections using some kind of polygon. The resolution of the service area is dependent on the resolution of the network (i.e. since geosnap and pandana do not interpolate along the road network, greater intersection density will result in a more well-defined polygon). There are different bounding-polygon algorithms to choose from. The default is shapely’s concave_hull implementation, with the alpha_shape_auto algorithm from libpysal available as an alternative.
The alpha shape version of the concave hull is the most resource intensive because it tries to optimize the alpha parameter. This also makes it the slowest.
Code
m = alpha.explore()facilities.explore(m=m, color="black", style_kwds=dict(radius=3))
Make this Notebook Trusted to load map: File -> Trust Notebook
The concave hull algorithm in shapely does not do automatic optimization, but requires setting a ratio parameter, with smaller values resulting in tighter bounding polygons. If the ratio parameter is set too low, the bounding polygon can be too tight
Code
m = ch01.explore()facilities.explore(m=m, color="black", style_kwds=dict(radius=3))
Make this Notebook Trusted to load map: File -> Trust Notebook
m = ch02.explore()facilities.explore(m=m, color="black", style_kwds=dict(radius=3))
Make this Notebook Trusted to load map: File -> Trust Notebook
11.3.2 Driving Service Areas
Be very careful with driving times… There are lots of edges with no speed information (requiring us to make strong assumptions), and even at best, these represent free-flow conditions.
To get an automobile network, change network_type='drive' and add_travel_times=True
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/networkio.py:71: UserWarning: GeoDataFrame is stored in coordinate system EPSG:32618 so the pandana.Network will also be stored in this system
warn(
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 24538
Setting CH edge vector of size 65729
Range graph removed 64836 edges of 131458
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
When using travel-time based impedance, travel times are measured in seconds
# look at only the first recordm = drive_chrone.iloc[[1]].explore()facilities.iloc[[1]].explore(m=m, color="black", style_kwds=dict(radius=6))
Make this Notebook Trusted to load map: File -> Trust Notebook
Code
drive_net.edges_df.head()
from
to
travel_time
geometry
0
211905525
212550604
40.618733
LINESTRING (451007.906 4731874.375, 450991.739...
1
211905525
6291988938
166.935320
LINESTRING (451007.906 4731874.375, 450989.152...
2
212028019
212827209
55.252416
LINESTRING (426356.19 4737964.394, 426356.358 ...
3
212533433
212533484
4.363470
LINESTRING (429887.965 4753965.714, 429886.57 ...
4
212533433
212533444
12.576634
LINESTRING (429887.965 4753965.714, 429814.773...
Remember also this is directed travel. These isochrones represent the area reachable from each social service provider; they do not necesssarily represent the origins who can reach the provider within a 10 minute drive. That is, drive networks are directed (and usually asymmetric).
11.4 Comparing with Routing Servers
Here, I’ve setup an instance of the valhalla routing server on a cloud server. It’s actually pretty easy–everything is in docker, but you won’t be able to follow along unless you setup your own server and edit the URL below accordingly. Alternatively, you can try to use the public Valhalla server, but you may be rate-limited.
Often, you need routing, or travel time matrices, or isochrones for a place, but commercial or cloud-based solutions are not an option. For example if:
you have sensitive data that can’t be sent to remote servers
you have a massive dataset and end up being rate-limited (and/or hitting a paywall)
you need control over how the network is structured and parameterized (for example if you are testing different potential transit scenarios)
All of those cases require more control over the routing than web services can provide, which leaves two strong options for local computation.
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_20730/1309834641.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
zip(sdsub.geometry.centroid.get_coordinates().x.tolist(), sdsub.geometry.centroid.y)
11.4.1 The Valhalla Routing Engine
One is to setup a locally-hosted open-source routing engine and query that server instead of a commercial server (as you would do with a geocoding service). This is nice because they are highly performant, but the drawbacks are:
can be a little difficult to setup and maintain, or ingest new data
can take a ton of compute/storage/memory resources, so may require your own “server”
can make the work unreproducible in many contexts, because anyone running the code also needs access to the local server. We use tailscale to share compute resources, but that adds a small layer of complexity
First we setup a connection to the Valhalla server (this one is running on a server in my research lab, but you can point this anywhere). Alternatively, you could try the public Valhalla server
Code
# no trailing slash, api_key is not necessary# you need to be connected to tailscale for this to workclient = Valhalla(base_url="http://cogs-research:8002", timeout=6000)
Code
# valhalla can do one origin at a time, so only pass the firsttest_isos = client.isochrones( coords[0],"pedestrian", intervals=[2500,2000,1500,1000, ], interval_type="distance",)
Code
# turn the json response into gdfsdef gdf_from_routingpy(isos): polys = [Polygon(i['geometry']['coordinates']) for i in isos.raw['features']] gdf = gpd.GeoDataFrame(geometry=polys, crs=4326)return gdf
Make this Notebook Trusted to load map: File -> Trust Notebook
The isochrones are pretty consistent, so i think the differences between the two are:
valhalla uses a tighter alpha shape
it can do that because it interpolates nodes distance along the road network, rather than requiring you to hit an actual intersection (as we assume in the pandana/geosnap approach
11.4.2.1 Comparing 2500m
Code
m = gdf.head(1).explore(name='valhalla')gisos[gisos['distance']==2500].explore(m=m, name='geosnap', color='red')from folium import LayerControlLayerControl().add_to(m)m
Make this Notebook Trusted to load map: File -> Trust Notebook
11.5 Travel Time matrix
Code
sd.shape[0]**2
391876
Code
client.matrix?
Code
# pairwise matrix for all tract centroids in SD countycoords =list(zip(sd.geometry.centroid.get_coordinates().x.tolist(), sd.geometry.centroid.y))
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_20730/2715920804.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
coords = list(zip(sd.geometry.centroid.get_coordinates().x.tolist(), sd.geometry.centroid.y))
11.6 Extensions
One nice thing about this implementation is that it’s indifferent to the structure of the input network. It could be pedestrian-based (which is the most common), but you could also use urbanaccess to create a multimodel network by combining OSM and GTFS data. For this reason, the isochrone function provided here can be much more useful than commercially-provided web services like mapbox, first because all the computation happens locally (so no paywall, and no unwanted data collection) and second because the researcher has full control over how the network is structured. This allows you to do the kinds of analyses common in actual transportation modeling and planning–like adding a new transit line, a new lane in the freeway, or a new toll road, and see how the accessibility surface responds (and how the location choices of households and businesses will reallocate given this new geography).
Running the above functions on that multimodal network would give a good sense of “baseline” accessibility in a study region. An interested researcher could then make some changes to the GTFS network, say, by increasing bus frequency along a given corridor, then create another combined network and compare results. Who would benefit from such a change, and what might be the net costs? And thus, the seeds of open and reproduciblescenario planning have been sown :)
Grannis, R. (1998). The Importance of Trivial Streets: Residential Streets and Residential Segregation. American Journal of Sociology, 103(6), 1530–1564. https://doi.org/10.1086/231400
Grannis, R. (2005). T-Communities: Pedestrian Street Networks and Residential Segregation in Chicago, Los Angeles, and New York. City and Community, 4(3), 295–321. https://doi.org/10.1111/j.1540-6040.2005.00118.x
Hipp, J. R., & Boessen, A. (2013). Egohoods As Waves Washing Across The City: A New Measure Of "Neighborhoods". Criminology, 51(2), 287–327. https://doi.org/10.1111/1745-9125.12006
Kim, Y.-A., & Hipp, J. R. (2019). Street Egohood: An Alternative Perspective of Measuring Neighborhood and Spatial Patterns of Crime. In Journal of Quantitative Criminology. Springer US. https://doi.org/10.1007/s10940-019-09410-3
Perry, C. (1929). The neighborhood unit, regional plan of New York and its environs. The Urban Design Reader, 78–89.