16  Employment Centers

Code
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
import pandana as pdna
import pandas as pd
from geosnap import DataStore
from geosnap import analyze as gaz
from geosnap import io as gio
from esda.getisord import G_Local
from libpysal.graph import Graph
from libpysal.cg import alpha_shape_auto
from shapely.geometry import MultiPoint
from shapely import concave_hull

%load_ext watermark
%watermark -a 'eli knaap'  -iv
Author: eli knaap

contextily: 1.6.2
geopandas : 1.0.1
pandas    : 2.2.3
matplotlib: 3.10.0
pandana   : 0.7
geosnap   : 0.14.1.dev14+g0443e2a.d20250103
libpysal  : 4.12.1
esda      : 2.6.0
shapely   : 2.0.6

A rich literature focuses on identifying employment centers and subcenters, including Giuliano & Small (1991), Giuliano & Small (1999), Giuliano et al. (2007), Giuliano et al. (2012), Cervero & Wu (1998), Gordon & Richardson (1996), Helsley & Sullivan (1991), and McDonald & McMillen (2000)

16.1 Start with an Accessibility Surface

Code
datasets = DataStore()
sd_acs = gio.get_acs(datasets, county_fips="06073", years=[2021], level="tract")
sd_lodes = gio.get_lodes(datasets, county_fips="06073", years="2021")
sd_network = pdna.Network.from_hdf5(
    "../data/41740.h5"
)
sd_network.precompute(2500)  # generate contraction hierarchies
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/util.py:275: UserWarning: Unable to find local adjustment year for 2021. Attempting from online data
  warn(
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/constructors.py:215: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
  warn(
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 332554
Setting CH edge vector of size 522484
Range graph removed 143094 edges of 1044968
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%

Get the intersection IDs nearest to the centroid of each census block (mapping the job data onto the travel network).

Save these IDs as a variable on the input dataframe so we can use them as a join index.

Code
# pandana expects lon/lat coordinates, so we use geographic centroids here
# (hence the CRS warning below, which is approximately fine for snapping
# each block to its nearest network node)
sd_lodes["node_ids"] = sd_network.get_node_ids(
    sd_lodes.centroid.geometry.x, sd_lodes.centroid.geometry.y
)

# get the node ids for tracts (for visualization)
sd_acs["node_ids"] = sd_network.get_node_ids(
    sd_acs.centroid.geometry.x, sd_acs.centroid.geometry.y
)
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_59080/430745399.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  sd_lodes.centroid.geometry.x, sd_lodes.centroid.geometry.y
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_59080/430745399.py:7: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  sd_acs.centroid.geometry.x, sd_acs.centroid.geometry.y

Next, set the job variable onto the network.

Code
sd_network.set(sd_lodes.node_ids, sd_lodes.total_employees.values, name="jobs")
Removed 12657 rows because they contain missing values

Create an aggregation query (here, a distance-weighted sum with an exponential decay up to 2000 meters).
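
In other words, the query computes, for each network node \(i\), a decay-weighted sum of the jobs attached to every node within the search radius: roughly \(A_i = \sum_{j:\, d_{ij} \le r} w(d_{ij})\, E_j\), where \(E_j\) is the job count at node \(j\), \(d_{ij}\) is the shortest-path network distance, \(r\) is the 2000m threshold, and \(w\) is the chosen decay function (exponential here; the exact weight normalization is a pandana implementation detail).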

Code
job_access_exp = sd_network.aggregate(2000, name="jobs", decay="exp").rename(
    "job_access"
)
sd_acs = sd_acs.merge(job_access_exp, left_on="node_ids", right_index=True)
sd_lodes = sd_lodes.merge(job_access_exp, left_on="node_ids", right_index=True)
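
If a different weighting scheme is preferred, pandana supports other decay options for the same query. A minimal sketch comparing the exponential query above with linear and flat (unweighted) alternatives:

Code
# the same 2000m aggregation under alternative decay schemes:
# "linear" down-weights jobs proportionally with network distance,
# "flat" is an unweighted sum of all jobs within the radius
job_access_lin = sd_network.aggregate(2000, type="sum", decay="linear", name="jobs")
job_access_flat = sd_network.aggregate(2000, type="sum", decay="flat", name="jobs")

pd.concat(
    [
        job_access_exp,
        job_access_lin.rename("linear"),
        job_access_flat.rename("flat"),
    ],
    axis=1,
).describe()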

16.2 Identifying Centers

16.2.1 An Aggregate Approach

The simplest approach is to use census zones and their contiguity relationships.

Code
wacs = Graph.build_contiguity(sd_acs, rook=False).assign_self_weight(1)

g_jobs = G_Local(sd_acs.job_access.values, wacs, star=True)

centers = sd_acs[["geoid", "job_access", "geometry"]].copy()
centers["pvalue"] = g_jobs.p_sim
centers["job_rank"] = centers.job_access.rank(pct=True)

# define subcenter as: total job access in top quintile, significant (G_i*) spatial cluster

subcenters = centers[(centers.pvalue < 0.05) & (centers.job_rank > 0.8)].copy()

subcenters.explore(tiles="CartoDB Positron")

Use a spatial weights matrix to dissolve contiguous centers (McMillen, 2003).

Code
w_subcent = Graph.build_contiguity(subcenters, rook=False)
subcenters["center"] = w_subcent.component_labels
subcenters["center"] = subcenters["center"].apply(lambda x: f"center {x}")
subcenters = subcenters.dissolve("center")

subcenters.explore(tiles="CartoDB Positron")

16.2.2 A Detailed Approach

Since we're operating on census tracts, our resulting subcenters are a function of tract size and contiguity, but the data are actually much higher resolution than that. We have computed accessibility at the intersection level rather than the tract level, so we could instead look for significant \(G_i^*\) clusters of these intersections (then wrap a polygon around them). This is the approach taken in Knaap and Rey (forthcoming).

With the new libpysal Graph, we can also use network distances to create our weights object, rather than relying on contiguity between polygons.

Code
nodes_df = sd_network.nodes_df.copy()
all_intersections = gpd.GeoDataFrame(
    nodes_df, geometry=gpd.points_from_xy(nodes_df.x, nodes_df.y, crs=4326)
)
Code
all_intersections = gpd.overlay(
    all_intersections.reset_index(),
    sd_acs[sd_acs.n_total_pop > 0].to_crs(all_intersections.crs),
)

all_intersections = all_intersections.set_index("id")[["geometry"]].join(job_access_exp)

all_intersections.head()
geometry job_access
id
17413808 POINT (-117.48236 33.31377) 411.692059
17413859 POINT (-117.44301 33.27226) 0.000000
28828453 POINT (-117.23325 32.76422) 982.422952
48857408 POINT (-117.19447 33.12653) 1815.748679
48857412 POINT (-117.10073 32.82399) 755.604102
Code
all_intersections.plot("job_access", scheme="quantiles", k=10, alpha=0.4)

Code
all_intersections.shape
(245061, 2)

That's a really big problem (over 245,000 nodes), so let's simplify and use only the subset of nodes with census blocks attached.

Code
sd_lodes = sd_lodes.to_crs(sd_lodes.estimate_utm_crs())
all_intersections = all_intersections.to_crs(all_intersections.estimate_utm_crs())
# only the nodes attached to a census block
lodes_nodes = all_intersections[all_intersections.index.isin(sd_lodes.node_ids)].copy()

lodes_nodes.plot("job_access", scheme="quantiles", k=10, alpha=0.4)

This is still extremely detailed, but we are no longer including geometries where we have no actual data points (i.e., our resolution is only as fine as the block level, where the jobs data come from).

Now we use the nodes_in_range function to return an adjacency list of nodes within two kilometers and convert that into a libpysal.Graph.

Code
w_dist = sd_network.nodes_in_range(lodes_nodes.index.unique(), 2000)
g = Graph.from_adjacency(w_dist, "source", "destination", "distance")
# ignore distance for now: binarize the distance band (neighbor within threshold or not), then add a self-weight
g = g.transform("b").assign_self_weight(1)

Now, g is a PySAL graph object that encodes observations as neighbors if they are within a 2km shortest-path distance along the travel network.
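
As a quick sanity check, we can inspect the graph's size and neighbor cardinalities (a sketch; the exact numbers depend on the network data):

Code
# how many nodes are in the graph, and how many neighbors
# fall within 2km (network distance) of each?
print(g.n)
g.cardinalities.describe()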

Code
g_nodes = G_Local(lodes_nodes.job_access, g, permutations=9999)

lodes_nodes["pvalue"] = g_nodes.p_sim
lodes_nodes["job_rank"] = lodes_nodes.job_access.rank(pct=True)

center_nodes = lodes_nodes[(lodes_nodes.pvalue < 0.05) & (lodes_nodes.job_rank > 0.8)].copy()
center_nodes = center_nodes.set_crs(sd_lodes.crs)

center_nodes.explore(tiles="CartoDB Positron")

Moving to the intersection scale results in a much larger employment center footprint because there are many more observations, the surface is much more distinct, and it's easier to pick out spatial outliers. The difficulty is that our dataset is now represented as a collection of points, when we want to identify discrete center polygons.

To do that, we first define a range wherein points are considered to belong to the same center (instead of setting this with a discrete threshold rule, you could alternatively use a density-based clusterer like DBSCAN; see the sketch below). Then, for each set of points, we wrap a tightly fitted polygon that represents the “center”.
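
For example, a density-based alternative using scikit-learn's DBSCAN might look like the sketch below (assuming scikit-learn is installed; note that it clusters on straight-line rather than network distance):

Code
from sklearn.cluster import DBSCAN

# cluster significant nodes that lie within 1km of one another
# (in groups of at least 5); a label of -1 marks noise points
# that belong to no center
coords = center_nodes.geometry.get_coordinates()[["x", "y"]].values
db_labels = DBSCAN(eps=1000, min_samples=5).fit_predict(coords)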

Code
# w_dist_subcenters = DistanceBand.from_dataframe(center_nodes, threshold=1000)
w_dist_subcenters = sd_network.nodes_in_range(center_nodes.index, 2000)
g_subcenters = Graph.from_adjacency(
    w_dist_subcenters, "source", "destination", "distance"
)
Code
center_nodes["center"] = g_subcenters.component_labels
center_nodes.explore("center", categorical=True, tiles="CartoDB Positron")
Code
shapes = [
    alpha_shape_auto(
        center_nodes[center_nodes.center == i]
        .geometry.get_coordinates()[["x", "y"]]
        .values
    )
    for i in center_nodes.center.unique()
]

node_subcenters = gpd.GeoDataFrame(geometry=shapes, crs=center_nodes.crs)

# drop any empty hulls (centers with too few points to enclose)
node_subcenters = node_subcenters[~node_subcenters.geometry.is_empty]

node_subcenters.reset_index().explore(
    "index", categorical=True, tiles="CartoDB Positron"
)

16.2.3 Alternative Polygons

Rather than an alpha shape, we could use other concave hull methods, e.g. shapely's concave_hull.

Code
MultiPoint(center_nodes[center_nodes.center == 1].geometry.tolist())

Code
hulls = {}

# concave hulls at several ratios (smaller ratio = tighter fit)
for ratio in [0.05, 0.1, 0.2]:
    hulls[f"ch {ratio}"] = concave_hull(
        MultiPoint(center_nodes[center_nodes.center == 1].geometry.tolist()),
        ratio,
        allow_holes=False,
    )
Code
hulls["ch 0.05"]

Code
hulls["ch 0.1"]

Code
hulls["ch 0.2"]

Code
hulls["alpha"] = alpha_shape_auto(
    center_nodes[center_nodes.center == 1]
    .geometry.get_coordinates()[["x", "y"]]
    .values
)
hulls["alpha"]

Code
hulls
{'ch 0.05': <POLYGON ((501308.304 3633209.555, 500856.468 3633707.787, 501307.6 3633574....>,
 'ch 0.1': <POLYGON ((501308.304 3633209.555, 500856.468 3633707.787, 501307.6 3633574....>,
 'ch 0.2': <POLYGON ((501308.304 3633209.555, 500856.468 3633707.787, 501307.6 3633574....>,
 'alpha': <POLYGON ((505681.363 3628392.019, 505677.745 3628152.689, 505658.072 362808...>}
Code
gpd.GeoSeries(hulls)
ch 0.05    POLYGON ((501308.304 3633209.555, 500856.468 3...
ch 0.1     POLYGON ((501308.304 3633209.555, 500856.468 3...
ch 0.2     POLYGON ((501308.304 3633209.555, 500856.468 3...
alpha      POLYGON ((505681.363 3628392.019, 505677.745 3...
dtype: geometry
Code
isos = gpd.GeoDataFrame(geometry=gpd.GeoSeries(hulls), crs=node_subcenters.crs)
Code
isos
geometry
ch 0.05 POLYGON ((501308.304 3633209.555, 500856.468 3...
ch 0.1 POLYGON ((501308.304 3633209.555, 500856.468 3...
ch 0.2 POLYGON ((501308.304 3633209.555, 500856.468 3...
alpha POLYGON ((505681.363 3628392.019, 505677.745 3...
Code
# compare the four hull variants for the same center
f, ax = plt.subplots(2, 2, figsize=(12, 8))
ax = ax.flatten()

for i in range(4):
    center_nodes[center_nodes.center == 1].plot(ax=ax[i])
    isos.iloc[[i]].plot(ax=ax[i], alpha=0.4)
    ax[i].axis("off")
    ax[i].set_title(f"{isos.iloc[i].name}")

Code
shapes_ch = [
    concave_hull(
        MultiPoint(center_nodes[center_nodes.center == i].geometry.tolist()),
        0.2,
        allow_holes=False,
    )
    for i in center_nodes.center.unique()
]
node_subcenters_ch = gpd.GeoDataFrame(geometry=shapes_ch, crs=center_nodes.crs)
Code
node_subcenters_ch.explore()

16.3 Comparison

For comparison, SANDAG publishes official employment-center boundaries (https://opendata.sandag.org/Geographic-Information-Systems/Employment-Centers-outlines-/ixp5-dfuk/about_data), documented as follows:

Employment Centers are identified as areas with a high concentration of employment based on SANDAG’s 2016 employment inventory from Urbansim summarized to quarter-mile radius hexagons. Local-maxima were identified as starting points, and regions were grown to include neighboring hexagons meeting a minimum employment density threshold within an approximate 2-mile radius. The resulting boundaries were generalized (taking into account major barrier features such as topography and freeways) and used to select SANDAG Master Geographic Reference Areas (MGRAs) by activity-weighted (population and employment) centroid. Employment centers are ranked by Tier based on total number of jobs:

Tier 1: 50,000 or more jobs

Tier 2: 25,000 to 49,999 jobs

Tier 3: 15,000 to 24,999 jobs

Tier 4: 2,500 to 14,999 jobs

Hexbin geography (where people live map layers) – generated using a quarter-mile radius hex bin grid overlay on the region. Data is summarized by hex bin spatially using the centroid of an overlying geography.

Where people live by Employment Center they Work In – 2015 LEHD census block level data; converted to hexbin (see hexbin geography). LEHD data was used as it contains information on both the home and work location. Data represents an employee residence by the employment center they work in. https://lehd.ces.census.gov/data/.
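
SANDAG's code is not public, but the local-maxima region-growing step described above can be sketched generically: seed a region at any cell at least as dense as all of its contiguous neighbors, then grow outward while neighbors remain above a density floor (the 2-mile radius cap is omitted for brevity). A hypothetical illustration, assuming a hexagon GeoDataFrame hex_gdf with a job_density column:

Code
import numpy as np
from libpysal.graph import Graph

def grow_centers(gdf, density_col, min_density):
    # sketch of local-maxima region growing (not SANDAG's implementation)
    gdf = gdf.reset_index(drop=True)
    g = Graph.build_contiguity(gdf, rook=False)
    dens = gdf[density_col].to_numpy()
    labels = np.full(len(gdf), -1)

    # seeds: cells at least as dense as every contiguous neighbor
    seeds = [
        i for i in gdf.index
        if len(g.neighbors[i]) > 0 and dens[i] >= dens[list(g.neighbors[i])].max()
    ]

    # grow regions outward, densest seeds first
    for region, seed in enumerate(sorted(seeds, key=lambda i: -dens[i])):
        if labels[seed] != -1:
            continue  # already absorbed by a denser center
        labels[seed] = region
        frontier = [seed]
        while frontier:
            nxt = []
            for i in frontier:
                for j in g.neighbors[i]:
                    if labels[j] == -1 and dens[j] >= min_density:
                        labels[j] = region
                        nxt.append(j)
            frontier = nxt
    return labels

# hypothetical usage:
# hex_gdf["center"] = grow_centers(hex_gdf, "job_density", min_density=10)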

Code
sandag = gpd.read_file("https://opendata.sandag.org/resource/ixp5-dfuk.geojson")
Code
m = sandag.explore(tiles="CartoDB Positron")
m

These are the official SANDAG employment centers.

The map below overlays the SANDAG employment centers (blue) with those from the accessibility-based method (red).

Code
node_subcenters.reset_index().explore(color="red", m=m)

Although the extents of the polygons differ, the accessibility method agrees with the SANDAG employment centers for all centers in Tiers 1-3. With the exception of Carlsbad, Oceanside, and Encinitas, most of SANDAG's Tier 4 centers fail to meet either the density or the significance threshold (often both).
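
One way to make this comparison concrete is to join the two layers and tabulate matches by tier. A sketch (the tier column name in the SANDAG layer is an assumption; check sandag.columns for the actual field):

Code
# which SANDAG centers intersect an access-based center, by tier?
# NOTE: "tier" is an assumed column name in the SANDAG layer
hits = gpd.sjoin(
    sandag.to_crs(node_subcenters.crs),
    node_subcenters,
    how="left",
    predicate="intersects",
)
hits.groupby("tier")["index_right"].apply(lambda s: s.notna().mean())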

In the map below, the access-based employment centers are represented using the census block geometries that intersect them.

Code
m = sandag.explore(tiles="CartoDB Positron")
gpd.sjoin(sd_lodes, node_subcenters).dissolve("index_right").explore(color="red", m=m)

Ignoring the G statistic and looking only at nodes with job access in the top quartile:

Code
m = sandag.explore(tiles="CartoDB Positron")
lodes_nodes[lodes_nodes.job_rank > 0.75].explore(m=m, color="red")

When employment accessibility alone is the selection criterion (as in the map above), using nodes in the top quartile usually finds the remaining Tier 4 centers, except for some exurban places like Ramona and Otay Mesa.

If you really want to be generous, just take the top half of the access distribution.

Code
m = sandag.explore(tiles="CartoDB Positron")
lodes_nodes[lodes_nodes.job_rank > 0.5].explore(m=m, color="red")

With this simple rule, we cover all the areas defined by SANDAG, albeit at a much higher resolution.
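
As a final convenience, the per-center hulls can also be computed in a single pass by grouping the nodes on their center label and taking the concave hull of each group: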

Code
c = center_nodes.groupby("center")["geometry"].apply(
    lambda x: MultiPoint(x.tolist())
).concave_hull(ratio=0.3)
gpd.GeoDataFrame(geometry=c, crs=sd_lodes.crs).explore()