import contextily as ctximport geopandas as gpdimport mapclassify as mcimport matplotlib.pyplot as pltimport numpy as npimport pandana as pdnaimport pandas as pdimport pydeck as pdkimport quilt3 as q3from geosnap import DataStorefrom geosnap import analyze as gazfrom geosnap import io as giofrom libpysal.graph import Graphfrom matplotlib import cmfrom matplotlib.colors import Normalizefrom tobler.util import h3fy%load_ext watermark%watermark -a 'eli knaap'-iv
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Accessibility is a recurrent theme in urban social science (El-Geneidy & Levinson, 2006; Fotheringham, 1981; Hansen, 1959; Huff, 1963; Levinson et al., 2017; Levinson & Wu, 2020; Kumar1994?). The original concept deals with access to employment and is used to help explain the size and layout of a city (Converse, 1949; Huff, 1962, 1964, 1966); this thread grew into an entire field of spatial interaction and continues to be the heart of travel demand modeling. More recently, accessibility analysis has become an important part of social equity analysis (e.g., to show whether certain population groups have better or worse access to a particular resource, a la the spatial mismatch hypothesis). This thread has been particularly strong in health geography where several new accessibility concepts have been proposed (Luo & Wang, 2003; Wang, 2012, 2021; Wang & Luo, 2005).
If we combine the intuition with Zipf (1949) with Stewart & Warntz (1958) and Stewart (1947), it amounts to a behavioral explanation for why humans behave like particles in some cases.
The “demographic gravitation” implied by various applications of the concept of potential of population serves to account in part for these facts. But a statistical countertendency is required to explain why all the people in a city do not pile up in the most dense census tract. An adequate mathematical description of this appears to be found in the concept of the “human gas.” Each individual seeks some elbowroom or living space. The idea of a two-dimensional gas is already familiar in physics (monomolecular layers). In the demographic analogue we write: pa = NT, where a is the area occupied by N individuals (molecules), p is the “demographic pressure,” and T is the “demographic temperature.” In the physical case, p is the force per unit length of boundary.
In other words, people enjoy personal space–and they are lazy, but are nonetheless drawn to collections of resources; the larger the concentration, the stronger the pull, but increasing distance attenuates that attraction. Thus, ‘accessibility’ effectively operationalizes the concept of distance decayed attraction, which is just like the gravity equation in physics.
Finally, the gravity concept is essentially an empirical notion. It tells nothing about why observed regularities occur as they do under various situations and, as a consequence, it leaves one at a loss when discrepancies occur that cannot be accounted for. Yet, it is surprising to note how often gravity models pertaining to human interaction are referred to as laws or theories. These models are nothing more than tools that allow one to make short-cut approximations of the direction and magnitude of individual travel movement
Anas (1983) later formulated a behavioral explanation for the gravity model, specifically that the doubly-constrained gravity model is equivalent to a multinomial logit model of joint origin-destination choice. This cements an actual behavioral explanation for the gravity model rooted in utility maximization. As such access to destinations will be explored as a primary driver of location choice in later sections.
The basic form of an access measure following Hansen (1959) is
\[
A_i = \sum_{j=1}^J O_j f(C_{ij})
\]
Where the accessibility measure \(A\) at location \(i\) is a sum of the opportunities \(O\) available at \(j\) locations, weighted by some function of the cost of moving from \(i\) to \(j\). Generally \(f(C_{ij})\) is an exponential decay function where cost \(C\) is measured in some form of travel impedance like network distance or travel time (Levinson & Wu, 2020). That is, \(C_{ij}\) is a special case of the spatial graph where we also care about the values located at \(i\) itself (\(C_{ii} \neq 0\))(Knaap & Rey, 2023). As we saw in Chapter 1, it is trivial to create a simple accessibility measure using a Graph.
Thus, if this looks like the spatial lag equation from Chapter 10, that’s because it is. The differences are
here we consider a unit part of its own context (i.e. we allow \(i=j\)) and
unlike the spatial lag, we do not row-standardize the cost matrix (so \(A_i\) remains a distance-weighted sum, not an average)
14.2 Job Access in San Diego
We start by gathering employment data from LODES for the San Diego metropolitan region via geosnap. The LODES data tabulate block-level counts of employees at their workplace and residential locations (also broken out by wage, education, race, sex, and industry). Since we are interested in access to jobs, we will assume each employee represents one job (which is located at the workplace area according to LODES). For convenient visualization (because blocks can often crash the browser) we will also load in the census tract boundaries.
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/util.py:275: UserWarning: Unable to find local adjustment year for 2021. Attempting from online data
warn(
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/constructors.py:215: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
warn(
Code
ax = sd_lodes.to_crs(3857).plot("total_employees", scheme="quantiles")ctx.add_basemap(ax, source=ctx.providers.CartoDB.Positron)ax.axis("off")ax.set_title("Job Counts by Block")
Text(0.5, 1.0, 'Job Counts by Block')
Because of the irregular size and shape of census blocks, it is difficult to discern any real pattern from a simple choropleth map of employment counts, but there is a strong signal laying under the surface.
14.2.1 Euclidean Distance
Code
g = Graph.build_kernel( sd_lodes.to_crs(sd_lodes.estimate_utm_crs()).centroid, bandwidth=2000, kernel="triangular",)
As we saw in Chapter 9, it is important to consider travel network impedance as a distance measure in urban spaces where Euclidean proxies often fail (Knaap & Rey, 2023). To account for that we use the pandana package (Foti et al., 2012) which implements the contraction hierarchies technique (Geisberger et al., 2012) for quickly computing shortest path distances along the network. To compute access measures based on the shortest path distance, we first need a network that represents the study area. You can create your own network on the fly from OpenStreetMap using either geosnap’s network module or osmnet. Alternatively you can use a pre-constructed network (based on OSM), e.g. those stored in the geosnap quilt bucket.
Using a pre-constructed network, we instantiate by reading from an hdf5 file.
Code
b = q3.Bucket("s3://spatial-ucr")b.fetch("osm/metro_networks_8k/41740.h5", "../data/41740.h5")
Generating contraction hierarchies with 10 threads.
Setting CH node vector of size 332554
Setting CH edge vector of size 522484
Range graph removed 143094 edges of 1044968
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
Following, we need to connect the network to the data we want to measure and visualize, which is accomplished by snapping the source data to their nearest intersections in the Network. The Network.get_node_ids function takes care of that and returns an index (representing the node ID of the network intersection) for each input observation (here, the centroid of each census block) thus mapping the job data to the travel network. It is a good idea to save these IDs as a variable on the input dataframe so we can use them as a join index
Code
nodes = sd_network.get_node_ids( sd_lodes.centroid.geometry.x, sd_lodes.centroid.geometry.y)# get the node ids for tracts also (for visualization)tract_nodes = sd_network.get_node_ids( sd_acs.centroid.geometry.x, sd_acs.centroid.geometry.y)sd_lodes["node_ids"] = nodessd_acs["node_ids"] = tract_nodes
/var/folders/79/cknfb1sx2pv16rztkpg6wzlw0000gn/T/ipykernel_13659/1286788151.py:2: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
sd_lodes.centroid.geometry.x, sd_lodes.centroid.geometry.y
/var/folders/79/cknfb1sx2pv16rztkpg6wzlw0000gn/T/ipykernel_13659/1286788151.py:7: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
sd_acs.centroid.geometry.x, sd_acs.centroid.geometry.y
With locations set, we now need to attach values to the network (i.e. the amount of “resources” located at each node in the network). The LODES data contain information on many subsets of jobs
Since we only need the columns with NAICS codes, we can filter down the codebook to show only those rows we care about (to see what the codes actually mean). The North American Industry Classification System (NAICS) is a taxonomy for identifying jobs in particular industries. As with a FIPS code, NAICS is hierarchical, with each additional digit in the code providing an additional level of detail in the industrial makeup. The two-digit classifications provide a generic overview of different industries allowing us to examine, for example whether access to manufacturing jobs differs from access to retail employment.
Code
# add some width in in the display so we can see the full descriptionpd.options.display.max_colwidth =100lodes_codebook = datasets.lodes_codebook()lodes_codebook[lodes_codebook["name"].str.contains("naics")]
name
variable
type
description
7
naics_11
CNS01
Num
Number of jobs in NAICS sector 11 (Agriculture, Forestry, Fishing and Hunting)
8
naics_21
CNS02
Num
Number of jobs in NAICS sector 21 (Mining, Quarrying, and Oil and Gas Extraction)
9
naics_22
CNS03
Num
Number of jobs in NAICS sector 22 (Utilities)
10
naics_23
CNS04
Num
Number of jobs in NAICS sector 23 (Construction)
11
naics_31_33
CNS05
Num
Number of jobs in NAICS sector 31-33 (Manufacturing)
12
naics_42
CNS06
Num
Number of jobs in NAICS sector 42 (Wholesale Trade)
13
naics_44_45
CNS07
Num
Number of jobs in NAICS sector 44-45 (Retail Trade)
14
naics_48_49
CNS08
Num
Number of jobs in NAICS sector 48-49 (Transportation and Warehousing)
15
naics_51
CNS09
Num
Number of jobs in NAICS sector 51 (Information)
16
naics_52
CNS10
Num
Number of jobs in NAICS sector 52 (Finance and Insurance)
17
naics_53
CNS11
Num
Number of jobs in NAICS sector 53 (Real Estate and Rental and Leasing)
18
naics_54
CNS12
Num
Number of jobs in NAICS sector 54 (Professional, Scientific, and Technical Services)
19
naics_55
CNS13
Num
Number of jobs in NAICS sector 55 (Management of Companies and Enterprises)
20
naics_56
CNS14
Num
Number of jobs in NAICS sector 56 (Administrative and Support and Waste Management and Remediati...
21
naics_61
CNS15
Num
Number of jobs in NAICS sector 61 (Educational Services)
22
naics_62
CNS16
Num
Number of jobs in NAICS sector 62 (Health Care and Social Assistance)
23
naics_71
CNS17
Num
Number of jobs in NAICS sector 71 (Arts, Entertainment, and Recreation)
24
naics_72
CNS18
Num
Number of jobs in NAICS sector 72 (Accommodation and Food Services)
25
naics_81
CNS19
Num
Number of jobs in NAICS sector 81 (Other Services [except Public Administration])
26
naics_92
CNS20
Num
Number of jobs in NAICS sector 92 (Public Administration)
Following we can select a few different industries and examine access to each of them and whether the patterns differ
Removed 12657 rows because they contain missing values
Removed 12657 rows because they contain missing values
Removed 12657 rows because they contain missing values
Removed 12657 rows because they contain missing values
Removed 12657 rows because they contain missing values
Accessibility measures are generated with pandana using an “aggregation query” where we define:
the variable of interest
the form of the decay function (flat, linear, exponential)
the bandwidth/radius of the decay function
the aggregation to perform (sum, average, quantile, etc. A standard access measure is based on a distance-weighted sum)
So here, we look at access to total jobs using a distance-weighted sum with a decay up to 2000 meters. We can create two separate measures based on different decay functions (exponential and linear)
Then, we repeat that process for each of the industries we would like to compare. In this case we will use an exponential decay for all measures.
Code
dfs = {}dests = ["retail", "finance", "manufacturing", "warehousing"]for dest in dests: dfs[dest] = sd_network.aggregate(2000, name=dest, decay="exp")
The resulting dataframe stores accessibility measures for each intersection in the pandana.Network, which means access is captured at an extremely high resolution (of course this is also dependent on the spatial resolution of the input data, which were Census blocks in this case, and also have very high resolution)
Code
ac = pd.DataFrame(dfs)ac
retail
finance
manufacturing
warehousing
id
17413808
0.000000
0.000000
0.000000
53.000000
17413859
0.000000
0.000000
0.000000
0.000000
28828453
88.027798
1.290641
3.441710
4.732352
48857408
70.768797
51.878775
205.081798
99.457479
48857412
105.573721
26.980106
2.812058
1.504362
...
...
...
...
...
6568745422
0.000000
0.000000
0.000000
0.000000
6568761537
0.000000
0.000000
0.000000
0.000000
6568761541
0.000000
0.000000
0.000000
0.000000
6568829948
1537.976564
70.259058
400.737876
442.950873
6568913078
0.000000
0.000000
0.000000
0.000000
332554 rows × 4 columns
Following, we can merge these variables back onto a geodataframe and visualize the results.