from geosnap import DataStore, iofrom spopt.locate import MCLP, LSCPimport geopandas as gpdimport numpy as npimport pandas as pdimport osmnx as oximport pulpimport spopt%load_ext watermark%watermark -a 'eli knaap'-iv -d -u
The watermark extension is already loaded. To reload it, use:
%reload_ext watermark
Author: eli knaap
Last updated: 2025-05-01
geosnap : 0.15.3.dev3+g3fa22c0.d20250409
numpy : 1.26.4
pandas : 2.2.3
pulp : 2.8.0
geopandas: 1.0.1
spopt : 0.6.1
osmnx : 2.0.2
There frequently arise in practical affairs problems that are concerned with how to serve or supply, in an optimum fashion, a set of destinations that have fixed and known locations What must be determined, in these problems, is the number, location, and size of the sources that will, most economically, supply the given set of destinations with some commodity, material, or service Examples of such problems are the determination and location of warehouses, distribution centers, communication centers, or production facilities
In the last section we examined “location choice models,” where we observe a set of choices made by consumers, then try to infer the parameters of a demand function to understand how people value the different attributes inherent in each location. In other cases, we find ourselves on the supply side of the equation, where we face a different problem: we have a set of resources, such as fire hydrants, hospitals, grocery stores, or missile defense systems1, as well as a set of constraints (diminishing water pressure, finite resources, limited budgets), and a specific task to accomplish: distribute the resources as best as possible. Hotelling (1929), Christaller (1937), and Weber (1929) were motivated by this kind of task, and the focus on supplier means this is sometimes referred to as “business site selection” (Church & Murray, 2008) (though obviously we site a lot more than businesses); in this case it is our task to make the choices (and we know our own location demand parameters), and our goal is to arrive at the optimal locations, given our particular objective and set of constraints. We call these kinds of problems spatial optimization.
In methodological terms, “spatial optimization” refers broadly to the use of linear programming (LP) and mixed integer programming (MIP) techniques to solve problems related to where a set of resources should be located (Church & Scaparra, 2007; Lei et al., 2016; Murray et al., 2012; Murray & Church, 1995; Tong & Murray, 2012; Wei & Murray, 2012). Since these are special cases of a broader class of optimization problems, one option is to simply define a model then solve it using linear programming software like PuLP. However, since many spatial optimization problems require the same types of inputs (e.g. the locations of supply and demand sites, and the travel costs associated with moving between destinations, etc.), we can also skirt formulating the optimization by hand2.
The spopt library (Feng et al., 2021) provides a set of simple tools tailored to spatial optimization. We have already used spopt for regionalization in Chapter 24, and here we examine its location-allocation capabilities. Location-allocation is a spatial application of mixed integer programming (MIP) that is a hallmark of operations research (Beaumont, 1981; Cooper, 1963; Love & Juel, 1982; Ross & Soland, 1977). The goal in many cases is to identify the best location to distribute a set of resources in space, where “best” is defined explicitly and unambiguous mathematically.
where do we distribute voting locations to ensure equal travel burden?
where do we distribute retail stores to maximize demand?
how many locations (and where) do we need to ensure 95% of the population has access to fresh food within five kilometers?
what are the most cost effective locations to ensure everyone has ambulatory care or a fire station within a 15-minute response time?
Once we define these questions explicitly, we can use LP/MIP to find optimal or near-optimal solutions–and if you want to answer an unlimited number of questions this way, you should dig more into optimization methods and operations research (Galichon, 2016). A huge variety of questions can be answered by formulating them as optimization problems, where we define an objective function (like cost minimization) and a set of constraints (like total budget and/or number of facilities), then ask a mathematical solver to optimize a set of decision variables (like a set of locations) to best satisfy our objective (e.g. which locations to choose).
Without digging too deeply into the literature, however, a small handful of spatial optimization problems are applied routinely enough to warrant their own names and bespoke implementations in spatial analysis software, and we examine some here. Specifically, we will assume the role of a hypothetical city planner in Washington D.C. Our department is planning to hold a series of meetings to discuss potential changes to the city plan, and would like to optimize the location of those meetings. For the sake of presentation, we will assume that they have all public libraries at their disposal for holding the meetings. To begin, we need data for population in the District, as well as a pandana network and the location of libraries.
Code
# get 2020 blockgroup-level data for DCdc = io.get_acs(DataStore(), state_fips="11", years=2020, constant_dollars=False)# get a pedestrian travel network and get nearest nodes to each blockgroupdcnet = io.get_network_from_gdf(dc)dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)# collect anything from OSM tagged as a librarylibraries = ox.features_from_polygon(dc.union_all(), tags={"amenity": "library"})# this returns mixed geometries, so lazily down-convert to pointslibraries = libraries.set_geometry(libraries.centroid)libraries = libraries.drop_duplicates(subset=["geometry"])# get nearest nodes in the travel network and simplify the example to single observations per nodelibraries["node_ids"] = dcnet.get_node_ids(libraries.geometry.x, libraries.geometry.y)libraries = libraries.drop_duplicates(subset=["node_ids"])libraries = libraries.reset_index()# plot the resultslibraries.explore(tooltip="name", color='black', marker_kwds={'radius':6})
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 81599
Setting CH edge vector of size 255170
Range graph removed 258018 edges of 510340
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:5: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
libraries = libraries.set_geometry(libraries.centroid)
Make this Notebook Trusted to load map: File -> Trust Notebook
34.1 Network Distances
We have used the pandana library extensively up to this point to generate Graphs, measure accessibility and segregation, and identify employment centers, but we have relied mainly on the aggregation functionality without bothering to care about the routing or shortest paths that power these queries behind the scenes. Before going further, though, it is useful to consider a few of the other package internals. As a data structure, a pandana.Network object is a set of two pandas dataframes (nodes_df and edges_df) that together encode a network.
Notably, these are pandas dataframes, not geopandas geodataframes because the network routing can actually abstract away geography; the algorithms only care about how edges relate to nodes via impedances, and do not actually bother considering where nodes are or how travel paths are shaped. The only geographic information pandana uses are the x and y locations of each node, because those are used to map other variables onto the network. But the information is still there to be used if we want it. For example we can turn the nodes dataframe into a geodataframe and plot it (interactively). This is every node (intersection) in the Washington D.C. pedestrian network, according to OpenStreetMap.