[[33]{.chapter-number}  [Spatial Optimization]{.chapter-title}]{#sec-optimization .quarto-section-identifier}

33 Spatial Optimization

Code

from geosnap import DataStore, io
from spopt.locate import MCLP, LSCP
import geopandas as gpd
import numpy as np
import pandas as pd
import osmnx as ox
import pulp
import spopt

%load_ext watermark
%watermark -a  'eli knaap' -iv -d -u

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Author: eli knaap

Last updated: 2025-05-01

geosnap  : 0.15.3.dev3+g3fa22c0.d20250409
numpy    : 1.26.4
pandas   : 2.2.3
pulp     : 2.8.0
geopandas: 1.0.1
spopt    : 0.6.1
osmnx    : 2.0.2

There frequently arise in practical affairs problems that are concerned with how to serve or supply, in an optimum fashion, a set of destinations that have fixed and known locations What must be determined, in these problems, is the number, location, and size of the sources that will, most economically, supply the given set of destinations with some commodity, material, or service Examples of such problems are the determination and location of warehouses, distribution centers, communication centers, or production facilities

— Cooper (1963), p. 331

In the last section we examined “location choice models,” where we observe a set of choices made by consumers, then try to infer the parameters of a demand function to understand how people value the different attributes inherent in each location. In other cases, we find ourselves on the supply side of the equation, where we face a different problem: we have a set of resources, such as fire hydrants, hospitals, grocery stores, or missile defense systems¹, as well as a set of constraints (diminishing water pressure, finite resources, limited budgets), and a specific task to accomplish: distribute the resources as best as possible. Hotelling (1929), Christaller (1937), and Weber (1929) were motivated by this kind of task, and the focus on supplier means this is sometimes referred to as “business site selection” (Church & Murray, 2008) (though obviously we site a lot more than businesses); in this case it is our task to make the choices (and we know our own location demand parameters), and our goal is to arrive at the optimal locations, given our particular objective and set of constraints. We call these kinds of problems spatial optimization.

In methodological terms, “spatial optimization” refers broadly to the use of linear programming (LP) and mixed integer programming (MIP) techniques to solve problems related to where a set of resources should be located (Church & Scaparra, 2007; Lei et al., 2016; Murray et al., 2012; Murray & Church, 1995; Tong & Murray, 2012; Wei & Murray, 2012). Since these are special cases of a broader class of optimization problems, one option is to simply define a model then solve it using linear programming software like PuLP. However, since many spatial optimization problems require the same types of inputs (e.g. the locations of supply and demand sites, and the travel costs associated with moving between destinations, etc.), we can also skirt formulating the optimization by hand².

The spopt library (Feng et al., 2021) provides a set of simple tools tailored to spatial optimization. We have already used spopt for regionalization in Chapter 24, and here we examine its location-allocation capabilities. Location-allocation is a spatial application of mixed integer programming (MIP) that is a hallmark of operations research (Beaumont, 1981; Cooper, 1963; Love & Juel, 1982; Ross & Soland, 1977). The goal in many cases is to identify the best location to distribute a set of resources in space, where “best” is defined explicitly and unambiguous mathematically.

where do we distribute voting locations to ensure equal travel burden?
where do we distribute retail stores to maximize demand?
how many locations (and where) do we need to ensure 95% of the population has access to fresh food within five kilometers?
what are the most cost effective locations to ensure everyone has ambulatory care or a fire station within a 15-minute response time?

Once we define these questions explicitly, we can use LP/MIP to find optimal or near-optimal solutions. Many questions can be answered by formulating them as optimization problems, where we define an objective function (like cost minimization) and a set of constraints (like total budget and/or number of facilities), then ask a mathematical solver to optimize a set of decision variables (like a set of locations) to best satisfy our objective (e.g. which locations to choose)—and if you want to answer an unlimited number of questions this way, you should dig more into optimization methods and operations research (Galichon, 2016).

Without venturing too deeply into the literature, however, a small handful of spatial optimization problems are applied routinely enough to warrant their own names and bespoke implementations in spatial analysis software, and we examine some here. Specifically, we will assume the role of a hypothetical city planner in Washington D.C. Our department is planning to hold a series of meetings to discuss potential changes to the city plan, and would like to optimize the location of those meetings. For the sake of presentation, we will assume that we have all public libraries at our disposal for holding the meetings. To begin, we need data for population in the District, as well as a pandana network and the location of libraries.

Code

# get 2020 blockgroup-level data for DC
dc = io.get_acs(DataStore(), state_fips="11", years=2020, constant_dollars=False)
# get a pedestrian travel network and get nearest nodes to each blockgroup
dcnet = io.get_network_from_gdf(dc)
dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)
# collect anything from OSM tagged as a library
libraries = ox.features_from_polygon(dc.union_all(), tags={"amenity": "library"})
# this returns mixed geometries, so lazily down-convert to points
libraries = libraries.set_geometry(libraries.centroid)
libraries = libraries.drop_duplicates(subset=["geometry"])
# get nearest nodes in the travel network and simplify the example to single observations per node
libraries["node_ids"] = dcnet.get_node_ids(libraries.geometry.x, libraries.geometry.y)
libraries = libraries.drop_duplicates(subset=["node_ids"])
libraries = libraries.reset_index()
# plot the results
libraries.explore(tooltip="name", color='black', marker_kwds={'radius':6})

Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 81599
Setting CH edge vector of size 255170
Range graph removed 258018 edges of 510340
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%

/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:5: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  libraries = libraries.set_geometry(libraries.centroid)

Make this Notebook Trusted to load map: File -> Trust Notebook

33.1 Network Distances

We have used the pandana library extensively up to this point to generate Graphs, measure accessibility and segregation, and identify employment centers, but we have relied mainly on the aggregation functionality without bothering to care about the routing or shortest paths that power these queries behind the scenes. Before going further, though, it is useful to consider a few of the other package internals. As a data structure, a pandana.Network object is a set of two pandas dataframes (nodes_df and edges_df) that together encode a network.

Notably, these are pandas dataframes, not geopandas geodataframes because the network routing can actually abstract away geography; the algorithms only care about how edges relate to nodes via impedances, and do not actually bother considering where nodes are or how travel paths are shaped. The only geographic information pandana uses are the x and y locations of each node, because those are used to map other variables onto the network. But the information is still there to be used if we want it. For example we can turn the nodes dataframe into a geodataframe and plot it (interactively). This is every node (intersection) in the Washington D.C. pedestrian network, according to OpenStreetMap.

Code

dcnet.nodes_df = gpd.GeoDataFrame(
    dcnet.nodes_df,
    geometry=gpd.points_from_xy(dcnet.nodes_df.x, dcnet.nodes_df.y),
    crs=4326,
)
dcnet.nodes_df.explore()

Make this Notebook Trusted to load map: File -> Trust Notebook

Plotting a pandana network this way can be an very useful diagnostic to understand how the travel network is represented as a collection of points and lines. Also remember that when a user does an aggregation query using pandana, the query is performed against all of these nodes, so in many cases we can simplify the problem to get the information we need using fewer resources.

33.1.1 Shortest Paths

Although pandana Network objects are technically aspatial, the geometry information associated with each edge can be highly useful for some applications, so when collecting a network from OSM using geosnap’s io.get_network_from_gdf, the edge geometries are stored as a GeoSeries (thanks to the way osmnx queries the OSM database, this information comes along essentially for free)³. For example, we can collect the set of intersections defining the shortest path between a pair of origin and destination nodes using the shortest_path method on a Network object. With three lines, then, we can get a series of points that define the shortest path between a given census blockgroup (say,110010098073 near the Naval Research Laboratory) and a library, like one on Georgetown’s campus.

Code

origin_node = dc[dc.geoid == "110010098073"].node_ids.values[0]
destination_node = libraries.loc[[0]].node_ids.values[0]

path = dcnet.shortest_path(origin_node, destination_node)

The path variable now holds a set of OSM node ids that form a single path from origin to destination. If our objective is cost minimization, then this is the optimal route. Here are the first 20 nodes (intersections) along the path:

Code

pd.Series(path).head(20)

0     5438006812
1     5438006814
2     5438006811
3       49782687
4     5438006801
5     5438006802
6       49751887
7     5438006808
8       49822299
9       49766761
10    8204233420
11    1468892555
12      49809666
13      49830847
14      49794479
15    6801453797
16    7263641349
17      49731541
18    2999124269
19    9739791181
dtype: int64

Since we have the geometry information associated with each edge, we can use this series of nodes to reconstruct the shortest path for visualization using only a few lines. The path we want to visualize is defined as the set of edge geometries (streets) whose starting and ending nodes (intersections) belong to the shortest path series we just created, so we only need a simple selection on the edges_df.

Code

m = dcnet.edges_df[
    (dcnet.edges_df["from"].isin(path)) & (dcnet.edges_df["to"].isin(path))
].explore(color="black")
dc.set_geometry(dc.centroid)[dc.geoid == "110010098073"].explore(
    color="green", marker_kwds={"radius": 8}, m=m
)
libraries.loc[[0]].explore(color="purple", marker_kwds={"radius": 8}, m=m)

/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/2458902423.py:4: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  dc.set_geometry(dc.centroid)[dc.geoid == "110010098073"].explore(

Make this Notebook Trusted to load map: File -> Trust Notebook

The map above shows the origin node in green, the destination node in purple, and the shortest path along the pedestrian network (according to OSM) in black. Note that unlike the straight-line distance between the two points, the network path needs to ark eastward and through the bridge across the Anacostia River.

33.1.2 Cost Matrices

A single shortest-cost-path visualization is useful to have on hand, but often we want to consume network distances in another form (e.g. the Graphs and such we’ve seen). In particular, many applications require a pairwise distance matrix that quantifies the cost of travel between each pair of origins and destinations. We can generate such a travel cost matrix from a pandana Network object using the shortest_path_lengths method. To do so, we need to get a set of origin and destination nodes in the proper format, and an easy way to do that is via a pandas MultiIndex. For reference, our goal is to create a set of relationships from every origin to every destination, which should be:

Code

dc.shape[0] * libraries.shape[0]

Our goal is to create a matrix of origins (census blockgroups) and destinations (libraries), each of which are represented as nodes on the network. To generate this format we create a pandas.MultiIndex using the from_product constructor method, which gives us a Cartesian product of origins and destinations as an adjacency list format, which we will store as the variable ods.

Code

ods = pd.MultiIndex.from_product([dc['node_ids'], libraries['node_ids']])
ods

MultiIndex([(8930859709, 10117281068),
            (8930859709,  9762790801),
            (8930859709, 10176191207),
            (8930859709,  8776269506),
            (8930859709,  5445932234),
            (8930859709,   736936478),
            (8930859709,  4805694648),
            (8930859709, 12233138797),
            (8930859709, 10099231917),
            (8930859709,  5443123365),
            ...
            (3957573613,  5849771856),
            (3957573613,  2348769467),
            (3957573613,  8455341681),
            (3957573613,  3007993395),
            (3957573613,  1493945627),
            (3957573613,  6564883153),
            (3957573613,  3216173266),
            (3957573613, 12386647878),
            (3957573613,    49743669),
            (3957573613,  5445925363)],
           names=['node_ids', 'node_ids'], length=29692)

The ods multiindex creates every set of origin-destination pairs we need (and we can see it is the correct length at 29692), so we pass these these values, called level_0 and level_1 of the multiindex, to the shortest_path_lengths method, which fills in the weight (cost/impedance) values for our adjacency list. This is all the information we need, so the only remaining task is to reformat the matrix so that the rows are arranged in the same order as the input blockgroups and the columns are similarly arranged by order of the library dataframe.

Code

paths = dcnet.shortest_path_lengths(
    ods.get_level_values(0).values, ods.get_level_values(1).values
)
cost_matrix = (
    pd.Series(paths, index=ods)
    .unstack()
    .reindex(dc.node_ids)
    .T.reindex(libraries["node_ids"])
    .T.values
)

pd.DataFrame(cost_matrix)

	0	1	2	3	4	5	6	7	8	9	...	42	43	44	45	46	47	48	49	50	51
0	1725.223	1352.880	5746.707	3778.099	1077.267	7449.716	1411.444	4856.434	1719.921	5076.034	...	3977.218	3837.827	3628.510	5858.388	9817.890	8129.345	9775.747	7243.043	12230.316	11785.329
1	1837.486	2162.721	6077.442	4176.202	1925.678	6339.791	1223.147	3858.836	1743.918	5317.326	...	4617.230	4512.221	2630.912	6706.696	10650.971	8370.637	10683.308	8018.680	13063.397	12618.410
2	1271.589	1619.092	6151.630	4183.022	1382.049	7013.716	819.671	4563.759	1193.107	5480.957	...	4382.141	4266.226	3335.835	6308.215	10107.342	8534.268	10232.185	7689.538	12519.768	12074.781
3	1555.612	1181.189	6564.581	4818.199	1095.789	7765.238	1559.199	5457.339	1806.615	6269.507	...	5017.318	4791.323	4229.415	6452.394	9618.937	9322.818	10137.134	8035.435	12031.363	11586.376
4	675.429	2971.854	7709.561	5740.953	2734.811	6742.294	977.940	4790.365	377.805	7026.082	...	5940.072	5824.157	4172.234	7863.498	11460.104	10079.393	11780.857	9247.469	13872.530	13427.543
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
566	6510.329	4233.109	6618.067	5570.453	4585.158	11684.744	6489.990	9091.462	6761.332	7889.425	...	5548.958	5269.265	7863.538	5297.256	4591.211	10486.058	7662.197	7948.723	7003.637	6558.650
567	10232.052	8338.511	3969.384	5835.103	8217.492	12539.261	9813.924	10019.308	10187.175	7305.009	...	5601.016	5534.840	9329.801	3582.326	8180.108	4687.655	5955.833	1654.916	9926.293	10147.547
568	9777.351	8078.451	3347.333	5213.052	7814.959	11748.673	9272.713	9392.743	9702.738	6522.916	...	4978.965	4919.578	8707.750	3922.580	8819.494	3897.067	6597.104	936.097	10567.564	10786.933
569	10182.357	8015.695	4495.119	6360.838	8074.917	13137.768	9764.229	10545.043	10137.480	7830.744	...	6077.388	5863.797	9855.536	3240.683	7103.489	6402.116	3914.772	2633.628	7885.232	9070.928
570	5490.793	3235.151	7050.049	5573.250	3609.385	10711.897	5470.454	8118.615	5741.796	7269.887	...	5602.925	5332.717	6890.691	6062.580	6447.552	10318.600	8799.745	8520.903	8859.978	8414.991

571 rows × 52 columns

The resulting cost matrix has origins (blockgroups) on the rows and destinations (libraries) on the columns with cell values holding the cost of traveling from row to column. To visualize this data, we could plot the distance from each blockgroup to a single library (column zero) as a choropleth; since we know the cost matrix has the same order as the blockgroups, we can assign column zero to the dc dataframe to measure every blockgroup’s shortest-path network distance to library zero.

Code

m = dc.assign(dist=cost_matrix[:, 0]).explore(
    "dist", tooltip=["geoid", "dist"], style_kwds={"weight": 0.5}
)
libraries.loc[[0]].explore(m=m, tooltip="name", marker_kwds={"radius": 10}, color="red")

Make this Notebook Trusted to load map: File -> Trust Notebook

Note

In this example we are using a pedestrian network, which is undirected (same cost in both directions), but many other networks like auto or transit have impedances strongly affected by direction and time of day. Thus, here, we’re looking at the cost of consumers traveling to the supply. If we needed this to be a delivery service area measured by the cost of going from supply to demand, we could pass the reverse origins and destinations then transpose the cost matrix, which effectively measures the cost of traveling outward from supply.

33.2 Location-Allocation

The problem is then simultaneously to locate the central facilities and to determine an assignment of flows, so that the total costs of operation of the system are the least possible. This sort of problem arises in a great variety of practical planning situations. It is strongly evident, for example, in the planning of geographical systems centered upon such phenomena as hospitals, schools, warehouses, industries, and so on. In addition, location-allocation problems generalize readily into questions of taxonomic description, regionalization, and optimal spatial clustering

— Scott (1970) p.95

33.2.1 Location Set Covering Problem (LSCP)

Imagine we work for the D.C. city government and we plan to hold civic meetings at public libraries. We want to maximize representation, so every resident should have a meeting accessible within a five-kilometer walk. Which public library sites do we choose? That is, if we need to ensure every census blockgroup has a meeting accessible within a 5km walk, what is the minimum number of meetings we need to hold, and which libraries do we select? This kind of problem can be formulated as a LSCP (Church & Murray, 2018b, 2018a; García & Marín, 2019; Murray & Wei, 2013; ReVelle et al., 1976).

Instead of programming this optimization problem by hand using PuLP, the spopt package provides a simpler interface for these common problems in its locate module. One option is to build the problem directly from a geodataframe, in which case we can treat euclidean distances between observations as the travel cost; here, we already have network distances at our disposal so we create a new LSCP instance from a cost matrix and specify the maximum travel cost of five kilometers. Then, we call the solve method and pass a PuLP solver class which is used to handle the optimization behind the scenes. This flexibility is nice because it allows users to provide their own commercial solver if they have access to one. Here, I will use the open-source cbc solver (if the cbc solver is not available on windows, these examples can also be run using GLPK, which is available on conda).

Code

lscp = LSCP.from_cost_matrix(cost_matrix, 5000)
lscp = lscp.solve(pulp.COIN_CMD(msg=False, warmStart=True))
lscp.problem.objective.value()

8.0

The objective value for this problem is the minimum number of locations required to cover the set of demand locations (which in this case is 8). One important attribute of the model class is cli2fac, which is a dictionary that maps each client (blockgroup) to a list of its assigned facilities (if any). For ease of presentation we can convert this into a dataframe, which will convert empty lists into NaNs.

Code

pd.DataFrame(lscp.cli2fac).sample(10)

	0	1	2	3
107	17	35.0	44.0	NaN
537	29	51.0	NaN	NaN
299	18	48.0	NaN	NaN
327	29	51.0	NaN	NaN
44	8	44.0	NaN	NaN
48	8	44.0	NaN	NaN
477	48	NaN	NaN	NaN
278	8	18.0	NaN	NaN
429	48	NaN	NaN	NaN
323	18	29.0	51.0	NaN

The resulting dataframe has a row for each blockgroup, and the values of each column hold the index of its assigned library. By definition of the LSCP, every row will contain at least one assignment (the first column is completely filled), however some blockgroups fall inside the travel threshold of multiple libraries and are thus given multiple assignments. Since there are four columns in the cli2fac, it means there is at least one blockgroup assigned to four libraries. Using this dataframe, we know which library assignments have been made–which means we know where those eight library meetings should be held, so we can map these below.

Code

assn = pd.DataFrame(lscp.cli2fac).rename(columns={0:'idx'}).merge(libraries[['name']], left_on='idx', right_index=True, how='left')
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    tooltip="assignment",
    cmap="tab20b",
    style_kwds={"weight": 0.5},
    legend=False
)
libraries.assign(id=libraries.index).iloc[pd.Series(assn['idx'].dropna()).unique()].explore(
    m=m,
    color="black",
    tooltip=["id", "name"],
    marker_kwds={"radius": 7},
)

Make this Notebook Trusted to load map: File -> Trust Notebook

We could also consider an alternative versions of this problem via two different extensions. The first, called the capacitated location set covering problem, assumes that each facility location has a maximum capacity (e.g. our meeting room holds only so many seats). To solve this model, we would also use the LSCP class, but also include facility_capacity_arr argument along with a vector of values describing maximum capacity at each facility location. We could also consider a version of the problem where some meetings are already scheduled, so the locations are already chosen and we need to know how many additional meetings need to be scheduled. In this case, there are locations already present in the solution set, which can be defined in the class by passing the predefined_facilities_arr argument a binary array of length n_facilities, where ones indicate a location must be included. Finally, you can fit an alternative model that includes backup facilities, the Location Set Covering Problem-Backup (LCSP-B) using the appropriate class.

33.2.2 The Maximal Coverage Location Problem (MCLP)

We could also consider a different optimization problem known as the maximal coverage location problem (Berman & Krass, 2002; Chung, 1986; Church et al., 1996; Church & Velle, 2005; Pirkul & Schilling, 1991; ReVelle et al., 2008). In this case, we still want to maximize representation, but we have a different set of criteria to optimize. Imagine we only have enough staff and resources to hold four public events, and we expect people will only be willing to travel a certain threshold distance (say two kilometers) to attend. Which libraries should we select to hold our meetings to maximize availability to the public?

The MCLP can help us optimize our scarce resources to reach the maximum number of citizens, though this solution will differ obviously from the LSCP above. Whereas before, we guarantee complete coverage of all demand, and ask the solver to minimize the number of required locations, in this case, we have only four locations at our disposal, and we understand that some demand will not be met. That is, some blockgroups may not have access to a meeting site at all if locating it somewhere else would result in greater overall attendance; the question in this problem is, ‘how can we serve the most people possible (within their travel budget) by selecting only four locations from the set of libraries?’.

33.2.2.1 Service Threshold: 2km

For this exercise it can be useful to compare two potential scenarios, the first of which we assume the public has a smaller travel budget of only two kilometers, the compare the solution to a second scenario where we increase the budget. Similar to the LSCP, we instantiate the model class using the from_cost_matrix method and pass the network distances computed earlier, though this time we also pass a vector of “demand weights”, here the total population in each blockgroup, and a p_facilities parameter set to four, indicating the number of facilities we need to locate.

Code

mlcp2k = MCLP.from_cost_matrix(
    cost_matrix=cost_matrix,
    weights=dc["n_total_pop"].values,
    service_radius=2000,
    p_facilities=4,
    name="mclp-network-distance",
)
mlcp2k = mlcp2k.solve(pulp.COIN_CMD(msg=False, warmStart=True))

Once the model has solved, we can look at attributes to see how well it performed, for example we can look at the perc_cov attribute to see how much of the demand (population within the travel threshold) we have been able to satisfy (i.e. how “optimal” is the model?).

Code

mlcp2k.perc_cov

39.75481611208407

With only four locations at our disposal and only a two kilometer travel distance, we can only cover about 40% of D.C. residents (although 40% is not bad for only four locations!). As before, we can look at which libraries (facilities) were chosen, and which blockgroups were assigned to them.

Code

assn = (
    pd.DataFrame(mlcp2k.cli2fac)
    .rename(columns={0: "idx"})
    .merge(libraries[["name"]], left_on="idx", right_index=True, how="left")
)
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    cmap="tab20b",
    tooltip="assignment",
    style_kwds={"weight": 0.5},
)
libraries.assign(id=libraries.index).iloc[
    pd.Series(assn["idx"].dropna()).unique()
].explore(
    m=m,
    color="black",
    tooltip=["name", "id"],
    marker_kwds={"radius": 7},
)

Make this Notebook Trusted to load map: File -> Trust Notebook

In the map above, the selected four libraries that hold our public meetings are plotted as circles and the blockgroups they serve are plotted as distinct colors ⁴. The ‘unassigned’ portions of the city (without a meeting inside the two kilometer walk threshold) are shown in gray. Given the limited travel budget, this solution focuses on concentrating meetings in the most populous core of the city, with all four locations relatively close together, and several of the service areas overlapping, at the expense of the city’s outlying regions.

33.2.2.2 Service Threshold: 5km

As an alternative, what if we could still only afford to staff four meetings, but our travel budget has increased (or, travel costs less and we expect to draw people from a larger distance). In this case, we can change our travel threshold and solve the model again to see whether we choose the same or different locations for our meetings. To do this, we instantiate the MLCP class again using all the same arguments, except setting service_radius=5000.

Code

mlcp5k = MCLP.from_cost_matrix(
    cost_matrix=cost_matrix,
    weights=dc["n_total_pop"].values,
    service_radius=5000,
    p_facilities=4,
    name="mclp-network-distance",
)
mlcp5k = mlcp5k.solve(pulp.COIN_CMD(msg=False, warmStart=True))

Checking the perc_cov attribute again, we can compare the five kilometer solution to the two kilometer solution.

Code

mlcp5k.perc_cov

87.56567425569177

Clearly the increased travel budget has resulted in a better optimization process; with a threshold of five kilometers we can choose four locations that collectively cover about 88% of the D.C. population (which is more than twice as good as our two-kilometer solution).

Code

assn = (
    pd.DataFrame(mlcp5k.cli2fac)
    .rename(columns={0: "idx"})
    .merge(libraries[["name"]], left_on="idx", right_index=True, how="left")
)
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    cmap="tab20b",
    tooltip="assignment",
    style_kwds={"weight": 0.5},
)
libraries.assign(id=libraries.index).iloc[
    pd.Series(assn["idx"].dropna()).unique()
].explore(
    m=m,
    color="black",
    tooltip=["name", "id"],
    marker_kwds={"radius": 7},
)

Make this Notebook Trusted to load map: File -> Trust Notebook

With a larger travel budget the location of meeting location changes substantially. Whereas the original (2km) solution concentrated all four meetings into the libraries in the most populous core of the city, the five kilometer solution moves the meetings out to the edges–though notably the Petworth Library shows up in both solutions. A much greater share of the city is covered by this solution, however, it still leaves a good share of the District (22% of the population, by definition) unreachable. Further, it’s unclear how that unreachable 22% is distributed. We have not included any optimization criteria to ensure, e.g. age/race/income representation, so we do not know whether the unrepresented portion of the population is the same, on average, as the overall city population–though those could be criteria in a better optimization model!

As with LCSP, this problem can also be extended to include existing facilities by passing the appropriate array to the predefined_facilities_arr parameter.

Beaumont, J. R. (1981). Location-allocation problems in a plane a review of some models. Socio-Economic Planning Sciences, 15(5), 217–229. https://doi.org/10.1016/0038-0121(81)90042-2

Berman, O., & Krass, D. (2002). The generalized maximal covering location problem. Computers & Operations Research, 29(6), 563–581. https://doi.org/10.1016/S0305-0548(01)00079-X

Christaller, W. (1937). Die ländliche siedlungsweise im Deutschen reich und ihre beziehungen zur gemeindeorganisation. W. Kohlhammer.

Chung, C.-H. (1986). Recent Applications of the Maximal Covering Location Planning (M.C.L.P.) Model. Journal of the Operational Research Society, 37(8), 735–746. https://doi.org/10.1057/jors.1986.134

Church, R. L., & Murray, A. (2018a). Location Covering Models: History, Applications and Advancements. Springer International Publishing. https://doi.org/10.1007/978-3-319-99846-6

Church, R. L., & Murray, A. (2018b). Location Modeling and Covering Metrics. In R. L. Church & A. Murray (Eds.), Location Covering Models: History, Applications and Advancements (pp. 1–22). Springer International Publishing. https://doi.org/10.1007/978-3-319-99846-6_1

Church, R. L., & Murray, A. T. (2008). Business Site Selection, Location Analysis and GIS. John Wiley & Sons, Inc. https://doi.org/10.1002/9780470432761

Church, R. L., & Scaparra, M. P. (2007). Protecting Critical Assets: The r-Interdiction Median Problem with Fortification. Geographical Analysis, 39(2), 129–146. https://doi.org/10.1111/j.1538-4632.2007.00698.x

Church, R. L., Stoms, D. M., & Davis, F. W. (1996). Reserve selection as a maximal covering location problem. Biological Conservation, 76(2), 105–112. https://doi.org/10.1016/0006-3207(95)00102-6

Church, R. L., & Velle, C. R. (2005). The Maximal Covering Location Problem. Papers in Regional Science, 32(1), 101–118. https://doi.org/10.1111/j.1435-5597.1974.tb00902.x

Cooper, L. (1963). Location-Allocation Problems. Operations Research, 11(3), 331–343. https://doi.org/10.1287/opre.11.3.331

Feng, X., Barcelos, G., Gaboardi, J. D., Wei, R., Wolf, L. J., Zhao, Q., Rey, S. J., Knaap, E., Wei, R., Wolf, L. J., Zhao, Q., & Rey, S. J. (2021). Spopt: A python package for solving spatial optimization problems in PySAL. Journal of Open Source Software, 7(74), 1–7. https://doi.org/10.21105/joss.03330

Galichon, A. (2016). Optimal transport methods in economics. Princeton University Press.

García, S., & Marín, A. (2019). Covering Location Problems. In G. Laporte, S. Nickel, & F. Saldanha da Gama (Eds.), Location Science (pp. 99–119). Springer International Publishing. https://doi.org/10.1007/978-3-030-32177-2_5

Hotelling, H. (1929). Stability in Competition. The Economic Journal, 39(153), 41–57. https://doi.org/10.2307/2224214

Lei, T. L., Church, R. L., & Lei, Z. (2016). A unified approach for location-allocation analysis: Integrating GIS, distributed computing and spatial optimization. International Journal of Geographical Information Science, 30(3), 515–534. https://doi.org/10.1080/13658816.2015.1041959

Love, R. F., & Juel, H. (1982). Properties and Solution Methods for Large Location—Allocation Problems. Journal of the Operational Research Society, 33(5), 443–452. https://doi.org/10.1057/jors.1982.93

Murray, A. T., & Church, R. L. (1995). Heuristic solution approaches to operational forest planning problems. OR Spectrum, 17(2), 193–203. https://doi.org/10.1007/BF01719265

Murray, A. T., Gober, P., Anselin, L., Rey, S. J., Sampson, D., Padegimas, P. D., & Liu, Y. (2012). Spatial optimization models for water supply allocation. Water Resources Management, 26(8), 2243–2257. https://doi.org/10.1007/s11269-012-0013-5

Murray, A. T., & Wei, R. (2013). A computational approach for eliminating error in the solution of the location set covering problem. European Journal of Operational Research, 224(1), 52–64. https://doi.org/10.1016/j.ejor.2012.07.027

Pirkul, H., & Schilling, D. A. (1991). The Maximal Covering Location Problem with Capacities on Total Workload. Management Science, 37(2), 233–248. https://www.jstor.org/stable/2632393

ReVelle, C., Scholssberg, M., & Williams, J. (2008). Solving the maximal covering location problem with heuristic concentration. Computers & Operations Research, 35(2), 427–435. https://doi.org/10.1016/j.cor.2006.03.007

ReVelle, C., Toregas, C., & Falkson, L. (1976). Applications of the Location Set-covering Problem. Geographical Analysis, 8(1), 65–76. https://doi.org/10.1111/j.1538-4632.1976.tb00529.x

Ross, G. T., & Soland, R. M. (1977). Modeling Facility Location Problems as Generalized Assignment Problems. Management Science, 24(3), 345–357. https://doi.org/10.1287/mnsc.24.3.345

Scott, A. J. (1970). Location‐Allocation Systems: A Review. Geographical Analysis, 2(2), 95–119. https://doi.org/10.1111/j.1538-4632.1970.tb00149.x

Tong, D., & Murray, A. T. (2012). Spatial optimization in geography. Annals of the Association of American Geographers, 102(6), 1290–1309. https://doi.org/10.1080/00045608.2012.685044

Weber, A. (1929). Über den Standort der Industrien. The University of Chicago Press.

Wei, R., & Murray, A. T. (2012). An integrated approach for addressing geographic uncertainty in spatial optimization. International Journal of Geographical Information Science, 26(7), 1231–1249. https://doi.org/10.1080/13658816.2011.633918

Ok, you probably do not have a defense system to optimize, but much of the cutting edge work in operations research originated in military applications like this, where decision-makers routinely face tough decisions under a specific set of constraints.↩︎
In the interest of brevity we will proceed straight to well-known spatial optimization problems, however, optimization modeling from scratch is an extraordinarily useful skill, and Levi John Wolf has a forthcoming set of materials with an excellent and much fuller treatment.↩︎
This is why the isochrones in geosnap are more detailed when using a pandana.Network object created by the geosnap network downloader. The additional information provided by the shape of the streets (above the intersections alone) makes it easier to wrap a detailed polygon around the destination locations.↩︎
Note here we are taking the dead simple approach and plotting a single facility (library) for each client (blockgroup), but the solution includes many blockgroups which are served by multiple libraries. With a bit more effort, we could visualize these relationships similar to a Graph, but this exercise is left to the reader :).↩︎