34  Spatial Optimization

Code
from geosnap import DataStore, io
from spopt.locate import MCLP, LSCP
import geopandas as gpd
import numpy as np
import pandas as pd
import osmnx as ox
import pulp
import spopt

%load_ext watermark
%watermark -a  'eli knaap' -iv -d -u
The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Author: eli knaap

Last updated: 2025-05-01

geosnap  : 0.15.3.dev3+g3fa22c0.d20250409
numpy    : 1.26.4
pandas   : 2.2.3
pulp     : 2.8.0
geopandas: 1.0.1
spopt    : 0.6.1
osmnx    : 2.0.2

There frequently arise in practical affairs problems that are concerned with how to serve or supply, in an optimum fashion, a set of destinations that have fixed and known locations. What must be determined, in these problems, is the number, location, and size of the sources that will, most economically, supply the given set of destinations with some commodity, material, or service. Examples of such problems are the determination and location of warehouses, distribution centers, communication centers, or production facilities.

Cooper (1963), p. 331

In the last section we examined “location choice models,” where we observe a set of choices made by consumers, then try to infer the parameters of a demand function to understand how people value the different attributes inherent in each location. In other cases, we find ourselves on the supply side of the equation, where we face a different problem: we have a set of resources, such as fire hydrants, hospitals, grocery stores, or missile defense systems1, as well as a set of constraints (diminishing water pressure, finite resources, limited budgets), and a specific task to accomplish: distribute the resources as best as possible. Hotelling (1929), Christaller (1937), and Weber (1929) were motivated by this kind of task, and the focus on the supplier means this is sometimes referred to as “business site selection” (Church & Murray, 2008) (though obviously we site a lot more than businesses). In this case it is our task to make the choices (and we know our own location demand parameters), and our goal is to arrive at the optimal locations given our particular objective and set of constraints. We call these kinds of problems spatial optimization.

In methodological terms, “spatial optimization” refers broadly to the use of linear programming (LP) and mixed integer programming (MIP) techniques to solve problems related to where a set of resources should be located (Church & Scaparra, 2007; Lei et al., 2016; Murray et al., 2012; Murray & Church, 1995; Tong & Murray, 2012; Wei & Murray, 2012). Since these are special cases of a broader class of optimization problems, one option is to simply define a model then solve it using linear programming software like PuLP. However, since many spatial optimization problems require the same types of inputs (e.g. the locations of supply and demand sites, and the travel costs associated with moving between destinations, etc.), we can also skirt formulating the optimization by hand2.

The spopt library (Feng et al., 2021) provides a set of simple tools tailored to spatial optimization. We have already used spopt for regionalization in Chapter 24, and here we examine its location-allocation capabilities. Location-allocation is a spatial application of mixed integer programming (MIP) that is a hallmark of operations research (Beaumont, 1981; Cooper, 1963; Love & Juel, 1982; Ross & Soland, 1977). The goal in many cases is to identify the best locations at which to distribute a set of resources in space, where “best” is defined explicitly and unambiguously in mathematical terms.

Once we define these questions explicitly, we can use LP/MIP to find optimal or near-optimal solutions–and if you want to answer an unlimited number of questions this way, you should dig more into optimization methods and operations research (Galichon, 2016). A huge variety of questions can be answered by formulating them as optimization problems, where we define an objective function (like cost minimization) and a set of constraints (like total budget and/or number of facilities), then ask a mathematical solver to optimize a set of decision variables (like a set of locations) to best satisfy our objective (e.g. which locations to choose).
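To make the objective/constraints/decision-variables pattern concrete, here is a toy set-covering problem written directly in PuLP. The data are hypothetical (a made-up coverage dictionary), but the structure mirrors what a solver sees in any of the named problems below.

```python
import pulp

# Toy set-covering problem: choose the fewest candidate sites so that every
# demand point is covered by at least one. `coverage[i]` lists the sites that
# can serve demand point i (hypothetical data).
coverage = {0: [0, 1], 1: [1, 2], 2: [2], 3: [0, 2]}
sites = [0, 1, 2]

prob = pulp.LpProblem("toy_set_cover", pulp.LpMinimize)
# Decision variables: open site j or not
x = {j: pulp.LpVariable(f"x_{j}", cat="Binary") for j in sites}

# Objective: minimize the number of opened sites
prob += pulp.lpSum(x[j] for j in sites)
# Constraints: every demand point must be reachable from an opened site
for i, covers in coverage.items():
    prob += pulp.lpSum(x[j] for j in covers) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(int(pulp.value(prob.objective)))  # 2 sites are needed
```

Only demand point 2 can be served by site 2, so site 2 is forced into any solution; the solver then needs just one more site to cover the rest.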

Without digging too deeply into the literature, however, a small handful of spatial optimization problems are applied routinely enough to warrant their own names and bespoke implementations in spatial analysis software, and we examine some here. Specifically, we will assume the role of a hypothetical city planner in Washington D.C. Our department is planning to hold a series of meetings to discuss potential changes to the city plan, and would like to optimize the location of those meetings. For the sake of presentation, we will assume that they have all public libraries at their disposal for holding the meetings. To begin, we need data for population in the District, as well as a pandana network and the location of libraries.

Code
# get 2020 blockgroup-level data for DC
dc = io.get_acs(DataStore(), state_fips="11", years=2020, constant_dollars=False)
# get a pedestrian travel network and get nearest nodes to each blockgroup
dcnet = io.get_network_from_gdf(dc)
dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)
# collect anything from OSM tagged as a library
libraries = ox.features_from_polygon(dc.union_all(), tags={"amenity": "library"})
# this returns mixed geometries, so lazily down-convert to points
libraries = libraries.set_geometry(libraries.centroid)
libraries = libraries.drop_duplicates(subset=["geometry"])
# get nearest nodes in the travel network and simplify the example to single observations per node
libraries["node_ids"] = dcnet.get_node_ids(libraries.geometry.x, libraries.geometry.y)
libraries = libraries.drop_duplicates(subset=["node_ids"])
libraries = libraries.reset_index()
# plot the results
libraries.explore(tooltip="name", color='black', marker_kwds={'radius':6})
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 81599
Setting CH edge vector of size 255170
Range graph removed 258018 edges of 510340
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:5: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  dc["node_ids"] = dcnet.get_node_ids(dc.centroid.x, dc.centroid.y)
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/3996029376.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  libraries = libraries.set_geometry(libraries.centroid)

34.1 Network Distances

We have used the pandana library extensively up to this point to generate Graphs, measure accessibility and segregation, and identify employment centers, but we have relied mainly on the aggregation functionality without bothering to care about the routing or shortest paths that power these queries behind the scenes. Before going further, though, it is useful to consider a few of the other package internals. As a data structure, a pandana.Network object is a set of two pandas dataframes (nodes_df and edges_df) that together encode a network.

Notably, these are pandas dataframes, not geopandas geodataframes, because the network routing can abstract away geography entirely; the algorithms only care about how edges relate to nodes via impedances, and do not consider where nodes are or how travel paths are shaped. The only geographic information pandana uses is the x and y location of each node, because those coordinates are used to map other variables onto the network. But the information is still there to be used if we want it. For example, we can turn the nodes dataframe into a geodataframe and plot it (interactively). This is every node (intersection) in the Washington D.C. pedestrian network, according to OpenStreetMap.

Code
dcnet.nodes_df = gpd.GeoDataFrame(
    dcnet.nodes_df,
    geometry=gpd.points_from_xy(dcnet.nodes_df.x, dcnet.nodes_df.y),
    crs=4326,
)
dcnet.nodes_df.explore()

Plotting a pandana network this way can be an extremely useful diagnostic to understand how the travel network is represented as a collection of points and lines. Also remember that when a user does an aggregation query using pandana, the query is performed against all of these nodes, so in many cases we can simplify the problem to get the information we need using fewer resources.
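The aspatial point above can be illustrated with a toy edge list: a plain Dijkstra search over an edge table never touches the node coordinates. This is not pandana's internal algorithm (pandana uses contraction hierarchies), just a sketch with a made-up four-node network.

```python
import heapq
import pandas as pd

# A network is just a node table and an edge table with impedances;
# routing only consumes the edge list (hypothetical weights).
nodes = pd.DataFrame({"x": [0.0, 1.0, 1.0, 2.0], "y": [0.0, 0.0, 1.0, 1.0]})
edges = pd.DataFrame({"from": [0, 0, 1, 2], "to": [1, 2, 3, 3],
                      "weight": [1.0, 4.0, 2.0, 1.0]})

def shortest_path_length(edges, source, target):
    """Plain Dijkstra over an undirected edge list -- coordinates unused."""
    adj = {}
    for f, t, w in edges.itertuples(index=False):
        adj.setdefault(f, []).append((t, w))
        adj.setdefault(t, []).append((f, w))
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")

print(shortest_path_length(edges, 0, 3))  # 0 -> 1 -> 3 costs 3.0
```

The `nodes` dataframe is never consulted; it only matters later, when mapping results back into geographic space.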

34.1.1 Shortest Paths

Although pandana Network objects are technically aspatial, the geometry information associated with each edge can be highly useful for some applications, so when collecting a network from OSM using geosnap’s io.get_network_from_gdf, the edge geometries are stored as a GeoSeries (thanks to the way osmnx queries the OSM database, this information comes along essentially for free)3. For example, pandana can return the set of nodes that defines the shortest path between a pair of origin and destination nodes using the shortest_path method on a Network object. With three lines, then, we can get a series of network nodes that define the shortest path between a given census blockgroup (say, 110010098073 near the Naval Research Laboratory) and a library like the one on Georgetown’s campus.

Code
origin_node = dc[dc.geoid == "110010098073"].node_ids.values[0]
destination_node = libraries.loc[[0]].node_ids.values[0]

path = dcnet.shortest_path(origin_node, destination_node)

The path variable now holds a set of OSM node ids that form a single path from origin to destination. If our objective is cost minimization, then this is the optimal route. Here are the first 20 nodes (intersections) along the path:

Code
pd.Series(path).head(20)
0     5438006812
1     5438006814
2     5438006811
3       49782687
4     5438006801
5     5438006802
6       49751887
7     5438006808
8       49822299
9       49766761
10    8204233420
11    1468892555
12      49809666
13      49830847
14      49794479
15    6801453797
16    7263641349
17      49731541
18    2999124269
19    9739791181
dtype: int64

Since we have the geometry information associated with each edge, we can use this series of nodes to reconstruct the shortest path for visualization using only a few lines. The path we want to visualize is defined as the set of edge geometries (streets) whose starting and ending nodes (intersections) belong to the shortest path series we just created, so we only need a simple selection on the edges_df.

Code
m = dcnet.edges_df[
    (dcnet.edges_df["from"].isin(path)) & (dcnet.edges_df["to"].isin(path))
].explore(color="black")
dc.set_geometry(dc.centroid)[dc.geoid == "110010098073"].explore(
    color="green", marker_kwds={"radius": 8}, m=m
)
libraries.loc[[0]].explore(color="purple", marker_kwds={"radius": 8}, m=m)
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_14399/2458902423.py:4: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  dc.set_geometry(dc.centroid)[dc.geoid == "110010098073"].explore(

The map above shows the origin node in green, the destination node in purple, and the shortest path along the pedestrian network (according to OSM) in black. Note that unlike the straight-line distance between the two points, the network path needs to arc eastward and cross the bridge over the Anacostia River.

34.1.2 Cost Matrices

A single shortest-cost-path visualization is useful to have on hand, but often we want to consume network distances in another form (e.g. the Graphs and such we’ve seen). In particular, many applications require a pairwise distance matrix that quantifies the cost of travel between each pair of origins and destinations. We can generate such a travel cost matrix from a pandana Network object using the shortest_path_lengths method. To do so, we need to get a set of origin and destination nodes in the proper format, and an easy way to do that is via a pandas MultiIndex. For reference, our goal is to create a set of relationships from every origin to every destination, which should be:

Code
dc.shape[0] * libraries.shape[0]
29692

Our goal is to create a matrix of origins (census blockgroups) and destinations (libraries), each of which are represented as nodes on the network. To generate this format we create a pandas.MultiIndex using the from_product constructor method, which gives us a Cartesian product of origins and destinations as an adjacency list format, which we will store as the variable ods.

Code
ods = pd.MultiIndex.from_product([dc['node_ids'], libraries['node_ids']])
ods
MultiIndex([(8930859709, 10117281068),
            (8930859709,  9762790801),
            (8930859709, 10176191207),
            (8930859709,  8776269506),
            (8930859709,  5445932234),
            (8930859709,   736936478),
            (8930859709,  4805694648),
            (8930859709, 12233138797),
            (8930859709, 10099231917),
            (8930859709,  5443123365),
            ...
            (3957573613,  5849771856),
            (3957573613,  2348769467),
            (3957573613,  8455341681),
            (3957573613,  3007993395),
            (3957573613,  1493945627),
            (3957573613,  6564883153),
            (3957573613,  3216173266),
            (3957573613, 12386647878),
            (3957573613,    49743669),
            (3957573613,  5445925363)],
           names=['node_ids', 'node_ids'], length=29692)

The ods multiindex creates every origin-destination pair we need (and we can see it is the correct length at 29692), so we pass these values, called level_0 and level_1 of the multiindex, to the shortest_path_lengths method, which fills in the weight (cost/impedance) values for our adjacency list. This is all the information we need, so the only remaining task is to reformat the matrix so that the rows are arranged in the same order as the input blockgroups and the columns are similarly arranged in the order of the library dataframe.

Code
paths = dcnet.shortest_path_lengths(
    ods.get_level_values(0).values, ods.get_level_values(1).values
)
cost_matrix = (
    pd.Series(paths, index=ods)
    .unstack()
    .reindex(dc.node_ids)
    .T.reindex(libraries["node_ids"])
    .T.values
)

pd.DataFrame(cost_matrix)
0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50 51
0 1725.223 1352.880 5746.707 3778.099 1077.267 7449.716 1411.444 4856.434 1719.921 5076.034 ... 3977.218 3837.827 3628.510 5858.388 9817.890 8129.345 9775.747 7243.043 12230.316 11785.329
1 1837.486 2162.721 6077.442 4176.202 1925.678 6339.791 1223.147 3858.836 1743.918 5317.326 ... 4617.230 4512.221 2630.912 6706.696 10650.971 8370.637 10683.308 8018.680 13063.397 12618.410
2 1271.589 1619.092 6151.630 4183.022 1382.049 7013.716 819.671 4563.759 1193.107 5480.957 ... 4382.141 4266.226 3335.835 6308.215 10107.342 8534.268 10232.185 7689.538 12519.768 12074.781
3 1555.612 1181.189 6564.581 4818.199 1095.789 7765.238 1559.199 5457.339 1806.615 6269.507 ... 5017.318 4791.323 4229.415 6452.394 9618.937 9322.818 10137.134 8035.435 12031.363 11586.376
4 675.429 2971.854 7709.561 5740.953 2734.811 6742.294 977.940 4790.365 377.805 7026.082 ... 5940.072 5824.157 4172.234 7863.498 11460.104 10079.393 11780.857 9247.469 13872.530 13427.543
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
566 6510.329 4233.109 6618.067 5570.453 4585.158 11684.744 6489.990 9091.462 6761.332 7889.425 ... 5548.958 5269.265 7863.538 5297.256 4591.211 10486.058 7662.197 7948.723 7003.637 6558.650
567 10232.052 8338.511 3969.384 5835.103 8217.492 12539.261 9813.924 10019.308 10187.175 7305.009 ... 5601.016 5534.840 9329.801 3582.326 8180.108 4687.655 5955.833 1654.916 9926.293 10147.547
568 9777.351 8078.451 3347.333 5213.052 7814.959 11748.673 9272.713 9392.743 9702.738 6522.916 ... 4978.965 4919.578 8707.750 3922.580 8819.494 3897.067 6597.104 936.097 10567.564 10786.933
569 10182.357 8015.695 4495.119 6360.838 8074.917 13137.768 9764.229 10545.043 10137.480 7830.744 ... 6077.388 5863.797 9855.536 3240.683 7103.489 6402.116 3914.772 2633.628 7885.232 9070.928
570 5490.793 3235.151 7050.049 5573.250 3609.385 10711.897 5470.454 8118.615 5741.796 7269.887 ... 5602.925 5332.717 6890.691 6062.580 6447.552 10318.600 8799.745 8520.903 8859.978 8414.991

571 rows × 52 columns
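The reshaping step in the previous cell (a MultiIndex Series pivoted with unstack) is easier to see in isolation with toy data; the node ids below are made up.

```python
import pandas as pd

# An adjacency list as a MultiIndex Series: (origin, destination) -> cost
ods = pd.MultiIndex.from_product([[10, 20], [7, 8, 9]])
costs = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=ods)

# unstack moves the destination level to the columns, producing an
# origin-by-destination cost matrix
matrix = costs.unstack()
print(matrix.loc[20, 8])  # cost from origin-node 20 to destination-node 8 -> 5.0
```

The subsequent reindex calls in the real query just guarantee that rows follow the blockgroup order and columns follow the library order, so positional indexing stays aligned with the input dataframes.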

The resulting cost matrix has origins (blockgroups) on the rows and destinations (libraries) on the columns with cell values holding the cost of traveling from row to column. To visualize this data, we could plot the distance from each blockgroup to a single library (column zero) as a choropleth; since we know the cost matrix has the same order as the blockgroups, we can assign column zero to the dc dataframe to measure every blockgroup’s shortest-path network distance to library zero.

Code
m = dc.assign(dist=cost_matrix[:, 0]).explore(
    "dist", tooltip=["geoid", "dist"], style_kwds={"weight": 0.5}
)
libraries.loc[[0]].explore(m=m, tooltip="name", marker_kwds={"radius": 10}, color="red")
Note

In this example we are using a pedestrian network, which is undirected (the same cost in both directions), but many other networks, like auto or transit, have impedances strongly affected by direction and time of day. Here, then, we are measuring the cost of consumers traveling to the supply. If we instead needed a delivery service area, measured by the cost of going from supply to demand, we could swap the origins and destinations and then transpose the resulting cost matrix, which effectively measures the cost of traveling outward from each supply site.
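A sketch of that transpose idea with made-up directed costs (the matrices here are hypothetical, not computed from the network):

```python
import numpy as np

# demand_to_supply[i, j] = cost of demand i traveling to supply j
demand_to_supply = np.array([[1.0, 4.0],
                             [2.0, 3.0]])
# Computed separately with origins/destinations swapped:
# supply_to_demand[j, i] = cost of supply j traveling out to demand i
supply_to_demand = np.array([[1.5, 2.5],
                             [3.5, 3.0]])

# Transposing restores the (demand, supply) orientation, but the values
# now measure outbound travel from each supply site
outbound = supply_to_demand.T
print(outbound[0, 1])  # supply 1 out to demand 0 -> 3.5
```

On a directed network the two matrices genuinely differ, so the choice of direction is part of the model specification.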

34.2 Location-Allocation

The problem is then simultaneously to locate the central facilities and to determine an assignment of flows, so that the total costs of operation of the system are the least possible. This sort of problem arises in a great variety of practical planning situations. It is strongly evident, for example, in the planning of geographical systems centered upon such phenomena as hospitals, schools, warehouses, industries, and so on. In addition, location-allocation problems generalize readily into questions of taxonomic description, regionalization, and optimal spatial clustering

Scott (1970) p.95

34.2.1 Location Set Covering Problem (LSCP)

Imagine we work for the D.C. city government and we plan to hold civic meetings at public libraries. We want to maximize representation, so every resident should have a meeting accessible within a five-kilometer walk. Which public library sites do we choose? That is, if we need to ensure every census blockgroup has a meeting accessible within a 5km walk, what is the minimum number of meetings we need to hold, and which libraries do we select? This kind of problem can be formulated as an LSCP (Church & Murray, 2018b, 2018a; García & Marín, 2019; Murray & Wei, 2013; ReVelle et al., 1976).
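In standard form, the LSCP minimizes the number of sited facilities subject to every demand point being covered:

```latex
\begin{aligned}
\min \quad & \sum_{j \in J} x_j \\
\text{s.t.} \quad & \sum_{j \in N_i} x_j \ge 1 && \forall i \in I \\
& x_j \in \{0, 1\} && \forall j \in J
\end{aligned}
```

where $I$ is the set of demand sites (blockgroups), $J$ the set of candidate facility sites (libraries), $x_j$ a binary decision to site a facility at $j$, and $N_i = \{j : d_{ij} \le S\}$ the facilities within the service threshold $S$ of demand site $i$.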

Instead of programming this optimization problem by hand using PuLP, the spopt package provides a simpler interface for these common problems in its locate module. One option is to build the problem directly from a geodataframe, in which case we treat euclidean distances between observations as the travel cost; since we already have network distances at our disposal, we instead create a new LSCP instance from a cost matrix and specify a maximum travel cost of five kilometers. Then, we call the solve method and pass a PuLP solver class, which is used to handle the optimization behind the scenes. This flexibility is nice because it allows users to provide their own commercial solver if they have access to one. Here, I will use the open-source cbc solver (if the cbc solver is not available on Windows, these examples can also be run with GLPK, which is available via conda).

Code
lscp = LSCP.from_cost_matrix(cost_matrix, 5000)
lscp = lscp.solve(pulp.COIN_CMD(msg=False, warmStart=True))
lscp.problem.objective.value()
8.0

The objective value for this problem is the minimum number of locations required to cover the set of demand locations (which in this case is 8). One important attribute of the model class is cli2fac, which maps each client (blockgroup) to a list of its assigned facilities (if any). For ease of presentation we can convert this into a dataframe, which pads shorter assignment lists with NaNs.

Code
pd.DataFrame(lscp.cli2fac).sample(10)
0 1 2 3
107 17 35.0 44.0 NaN
537 29 51.0 NaN NaN
299 18 48.0 NaN NaN
327 29 51.0 NaN NaN
44 8 44.0 NaN NaN
48 8 44.0 NaN NaN
477 48 NaN NaN NaN
278 8 18.0 NaN NaN
429 48 NaN NaN NaN
323 18 29.0 51.0 NaN

The resulting dataframe has a row for each blockgroup, and the values of each column hold the index of its assigned library. By definition of the LSCP, every row contains at least one assignment (the first column is completely filled); however, some blockgroups fall inside the travel threshold of multiple libraries and are thus given multiple assignments. Since cli2fac has four columns, at least one blockgroup is assigned to four libraries. Using this dataframe, we know which library assignments have been made–which means we know where those eight library meetings should be held, so we can map them below.
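The NaN padding is simply how pandas handles ragged rows; a toy illustration with hypothetical assignments:

```python
import pandas as pd

# Hypothetical cli2fac-style structure: client index -> list of covering
# facilities, with rows of different lengths
cli2fac = [[3], [3, 7], [7, 1, 5]]

# pd.DataFrame pads the shorter rows with NaN, so the first column is always
# filled (every client has at least one assignment under the LSCP)
df = pd.DataFrame(cli2fac)
print(df)
```

The number of columns in the resulting frame equals the largest number of facilities covering any single client.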

Code
assn = pd.DataFrame(lscp.cli2fac).rename(columns={0:'idx'}).merge(libraries[['name']], left_on='idx', right_index=True, how='left')
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    tooltip="assignment",
    cmap="tab20b",
    style_kwds={"weight": 0.5},
    legend=False
)
libraries.assign(id=libraries.index).iloc[pd.Series(assn['idx'].dropna()).unique()].explore(
    m=m,
    color="black",
    tooltip=["id", "name"],
    marker_kwds={"radius": 7},
)

We could also consider alternative versions of this problem via two different extensions. The first, called the capacitated location set covering problem, assumes that each facility location has a maximum capacity (e.g. our meeting room holds only so many seats). To solve this model, we would still use the LSCP class, but include the facility_capacity_arr argument with a vector of values describing the maximum capacity at each facility location. We could also consider a version of the problem where some meetings are already scheduled, so those locations are already chosen and we need to know how many additional meetings to schedule. In this case, there are locations already present in the solution set, which can be defined by passing the predefined_facilities_arr argument a binary array of length n_facilities, where ones indicate a location that must be included. Finally, you can fit an alternative model that includes backup facilities, the Location Set Covering Problem–Backup (LSCP-B), using the appropriate class.

34.2.2 The Maximal Coverage Location Problem (MCLP)

We could also consider a different optimization problem known as the maximal coverage location problem (Berman & Krass, 2002; Chung, 1986; Church et al., 1996; Church & Velle, 2005; Pirkul & Schilling, 1991; ReVelle et al., 2008). In this case, we still want to maximize representation, but we have a different set of criteria to optimize. Imagine we only have enough staff and resources to hold four public events, and we expect people will only be willing to travel a certain threshold distance (say two kilometers) to attend. Which libraries should we select to hold our meetings to maximize availability to the public?

The MCLP can help us optimize our scarce resources to reach the maximum number of citizens, though this solution will obviously differ from the LSCP solution above. Whereas before we guaranteed complete coverage of all demand and asked the solver to minimize the number of required locations, in this case we have only four locations at our disposal, and we understand that some demand will not be met. That is, some blockgroups may not have access to a meeting site at all if locating it somewhere else would result in greater overall attendance; the question in this problem is, ‘how can we serve the most people possible (within their travel budget) by selecting only four locations from the set of libraries?’.
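Formally, the MCLP maximizes covered demand subject to siting exactly $p$ facilities:

```latex
\begin{aligned}
\max \quad & \sum_{i \in I} a_i y_i \\
\text{s.t.} \quad & \sum_{j \in N_i} x_j \ge y_i && \forall i \in I \\
& \sum_{j \in J} x_j = p \\
& x_j, \, y_i \in \{0, 1\}
\end{aligned}
```

where $a_i$ is the demand weight at site $i$ (here, blockgroup population), $y_i$ indicates whether demand $i$ is covered, and $N_i$ is again the set of facilities within the service threshold of $i$. Unlike the LSCP, the coverage indicators $y_i$ are decision variables: the solver is allowed to leave some demand uncovered.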

34.2.2.1 Service Threshold: 2km

For this exercise it is useful to compare two potential scenarios: in the first, we assume the public has a smaller travel budget of only two kilometers, then we compare the solution to a second scenario where we increase the budget. Similar to the LSCP, we instantiate the model class using the from_cost_matrix method and pass the network distances computed earlier, though this time we also pass a vector of “demand weights”, here the total population in each blockgroup, and a p_facilities parameter set to four, indicating the number of facilities we need to locate.

Code
mlcp2k = MCLP.from_cost_matrix(
    cost_matrix=cost_matrix,
    weights=dc["n_total_pop"].values,
    service_radius=2000,
    p_facilities=4,
    name="mclp-network-distance",
)
mlcp2k = mlcp2k.solve(pulp.COIN_CMD(msg=False, warmStart=True))

Once the model has been solved, we can look at its attributes to see how well it performed; for example, we can look at the perc_cov attribute to see how much of the demand (population within the travel threshold) we have been able to satisfy (i.e. how “optimal” is the model?).

Code
mlcp2k.perc_cov
39.75481611208407

With only four locations at our disposal and only a two kilometer travel distance, we can only cover about 40% of D.C. residents (although 40% is not bad for only four locations!). As before, we can look at which libraries (facilities) were chosen, and which blockgroups were assigned to them.
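The coverage statistic itself is simple: covered demand weight over total demand weight. A minimal numpy sketch with made-up numbers (not the D.C. data):

```python
import numpy as np

# A blockgroup counts as covered when at least one *selected* facility
# lies within the service radius
cost = np.array([[1500.0, 2500.0],
                 [3000.0, 1800.0],
                 [4000.0, 5000.0]])  # rows: blockgroups, cols: facilities
pop = np.array([100.0, 200.0, 50.0])  # demand weight per blockgroup
selected = [1]                        # facility indices chosen by the solver
radius = 2000

covered = (cost[:, selected] <= radius).any(axis=1)
perc_cov = 100 * pop[covered].sum() / pop.sum()
print(round(perc_cov, 2))  # 57.14
```

Only the second blockgroup falls within 2000 of the selected facility, so 200 of 350 people are covered.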

Code
assn = (
    pd.DataFrame(mlcp2k.cli2fac)
    .rename(columns={0: "idx"})
    .merge(libraries[["name"]], left_on="idx", right_index=True, how="left")
)
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    cmap="tab20b",
    tooltip="assignment",
    style_kwds={"weight": 0.5},
)
libraries.assign(id=libraries.index).iloc[
    pd.Series(assn["idx"].dropna()).unique()
].explore(
    m=m,
    color="black",
    tooltip=["name", "id"],
    marker_kwds={"radius": 7},
)

In the map above, the selected four libraries that hold our public meetings are plotted as circles and the blockgroups they serve are plotted as distinct colors 4. The ‘unassigned’ portions of the city (without a meeting inside the two kilometer walk threshold) are shown in gray. Given the limited travel budget, this solution focuses on concentrating meetings in the most populous core of the city, with all four locations relatively close together, and several of the service areas overlapping, at the expense of the city’s outlying regions.

34.2.2.2 Service Threshold: 5km

As an alternative, what if we can still only afford to staff four meetings, but our travel budget has increased (or travel costs less, so we expect to draw people from a larger distance)? In this case, we can change our travel threshold and solve the model again to see whether we choose the same or different locations for our meetings. To do this, we instantiate the MCLP class again using all the same arguments, except setting service_radius=5000.

Code
mlcp5k = MCLP.from_cost_matrix(
    cost_matrix=cost_matrix,
    weights=dc["n_total_pop"].values,
    service_radius=5000,
    p_facilities=4,
    name="mclp-network-distance",
)
mlcp5k = mlcp5k.solve(pulp.COIN_CMD(msg=False, warmStart=True))

Checking the perc_cov attribute again, we can compare the five kilometer solution to the two kilometer solution.

Code
mlcp5k.perc_cov
87.56567425569177

Clearly the increased travel budget has resulted in much better coverage; with a threshold of five kilometers we can choose four locations that collectively cover about 88% of the D.C. population (more than twice as much as our two-kilometer solution).

Code
assn = (
    pd.DataFrame(mlcp5k.cli2fac)
    .rename(columns={0: "idx"})
    .merge(libraries[["name"]], left_on="idx", right_index=True, how="left")
)
m = dc.assign(assignment=assn['name']).explore(
    "assignment",
    categorical=True,
    cmap="tab20b",
    tooltip="assignment",
    style_kwds={"weight": 0.5},
)
libraries.assign(id=libraries.index).iloc[
    pd.Series(assn["idx"].dropna()).unique()
].explore(
    m=m,
    color="black",
    tooltip=["name", "id"],
    marker_kwds={"radius": 7},
)

With a larger travel budget, the set of meeting locations changes substantially. Whereas the original (2km) solution concentrated all four meetings in the libraries of the most populous core of the city, the five kilometer solution moves the meetings out toward the edges–though notably the Petworth Library shows up in both solutions. A much greater share of the city is covered by this solution; however, it still leaves a good share of the District (about 12% of the population, by definition) unreachable. Further, it is unclear how that unreachable 12% is distributed. We have not included any optimization criteria to ensure, e.g., age/race/income representation, so we do not know whether the unrepresented portion of the population is the same, on average, as the overall city population–though those could be criteria in a better optimization model!

As with the LSCP, this problem can also be extended to include existing facilities by passing the appropriate array to the predefined_facilities_arr parameter.


  1. Ok, you probably do not have a defense system to optimize, but much of the cutting edge work in operations research originated in military applications like this, where decision-makers routinely face tough decisions under a specific set of constraints.↩︎

  2. In the interest of brevity we will proceed straight to well-known spatial optimization problems, however, optimization modeling from scratch is an extraordinarily useful skill, and much more thorough set of materials by Levi John Wolf are forthcoming toward those ends.↩︎

  3. This is why the isochrones in geosnap are more detailed when using a pandana.Network object created by the geosnap network downloader. The additional information provided by the shape of the streets (above the intersections alone) makes it easier to wrap a detailed polygon around the destination locations.↩︎

  4. Note here we are taking the dead simple approach and plotting a single facility (library) for each client (blockgroup), but the solution includes many blockgroups which are served by multiple libraries. With a bit more effort, we could visualize these relationships similar to a Graph, but this exercise is left to the reader :).↩︎