26  Stochastic Cellular Automata

Code
import tobler
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from geosnap import DataStore
from geosnap.analyze.dynamics import transition, predict_markov_labels, draw_sequence_from_gdf
from geosnap.io import get_acs
from geosnap.visualize import animate_timeseries, plot_transition_matrix
from IPython.display import Image
from libpysal.weights import Queen, Rook, KNN
from matplotlib.colors import to_hex
from tobler.util import h3fy
from tobler.dasymetric import extract_raster_features

%load_ext watermark
%watermark -a 'eli knaap' -v -d -u -p geopandas,geosnap
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Author: eli knaap

Last updated: 2025-04-08

Python implementation: CPython
Python version       : 3.12.8
IPython version      : 8.31.0

geopandas: 1.0.1
geosnap  : 0.14.1.dev14+g0443e2a.d20250103

Or, Tobler rides again: a nouveau computer movie simulating urban growth in detroit

Simulating land-use change in a Cellular Automata (CA) framework is one of the oldest traditions in urban growth modeling. Traditionally, a land-use model like SLEUTH uses a few different parameters calibrated on historical data to control the probability of a unit of land transitioning into a different kind of use (Brail, 2008; Dietzel & Clarke, 2006; Dietzel & Clarke, 2007; Guan & Clarke, 2010; Jantz et al., 2010).

Here, we’ll start with no data and build a simple CA model with a single transition rule using a spatial Markov framework. That is, the probability of a unit transitioning into different uses is a probabilistic function of its current state, and the states of the units around it. Using data from NLCD, we can observe how often these transitions occurred in the past, including how those transitions are conditioned by different spatial contexts. Then we can use that model to simulate conditions into the future.

26.1 Data

Start by grabbing tract-level data for the Detroit-ish region

Code
datasets = DataStore()
counties = ['26163', '26099', '26125']
tracts = get_acs(datasets, county_fips=counties, level="tract", years=[2018])
tracts = tracts.to_crs(tracts.estimate_utm_crs())
tracts.explore(tooltip=[])
Make this Notebook Trusted to load map: File -> Trust Notebook

Now, create a regular hexagonal grid that covers the same area and convert the land cover data from NLCD from a raster into a vector, collapsing contiguous cells into regions

Code
hexes = h3fy(tracts.to_crs(4326), resolution=9, clip=True)
lulc_01 = extract_raster_features(
    hexes,
    "/Users/knaaptime/Dropbox/projects/geosnap_data/nlcd/landcover/nlcd_landcover_2001.tif",
)
lulc_01["value"] = lulc_01["value"].astype(int).astype(str)
Code
codes = legend = np.array(
    [0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 51, 52, 71, 72, 73, 74, 81, 82, 90, 95]
).astype(str)
name = np.array(
    [
        np.nan,
        "Open Water",
        "Perennial Ice/Snow",
        "Developed, Open Space",
        "Developed, Low Intensity",
        "Developed, Medium Intensity",
        "Developed High Intensity",
        "Barren Land (Rock/Sand/Clay)",
        "Deciduous Forest",
        "Evergreen Forest",
        "Mixed Forest",
        "Dwarf Scrub",
        "Shrub/Scrub",
        "Grassland/Herbaceous",
        "Sedge/Herbaceous",
        "Lichens",
        "Moss",
        "Pasture/Hay",
        "Cultivated Crops",
        "Woody Wetlands",
        "Emergent Herbaceous Wetlands",
    ]
)
Code
lu_mapper = dict(zip(codes, name))

Now aggregate the pixels into the (coarser) hexgrid and include a column of colors to match the conventions in land use mapping

Code
def img_to_hex(hexes, img_path, year):

    hexes = hexes.copy()
    # take the LULC image and convert from raster to vector
    lulc = extract_raster_features(hexes, img_path)
    # convert to a string for easier representation as categorical variable
    lulc["value"] = lulc["value"].astype(int).astype(str)
    # pixel value 255 is NA
    lulc["value"] = lulc["value"].replace({"255": np.nan})
    temp = gpd.sjoin(lulc.to_crs(hexes.crs), hexes)
    temp = (
        temp.groupby("hex_id")["value"]
        .agg(lambda x: pd.Series.mode(x)[0])
        .astype(str)
    )  # cheat and take the min when multiple modes are returned...
    hexes = hexes.join(temp)
    hexes["year"] = year
    hexes["value"] = hexes["value"].replace(lu_mapper).replace({"nan": np.nan})
    hexes = hexes.dropna()

    return hexes
Code
nlcd_colors = pd.read_csv('nlcd_colors.csv',)
Code
# convert rgb to 0-1 range
nlcd_colors.color = nlcd_colors.color.apply(lambda x: np.array(x.split(",")).astype(float) / 255)

# convert rgb to hex (easier to hold a single value in a column instead of a list
nlcd_colors.color = nlcd_colors.color.apply(lambda x: to_hex(x))
Code
nlcd_colors
code color description
0 11 #466b9f Open Water
1 12 #d1def8 Perennial Snow/Ice
2 21 #dec5c5 Developed, Open Space
3 22 #d99282 Developed, Low Intensity
4 23 #eb0000 Developed, Medium Intensity
5 24 #ab0000 Developed High Intensity
6 31 #b3ac9f Barren Land (Rock/Sand/Clay)
7 41 #68ab5f Deciduous Forest
8 42 #1c5f2c Evergreen Forest
9 43 #b5c58f Mixed Forest
10 52 #ccb879 Shrub/Scrub
11 71 #dfdfc2 Grassland/Herbaceous
12 81 #dcd939 Pasture/Hay
13 82 #ab6c28 Cultivated Crops
14 90 #b8d9eb Woody Wetlands
15 95 #6c9fb8 Emergent Herbaceous Wetlands
Code
color_mapper = dict(zip(nlcd_colors.description.values.tolist(), nlcd_colors.color.values.tolist()))
Code
lulc_hex_01 = img_to_hex(
    hexes,
    "https://spatial-ucr.s3.amazonaws.com/nlcd/landcover/nlcd_landcover_2001.tif",
    2001,
)
Code
lulc_hex_01['color'] = lulc_hex_01.value.replace(color_mapper)
Code
nlcd_years = [2001, 2004, 2006, 2008, 2011, 2013, 2016]

lulc_hexes = pd.concat(img_to_hex(
    hexes, f"/Users/knaaptime/Dropbox/projects/geosnap_data/nlcd/landcover/nlcd_landcover_{year}.tif",
    year) for year in nlcd_years)
lulc_hexes = lulc_hexes.reset_index()

lulc_hexes = lulc_hexes[lulc_hexes.value!='Shrub/Scrub']
lulc_hexes = lulc_hexes[lulc_hexes.value!='Open Water']

lulc_hexes['color'] = lulc_hexes.value.replace(color_mapper)
Code
animate_timeseries(
    lulc_hexes,
    filename="lulc_dynamics.gif",
    figsize=(12, 12),
    fps=2,
    dpi=100,
    color_col='color',
)
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 960x960 with 0 Axes>

These are the empirical transitions encoded in the NLCD data between 2001 and 2016, represented at by a slightly larger hexgrid

Code
Image('lulc_dynamics.gif', embed=True, retina=True)
<IPython.core.display.Image object>

Observed Transitions

Observed Transitions

26.2 Modeling Land-Use Change

Using geosnap and PySAL, we can look at these transitions more formally, modeling the change over time as a spatial Markov process. Using this framework, we can ask whether certain transitions are more likely than others given their spatial context. For example, is a piece of undeveloped land more likely to become developed if it’s neighbors have already been developed?

Code
sm = transition(lulc_hexes, unit_index='hex_id', cluster_col='value')
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/dynamics.py:123: FutureWarning: `use_index` defaults to False but will default to True in future. Set True/False directly to control this behavior and silence this warning
  w = Ws[w_type].from_dataframe(gpd.GeoDataFrame(gdf_wide), **w_options)
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/contiguity.py:61: UserWarning: The weights matrix is not fully connected: 
 There are 6 disconnected components.
 There are 2 islands with ids: 16374, 38660.
  W.__init__(self, neighbors, ids=ids, **kw)
('WARNING: ', 16374, ' is an island (no neighbors)')
('WARNING: ', 38660, ' is an island (no neighbors)')
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
Code
type(sm)
giddy.markov.Spatial_Markov

Inference from the spatial Markov object shows the transitions are not random, but show different rates depending on the “neighborhood” of each observation

Code
sm.Q_p_value
0.0
Code
sm.LR_p_value
0.0

Plotting all the transition matrices can be daunting, but it provides a detailed look at how transition rates vary by context

Code
plot_transition_matrix(transition_model=sm, figsize=(30,30), dpi=200)
array([<Axes: title={'center': 'Global'}>,
       <Axes: title={'center': 'Modal Neighbor - Barren Land (Rock/Sand/Clay)'}>,
       <Axes: title={'center': 'Modal Neighbor - Cultivated Crops'}>,
       <Axes: title={'center': 'Modal Neighbor - Deciduous Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed High Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Low Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Medium Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Open Space'}>,
       <Axes: title={'center': 'Modal Neighbor - Emergent Herbaceous Wetlands'}>,
       <Axes: title={'center': 'Modal Neighbor - Evergreen Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Grassland/Herbaceous'}>,
       <Axes: title={'center': 'Modal Neighbor - Mixed Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Pasture/Hay'}>,
       <Axes: title={'center': 'Modal Neighbor - Woody Wetlands'}>,
       <Axes: >, <Axes: >], dtype=object)

We can see that barren land, for example, is much more likely to become developed when it’s neighbors are low-intensity developed (from top-left, see the second row, second column)

26.3 Simulating Future Land-Use

With these transition probabilities in hand (and the set of land uses in the last time period), we have all we need to simulate potential land uses in the future. First, we assume that the empirical transitions observed in prior time periods will remain static, and that only the prior time period influences the probability of transition between landuse types. Then, we can use the spatial context of each unit to select the proper transition matrix, and draw a new land-use type using the transition probabilities observed in the past.

Repeating this process for each unit in the study area gives a stochastic realization of land-use transitions; repeating this process for a range of years lets us look at a potential evolution of the region into the future. We could then change transition probabilities or manually edit land-uses in certain places (e.g. to simulate processes like urban development, sea-level rise, or a protective land-use policy) and see how those changes propagate through the system into the future.

Code
w = Queen.from_dataframe(lulc_hexes[lulc_hexes.year==2016].dropna().reset_index())
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_82155/4138579200.py:1: FutureWarning: `use_index` defaults to False but will default to True in future. Set True/False directly to control this behavior and silence this warning
  w = Queen.from_dataframe(lulc_hexes[lulc_hexes.year==2016].dropna().reset_index())
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected: 
 There are 5 disconnected components.
 There is 1 island with id: 34415.
  W.__init__(self, neighbors, ids=ids, **kw)
Code
temp_res = future_lulc.value.copy().values
Code
all(temp_res == future_lulc.value.values)
True
Code
future_lulc['temp'] = temp_res
Code
future_lulc[future_lulc.value!= temp_res][future_lulc[future_lulc.value!= temp_res].year==2019]

future_lulc
index value year geometry temp
0 284778 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.18994 42.03666, -83.18952 42.034... Emergent Herbaceous Wetlands
1 284779 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.19079 42.04019, -83.18841 42.039... Emergent Herbaceous Wetlands
2 284780 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.19553 42.04149, -83.19358 42.042... Emergent Herbaceous Wetlands
3 284781 Woody Wetlands 2016 POLYGON ((-83.18926 42.04307, -83.18688 42.042... Woody Wetlands
4 284782 Developed, Low Intensity 2016 POLYGON ((-83.20418 42.04056, -83.20223 42.041... Developed, Low Intensity
... ... ... ... ... ...
47483 332261 Developed, Open Space 2091 POLYGON ((-82.79925 42.8939, -82.79884 42.8921... Developed, Open Space
47484 332262 Pasture/Hay 2091 POLYGON ((-82.81532 42.89498, -82.81491 42.893... Pasture/Hay
47485 332263 Deciduous Forest 2091 POLYGON ((-82.81091 42.89543, -82.81049 42.893... Deciduous Forest
47486 332264 Developed, Low Intensity 2091 POLYGON ((-82.80608 42.8941, -82.80808 42.8929... Developed, Low Intensity
47487 332265 Developed, Low Intensity 2091 POLYGON ((-82.80167 42.89456, -82.80367 42.893... Developed, Low Intensity

1234688 rows × 5 columns

Code
future_lulc['color'] = future_lulc.value.replace(color_mapper)

future_lulc.head()
index value year geometry temp color
0 284778 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.18994 42.03666, -83.18952 42.034... Emergent Herbaceous Wetlands #6c9fb8
1 284779 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.19079 42.04019, -83.18841 42.039... Emergent Herbaceous Wetlands #6c9fb8
2 284780 Emergent Herbaceous Wetlands 2016 POLYGON ((-83.19553 42.04149, -83.19358 42.042... Emergent Herbaceous Wetlands #6c9fb8
3 284781 Woody Wetlands 2016 POLYGON ((-83.18926 42.04307, -83.18688 42.042... Woody Wetlands #b8d9eb
4 284782 Developed, Low Intensity 2016 POLYGON ((-83.20418 42.04056, -83.20223 42.041... Developed, Low Intensity #d99282
Code
animate_timeseries(
    future_lulc,
    filename="predicted_LULC.gif",
    categorical=False,
    title="Simulated Land-Use",
    figsize=(12, 12),
    color_col='color',
    fps=3,
    dpi=150,
)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/visualize/mapping.py:374: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
  fig, ax = plt.subplots(figsize=figsize)
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 1152x1152 with 0 Axes>
<Figure size 960x960 with 0 Axes>

Now we can plot the simulated land-use moving into the future. This model is pretty naive, but the results seem plausible, with development growing outward from existing cores. We also get lots of infill (but the model isnt smart enough to know some places can’t be developed, etc). It would be fairly easy to include some additional restrictions that some observations are fixed because of policy or topology

Simulated Land Use

Simulated Land Use
Code
Image("predicted_LULC.gif", embed=True, retina=True)
<IPython.core.display.Image object>

We can also quickly compute the share of land in each type and compare two time periods. Since the output is a regular grid, we just need to count the number of cells in each category without needing to actually look at the area

26.3.0.1 2019

Code
future_lulc[future_lulc.year==2019].value.value_counts(normalize=True) 
value
Developed, Low Intensity        0.408061
Developed, Open Space           0.190322
Developed, Medium Intensity     0.155008
Deciduous Forest                0.103710
Woody Wetlands                  0.034472
Pasture/Hay                     0.033482
Mixed Forest                    0.024996
Cultivated Crops                0.021058
Emergent Herbaceous Wetlands    0.009666
Developed High Intensity        0.008528
Grassland/Herbaceous            0.005749
Barren Land (Rock/Sand/Clay)    0.003580
Evergreen Forest                0.001369
Name: proportion, dtype: float64

26.3.0.2 2091

Code
future_lulc[future_lulc.year==2091].value.value_counts(normalize=True) 
value
Developed, Low Intensity        0.405366
Developed, Medium Intensity     0.204978
Developed, Open Space           0.191080
Deciduous Forest                0.079241
Pasture/Hay                     0.027165
Woody Wetlands                  0.026407
Mixed Forest                    0.022258
Developed High Intensity        0.015225
Cultivated Crops                0.013435
Emergent Herbaceous Wetlands    0.007118
Grassland/Herbaceous            0.004380
Barren Land (Rock/Sand/Clay)    0.002127
Evergreen Forest                0.001221
Name: proportion, dtype: float64

If we take the future minus the past, we can look at relative change

Code
future_lulc[future_lulc.year==2091].value.value_counts(normalize=True) -  future_lulc[future_lulc.year==2019].value.value_counts(normalize=True) 
value
Barren Land (Rock/Sand/Clay)   -0.001453
Cultivated Crops               -0.007623
Deciduous Forest               -0.024469
Developed High Intensity        0.006696
Developed, Low Intensity       -0.002695
Developed, Medium Intensity     0.049971
Developed, Open Space           0.000758
Emergent Herbaceous Wetlands   -0.002548
Evergreen Forest               -0.000147
Grassland/Herbaceous           -0.001369
Mixed Forest                   -0.002738
Pasture/Hay                    -0.006317
Woody Wetlands                 -0.008065
Name: proportion, dtype: float64

This would suggest a continued trend toward urbanization, with particularly large relative losses in deciduous forest. Low-intensity development is decreasing, but medium-intensity seems to be growing strongly. Following this analysis, some natural questions that arise include, what would it mean (in social, economic, environmental, etc.) terms to have a 5% increase in medium-intensity development and a 2.5% reduction in deciduous forest? What are the implications for climate change, anticipated housing growth, traffic congestion, change in tax revenue, and so on? And how might these conditions differ under a different scenario, such as one where we have alternative land regulations or urban growth constraints?

These questions nudge us further in the direction of simulation modeling and scenario planning, which are exceedingly useful for medium and large-scale urban planning exercises (Chakraborty et al., 2011; Chakraborty & McMillan, 2015; Knaap et al., 2020; Avin2020?; avin2007using?).