26  Stochastic Cellular Automata

Code
import tobler
import contextily as ctx
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from geosnap import DataStore
from geosnap.analyze.dynamics import transition, predict_markov_labels, draw_sequence_from_gdf
from geosnap.io import get_acs
from geosnap.visualize import animate_timeseries, plot_transition_matrix
from IPython.display import Image
from libpysal.weights import Queen, Rook, KNN
from matplotlib.colors import to_hex
from tobler.util import h3fy
from tobler.dasymetric import extract_raster_features

%load_ext watermark
%watermark -a 'eli knaap' -v -d -u -p geopandas,geosnap
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Author: eli knaap

Last updated: 2025-11-24

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 9.7.0

geopandas: 1.1.1
geosnap  : 0.15.3

Or, Tobler rides again: a nouveau computer movie simulating urban growth in detroit

Simulating land-use change in a Cellular Automata (CA) framework is one of the oldest traditions in urban growth modeling. Traditionally, a land-use model like SLEUTH uses a few different parameters calibrated on historical data to control the probability of a unit of land transitioning into a different kind of use (Brail, 2008; Dietzel & Clarke, 2006; Dietzel & Clarke, 2007; Guan & Clarke, 2010; Jantz et al., 2010).

Here, we’ll start with no data and build a simple CA model with a single transition rule using a spatial Markov framework. That is, the probability of a unit transitioning into different uses is a probabilistic function of its current state, and the states of the units around it. Using data from NLCD, we can observe how often these transitions occurred in the past, including how those transitions are conditioned by different spatial contexts. Then we can use that model to simulate conditions into the future.

26.1 Data

Start by grabbing tract-level data for the Detroit-ish region

Code
datasets = DataStore()
counties = ['26163', '26099', '26125']
tracts = get_acs(datasets, county_fips=counties, level="tract", years=[2018])
tracts = tracts.to_crs(tracts.estimate_utm_crs())
tracts.explore(tooltip=[])
Make this Notebook Trusted to load map: File -> Trust Notebook

Now, create a regular hexagonal grid that covers the same area and convert the land cover data from NLCD from a raster into a vector, collapsing contiguous cells into regions. Then, we create a mapping between the pixel values to their descriptive names, for example pixel value 21 corresponds to “Developed, Open Space”.

Code
hexes = h3fy(tracts.to_crs(4326), resolution=9, clip=True)
lulc_01 = extract_raster_features(
    hexes,
    "/Users/knaaptime/Dropbox/projects/geosnap_data/nlcd/landcover/nlcd_landcover_2001.tif",
)
lulc_01["value"] = lulc_01["value"].astype(int).astype(str)

codes = legend = np.array(
    [0, 11, 12, 21, 22, 23, 24, 31, 41, 42, 43, 51, 52, 71, 72, 73, 74, 81, 82, 90, 95]
).astype(str)
name = np.array(
    [
        np.nan,
        "Open Water",
        "Perennial Ice/Snow",
        "Developed, Open Space",
        "Developed, Low Intensity",
        "Developed, Medium Intensity",
        "Developed High Intensity",
        "Barren Land (Rock/Sand/Clay)",
        "Deciduous Forest",
        "Evergreen Forest",
        "Mixed Forest",
        "Dwarf Scrub",
        "Shrub/Scrub",
        "Grassland/Herbaceous",
        "Sedge/Herbaceous",
        "Lichens",
        "Moss",
        "Pasture/Hay",
        "Cultivated Crops",
        "Woody Wetlands",
        "Emergent Herbaceous Wetlands",
    ]
)
lu_mapper = dict(zip(codes, name))

Now aggregate the pixels into the (coarser) hexgrid and include a column of colors to match the conventions in land use mapping. First we define a small helper function, then apply it to the column of pixel values, resulting in a hex color for each land-use type (hexcolors are much easier to work with in this context because a hexcode is a single string whereas a RGBA representation is a tuple of values, which can be difficult to represent in a table).

Code
def img_to_hex(hexes, img_path, year):

    hexes = hexes.copy()
    # take the LULC image and convert from raster to vector
    lulc = extract_raster_features(hexes, img_path)
    # convert to a string for easier representation as categorical variable
    lulc["value"] = lulc["value"].astype(int).astype(str)
    # pixel value 255 is NA
    lulc["value"] = lulc["value"].replace({"255": np.nan})
    temp = gpd.sjoin(lulc.to_crs(hexes.crs), hexes)
    temp = (
        temp.groupby("hex_id")["value"].agg(lambda x: pd.Series.mode(x)[0]).astype(str)
    )  # cheat and take the min when multiple modes are returned...
    hexes = hexes.join(temp)
    hexes["year"] = year
    hexes["value"] = hexes["value"].replace(lu_mapper).replace({"nan": np.nan})
    hexes = hexes.dropna()

    return hexes


nlcd_colors = pd.read_csv(
    "nlcd_colors.csv",
)
# convert rgb to 0-1 range
nlcd_colors.color = nlcd_colors.color.apply(
    lambda x: np.array(x.split(",")).astype(float) / 255
)
# convert rgb to hex (easier to hold a single value in a column instead of a list
nlcd_colors.color = nlcd_colors.color.apply(lambda x: to_hex(x))
nlcd_colors
code color description
0 11 #466b9f Open Water
1 12 #d1def8 Perennial Snow/Ice
2 21 #dec5c5 Developed, Open Space
3 22 #d99282 Developed, Low Intensity
4 23 #eb0000 Developed, Medium Intensity
5 24 #ab0000 Developed High Intensity
6 31 #b3ac9f Barren Land (Rock/Sand/Clay)
7 41 #68ab5f Deciduous Forest
8 42 #1c5f2c Evergreen Forest
9 43 #b5c58f Mixed Forest
10 52 #ccb879 Shrub/Scrub
11 71 #dfdfc2 Grassland/Herbaceous
12 81 #dcd939 Pasture/Hay
13 82 #ab6c28 Cultivated Crops
14 90 #b8d9eb Woody Wetlands
15 95 #6c9fb8 Emergent Herbaceous Wetlands

Finally we have a nice table that includes the pixel values, a descriptive name of each land-use type, and the color commonly-used to portray that land-use in maps represented as a hexcode.

Code
color_mapper = dict(
    zip(nlcd_colors.description.values.tolist(), nlcd_colors.color.values.tolist())
)
lulc_hex_01 = img_to_hex(
    hexes,
    "https://spatial-ucr.s3.amazonaws.com/nlcd/landcover/nlcd_landcover_2001.tif",
    2001,
)
lulc_hex_01["color"] = lulc_hex_01.value.replace(color_mapper)
ax = lulc_hex_01.plot(color=lulc_hex_01["color"].values, figsize=(6, 6), alpha=0.6)
ax.axis("off")
ctx.add_basemap(ax=ax, source=ctx.providers.CartoDB.Positron, crs=lulc_hex_01.crs)
plt.show()

Land-Use Categories in Greater Detroit Shown in a Regular Hexgrid

Land-Use Categories in Greater Detroit Shown in a Regular Hexgrid

With this data nicely formatted, we can read in multiple years of land-cover data, then convert them all to hexgrid polygons and assign the appropriate color and descriptive columns. For convenience we will also ignore scrubland and open water.

Code
nlcd_years = [2001, 2004, 2006, 2008, 2011, 2013, 2016]

lulc_hexes = pd.concat(img_to_hex(
    hexes, f"/Users/knaaptime/Dropbox/projects/geosnap_data/nlcd/landcover/nlcd_landcover_{year}.tif",
    year) for year in nlcd_years)
lulc_hexes = lulc_hexes.reset_index()
lulc_hexes = lulc_hexes[lulc_hexes.value!='Shrub/Scrub']
lulc_hexes = lulc_hexes[lulc_hexes.value!='Open Water']
lulc_hexes['color'] = lulc_hexes.value.replace(color_mapper)

The result is a long-form geodataframe that stores land-use over time, which we can use to create a timeseries plot or an animated visualization.

Code
animate_timeseries(
    lulc_hexes,
    filename="lulc_dynamics.gif",
    figsize=(12, 12),
    fps=2,
    dpi=100,
    color_col="color",
)

These are the empirical transitions encoded in the NLCD data between 2001 and 2016, represented at by a slightly larger hexgrid.

Observed Transitions

Observed Transitions

26.2 Modeling Land-Use Change

Using geosnap and PySAL, we can look at these transitions more formally, modeling the change over time as a spatial Markov process. Using this framework, we can ask whether certain transitions are more likely than others given their spatial context. For example, is a piece of undeveloped land more likely to become developed if it’s neighbors have already been developed?

Code
sm = transition(lulc_hexes, unit_index='hex_id', cluster_col='value')
Code
type(sm)
giddy.markov.Spatial_Markov

Inference from the spatial Markov object shows the transitions are not random, but show different rates depending on the “neighborhood” of each observation

Code
sm.Q_p_value
np.float64(0.0)
Code
sm.LR_p_value
np.float64(0.0)

Plotting all the transition matrices can be daunting, but it provides a detailed look at how transition rates vary by context

Code
plot_transition_matrix(transition_model=sm, figsize=(30,30), dpi=200)
array([<Axes: title={'center': 'Global'}>,
       <Axes: title={'center': 'Modal Neighbor - Barren Land (Rock/Sand/Clay)'}>,
       <Axes: title={'center': 'Modal Neighbor - Cultivated Crops'}>,
       <Axes: title={'center': 'Modal Neighbor - Deciduous Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed High Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Low Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Medium Intensity'}>,
       <Axes: title={'center': 'Modal Neighbor - Developed, Open Space'}>,
       <Axes: title={'center': 'Modal Neighbor - Emergent Herbaceous Wetlands'}>,
       <Axes: title={'center': 'Modal Neighbor - Evergreen Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Grassland/Herbaceous'}>,
       <Axes: title={'center': 'Modal Neighbor - Mixed Forest'}>,
       <Axes: title={'center': 'Modal Neighbor - Pasture/Hay'}>,
       <Axes: title={'center': 'Modal Neighbor - Woody Wetlands'}>,
       <Axes: >, <Axes: >], dtype=object)

We can see that barren land, for example, is much more likely to become developed when it’s neighbors are low-intensity developed (from top-left, see the second row, second column)

26.3 Simulating Future Land-Use

With these transition probabilities in hand (and the set of land uses in the last time period), we have all we need to simulate potential land uses in the future. First, we assume that the empirical transitions observed in prior time periods will remain static, and that only the prior time period influences the probability of transition between landuse types. Then, we can use the spatial context of each unit to select the proper transition matrix, and draw a new land-use type using the transition probabilities observed in the past.

Repeating this process for each unit in the study area gives a stochastic realization of land-use transitions; repeating this process for a range of years lets us look at a potential evolution of the region into the future. We could then change transition probabilities or manually edit land-uses in certain places (e.g. to simulate processes like urban development, sea-level rise, or a protective land-use policy) and see how those changes propagate through the system into the future.

Code
w = Queen.from_dataframe(lulc_hexes[lulc_hexes.year==2016].dropna().reset_index())
Code
temp_res = future_lulc.value.copy().values
all(temp_res == future_lulc.value.values)

future_lulc["temp"] = temp_res
future_lulc[future_lulc.value != temp_res][
    future_lulc[future_lulc.value != temp_res].year == 2019
]
future_lulc["color"] = future_lulc.value.replace(color_mapper)

future_lulc.drop(columns="geometry").head()
index value year temp color
0 284778 Emergent Herbaceous Wetlands 2016 Emergent Herbaceous Wetlands #6c9fb8
1 284779 Emergent Herbaceous Wetlands 2016 Emergent Herbaceous Wetlands #6c9fb8
2 284780 Emergent Herbaceous Wetlands 2016 Emergent Herbaceous Wetlands #6c9fb8
3 284781 Woody Wetlands 2016 Woody Wetlands #b8d9eb
4 284782 Developed, Low Intensity 2016 Developed, Low Intensity #d99282
Code
animate_timeseries(
    future_lulc,
    filename="predicted_LULC.gif",
    categorical=False,
    title="Simulated Land-Use",
    figsize=(12, 12),
    color_col="color",
    fps=3,
    dpi=150,
)

Now we can plot the simulated land-use moving into the future. This model is pretty naive, but the results seem plausible, with development growing outward from existing cores. We also get lots of infill (but the model isnt smart enough to know some places can’t be developed, etc). It would be fairly easy to include some additional restrictions that some observations are fixed because of policy or topology

Simulated Land Use

Simulated Land Use

We can also quickly compute the share of land in each type and compare two time periods. Since the output is a regular grid, we just need to count the number of cells in each category without needing to actually look at the area

26.3.0.1 2019

Code
future_lulc[future_lulc.year==2019].value.value_counts(normalize=True) 
value
Developed, Low Intensity        0.407871
Developed, Open Space           0.190280
Developed, Medium Intensity     0.155071
Deciduous Forest                0.103900
Woody Wetlands                  0.034282
Pasture/Hay                     0.033566
Mixed Forest                    0.024912
Cultivated Crops                0.020974
Emergent Herbaceous Wetlands    0.009897
Developed High Intensity        0.008507
Grassland/Herbaceous            0.005833
Barren Land (Rock/Sand/Clay)    0.003559
Evergreen Forest                0.001348
Name: proportion, dtype: float64

26.3.0.2 2091

Code
future_lulc[future_lulc.year==2091].value.value_counts(normalize=True) 
value
Developed, Low Intensity        0.404944
Developed, Medium Intensity     0.204810
Developed, Open Space           0.190680
Deciduous Forest                0.080273
Pasture/Hay                     0.027228
Woody Wetlands                  0.026301
Mixed Forest                    0.023269
Developed High Intensity        0.015120
Cultivated Crops                0.012866
Emergent Herbaceous Wetlands    0.007033
Grassland/Herbaceous            0.004085
Barren Land (Rock/Sand/Clay)    0.002106
Evergreen Forest                0.001285
Name: proportion, dtype: float64

If we take the future minus the past, we can look at relative change

Code
future_lulc[future_lulc.year == 2091].value.value_counts(normalize=True) - future_lulc[
    future_lulc.year == 2019
].value.value_counts(normalize=True)
value
Barren Land (Rock/Sand/Clay)   -0.001453
Cultivated Crops               -0.008107
Deciduous Forest               -0.023627
Developed High Intensity        0.006612
Developed, Low Intensity       -0.002927
Developed, Medium Intensity     0.049739
Developed, Open Space           0.000400
Emergent Herbaceous Wetlands   -0.002864
Evergreen Forest               -0.000063
Grassland/Herbaceous           -0.001748
Mixed Forest                   -0.001643
Pasture/Hay                    -0.006338
Woody Wetlands                 -0.007981
Name: proportion, dtype: float64

This would suggest a continued trend toward urbanization, with particularly large relative losses in deciduous forest. Low-intensity development is decreasing, but medium-intensity seems to be growing strongly. Following this analysis, some natural questions that arise include, what would it mean (in social, economic, environmental, etc.) terms to have a 5% increase in medium-intensity development and a 2.5% reduction in deciduous forest? What are the implications for climate change, anticipated housing growth, traffic congestion, change in tax revenue, and so on? And how might these conditions differ under a different scenario, such as one where we have alternative land regulations or urban growth constraints?

These questions nudge us further in the direction of simulation modeling and scenario planning, which are exceedingly useful for medium and large-scale urban planning exercises (Avin, 2007; Avin & Goodspeed, 2020; Chakraborty et al., 2011; Chakraborty & McMillan, 2015; Knaap et al., 2020).