[[24]{.chapter-number}  [Regionalization ]{.chapter-title}]{#sec-regionalization .quarto-section-identifier}

24 Regionalization

Code

import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
from geosnap import DataStore
from geosnap import analyze as gaz
from geosnap import io as gio
from geosnap import visualize as gvz
from IPython.display import Image

%load_ext watermark
%watermark -a  'eli knaap' -iv -d -u

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.

Author: eli knaap

Last updated: 2025-04-08

matplotlib: 3.10.0
IPython   : 8.31.0
geopandas : 1.0.1
contextily: 1.6.2
geosnap   : 0.14.1.dev14+g0443e2a.d20250103

As with geodemographic clustering, when carrying out a regionalization exercise, we are searching for groups of observations (census tracts in this case) which are similar in socioeconomic and demographic composition. If the goal of geodemographics is to identify neighborhood types, that could exist anywhere in the region, the goal of regionalization is to identify specific neighborhoods that exist at a distinct place in the region

Following that concept, we can use constrained clustering to develop an empirical version of geographically-bounded neighborhoods, where the neighborhoods are defined by internal social homogeneity. This is similar to the historic and well-defined neighborhood zones in places like Chicago and Pittsburgh. If your goal were to try and delineate neighborhoods in a bespoke region as the LA Times tried to do, regionalization would be a data-driven way to do so.

Code

Image(
    "https://upload.wikimedia.org/wikipedia/commons/b/b3/Chicago_neighborhoods_map.png",
    width=700,
)

https://en.wikivoyage.org/wiki/File:Chicago_neighborhoods_map.png

Code

Image(
    "https://visit-pittsburgh-2023.s3.amazonaws.com/images/archive/2020-Neighborhood-Map2.jpg",
    width=700,
)

https://www.visitpittsburgh.com/neighborhoods/

Code

datasets = DataStore()
la = gio.get_acs(datasets, county_fips="06037", years=2021, level="tract")
la = la.to_crs(la.estimate_utm_crs())

la[["median_household_income", "geometry"]].explore(
    "median_household_income",
    scheme="quantiles",
    k=8,
    cmap="magma",
    tiles="CartoDB Positron",
)

columns = [
    "median_household_income",
    "median_home_value",
    "p_edu_college_greater",
    "p_edu_hs_less",
    "p_nonhisp_white_persons",
    "p_nonhisp_black_persons",
    "p_hispanic_persons",
    "p_asian_persons",
]
la.dropna(subset=columns).plot()

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/util.py:273: UserWarning: Unable to find local adjustment year for 2021. Attempting from online data
  warn(
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/io/constructors.py:217: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
  warn(

24.1 Cross-Sectional Regionalization

Code

la_ward_reg, la_ward_model = gaz.regionalize(
    gdf=la,
    method="ward_spatial",
    n_clusters=25,
    columns=columns,
    return_model=True,
    spatial_weights="queen",
)

la_ward_reg[columns + ["geometry", "ward_spatial"]].explore(
    "ward_spatial", categorical=True, cmap="Accent", tiles="CartoDB Positron"
)

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:406: FutureWarning: `use_index` defaults to False but will default to True in future. Set True/False directly to control this behavior and silence this warning
  w0 = W.from_dataframe(df, **weights_kwargs)
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected: 
 There are 4 disconnected components.
 There are 2 islands with ids: 568, 1944.
  W.__init__(self, neighbors, ids=ids, **kw)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['4' '4' '4' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)

Make this Notebook Trusted to load map: File -> Trust Notebook

Unlike the general clustering case, regionalization algorithms can only group together observations that are connected via a spatial connectivity graph. That means the map is much more distinctive, but the resulting clusters are usually far less homogenous internally than the unconstrained case.

To understand what the clusters are identifying, another data visualisation technique is useful. Here, we rely on the Tufte-ian principle of “small multiples”, and create a set of violin plots that show how each variable is distributed across each of the clusters. Each of the inset boxes shows a different variable, with cluster assignments on the x-axis, and the variable itself on the y-axis The boxplot in the center shows the median and IQR, and the “body” of the “violin” is a kernel-density estimate reflected across the x-axis. In short, the fat parts of the violin show where the bulk of the observations are located, and the skinny “necks” show the long tails.

Code

gvz.plot_violins_by_cluster(la_ward_reg, columns, cluster_col="ward_spatial")

array([<Axes: title={'center': 'median_household_income'}, xlabel='ward_spatial', ylabel='median_household_income'>,
       <Axes: title={'center': 'median_home_value'}, xlabel='ward_spatial', ylabel='median_home_value'>,
       <Axes: title={'center': 'p_edu_college_greater'}, xlabel='ward_spatial', ylabel='p_edu_college_greater'>,
       <Axes: title={'center': 'p_edu_hs_less'}, xlabel='ward_spatial', ylabel='p_edu_hs_less'>,
       <Axes: title={'center': 'p_nonhisp_white_persons'}, xlabel='ward_spatial', ylabel='p_nonhisp_white_persons'>,
       <Axes: title={'center': 'p_nonhisp_black_persons'}, xlabel='ward_spatial', ylabel='p_nonhisp_black_persons'>,
       <Axes: title={'center': 'p_hispanic_persons'}, xlabel='ward_spatial', ylabel='p_hispanic_persons'>,
       <Axes: title={'center': 'p_asian_persons'}, xlabel='ward_spatial', ylabel='p_asian_persons'>,
       <Axes: >], dtype=object)

Notice here that the violin plots are much harder to distinguish. That is, by enforcing a contiguity constraint, many census tracts are grouped with other tracts that they are closest to, rather than those they are most similar to in multivariate attribute space. This is reflected in the silhouette scores as well.

Unlike the general clustering case, regionalization algorithms return a dictionary of ModelResults classes (one for each unique time period in the dataset) because it is not possible to create a pooled regionalization solution using data from multiple time periods

Code

la_ward_model

{2021: <geosnap.analyze._model_results.ModelResults at 0x338f8b350>}

Code

la_ward_model[2021].plot_silhouette()

This is a poor silhouette score indeed–the overall score for the solution is negative, meaning most tracts are more similar (in attributes) to tracts in another cluster than the one they are assigned. But a silhouette score is not the ideal metric in this case, because attribute similarity is no longer the only objective the model is trying to achieve.

Code

la_ward_model[2021].plot_silhouette_map(figsize=(9, 9))

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/visualize/mapping.py:170: UserWarning: `proplot` is not installed.  Falling back to matplotlib
  warn("`proplot` is not installed.  Falling back to matplotlib")

Rather than searching for unique types, regardless of where they are in the study area (which is what general clustering algorithms focus on), regionalization algorithms arguable search for unique places in the study area, where uniqueness is defined by the concentration of attribute similarity in space. That is, we are looking for places that are geographically distinct from their neighbors, where members of a single region are relatively similar to one another, but distinctly different from those nearby.

To operationalize these concepts, we leverage the notion of a geosilhouette, which blends the roles of geographic and attribute similarity into a single metric. There are two types of geosilhouettes, depending on whether we are most interested in differences between region cores (path silhouettes) or region boundaries (boundary silhouettes)

Code

la_ward_model[2021].boundary_silhouette.boundary_silhouette.mean()

0.09576599634910776

Code

la_ward_model[2021].plot_boundary_silhouette(figsize=(8, 8))

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/visualize/mapping.py:170: UserWarning: `proplot` is not installed.  Falling back to matplotlib
  warn("`proplot` is not installed.  Falling back to matplotlib")
/Users/knaaptime/Dropbox/projects/mapclassify/mapclassify/classifiers.py:1686: UserWarning: Not enough unique values in array to form 6 classes. Setting k to 4.
  self.bins = quantile(y, k=k)

Code

la_ward_model[2021].path_silhouette.path_silhouette.mean()

0.04836626548915017

Code

la_ward_model[2021].plot_path_silhouette(figsize=(8, 8))

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/visualize/mapping.py:170: UserWarning: `proplot` is not installed.  Falling back to matplotlib
  warn("`proplot` is not installed.  Falling back to matplotlib")

There are also many regionalization algorithms to choose from (see the PySAL spopt package for more information on each of them)

Code

cluster_types = [
    "kmeans_spatial",
    "ward_spatial",
    "skater",
    "spenc",
    "azp",
]

from sklearn.preprocessing import MinMaxScaler

f, ax = plt.subplots(2, 3, figsize=(11, 10))
ax = ax.flatten()

for i, k in enumerate(cluster_types):
    df = gaz.regionalize(
        gdf=la,
        method=k,
        columns=columns,
        n_clusters=25,
        scaler=MinMaxScaler(),
        weights_kwargs=dict(use_index=True, silence_warnings=True),
        region_kwargs=dict(gamma=1),
    )
    df.to_crs(3857).sort_values(k).plot(k, categorical=True, figsize=(7, 7), ax=ax[i])
    ax[i].axis("off")
    ax[i].set_title(k, fontsize=14)
    ctx.add_basemap(ax=ax[i], source=ctx.providers.CartoDB.Positron)
    plt.tight_layout()

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['12' '12' '12' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['5' '5' '5' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['24' '24' '24' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['15.0' '15.0' '16.0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)

Again, the colors are arbitrary and the patterns (i.e. which tracts get grouped together in a single color) are the important parts

Code

ks_best, ks_table = gaz.find_region_k(
    gdf=la,
    method="ward_spatial",
    columns=columns,
    min_k=2,
    max_k=50,
    return_table=True,
    weights_kwargs=dict(use_index=True, silence_warnings=True),
)

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['4' '4' '4' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['7' '7' '7' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['7' '7' '7' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['3' '3' '3' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['40' '40' '40' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['40' '40' '40' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['40' '40' '40' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['19' '19' '19' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['19' '19' '19' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['19' '19' '19' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['19' '19' '19' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['19' '19' '19' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)

Code

ks_best

	silhouette_score	calinski_harabasz_score	path_silhouette	boundary_silhouette	davies_bouldin_score
time_period
2021.0	2.0	2.0	2.0	50.0	2.0

Code

ks_table

	silhouette_score	calinski_harabasz_score	davies_bouldin_score	path_silhouette	boundary_silhouette	time_period
k
2.0	0.398151	984.873146	1.142351	0.285660	0.156863	2021.0
3.0	-0.099869	540.893605	1.262321	-0.096397	0.134452	2021.0
4.0	-0.100067	561.134923	2.388560	-0.067966	0.156785	2021.0
5.0	-0.164610	420.742953	15.480583	-0.004911	0.147162	2021.0
6.0	-0.155095	475.288513	12.884447	-0.014573	0.170921	2021.0
7.0	-0.208675	396.009640	15.430332	-0.011200	0.160621	2021.0
8.0	-0.213824	364.579200	5.441137	0.011880	0.164475	2021.0
9.0	-0.232700	322.961558	5.853003	-0.066787	0.161105	2021.0
10.0	-0.232987	287.470732	6.417645	-0.086747	0.154685	2021.0
11.0	-0.213964	331.574073	7.655067	-0.031066	0.153457	2021.0
12.0	-0.244753	308.848240	7.075693	-0.070394	0.157718	2021.0
13.0	-0.290783	283.864933	8.269101	-0.086818	0.152558	2021.0
14.0	-0.316090	262.612153	7.861418	-0.097933	0.151801	2021.0
15.0	-0.308455	244.006804	9.791654	-0.074926	0.146494	2021.0
16.0	-0.307157	229.317140	8.403116	-0.133800	0.144641	2021.0
17.0	-0.300797	219.024637	8.515017	-0.075607	0.148148	2021.0
18.0	-0.298789	209.944401	24.807859	-0.076715	0.156159	2021.0
19.0	-0.289588	220.348970	23.967728	-0.068545	0.171909	2021.0
20.0	-0.274908	209.186327	23.564407	-0.032614	0.163129	2021.0
21.0	-0.279840	199.691378	23.424854	-0.031545	0.162937	2021.0
22.0	-0.255853	201.545893	22.195736	-0.014673	0.167697	2021.0
23.0	-0.275016	194.423510	22.830091	-0.018878	0.176015	2021.0
24.0	-0.309667	185.967568	12.279219	-0.024675	0.177393	2021.0
25.0	-0.315635	178.964613	12.795831	-0.017868	0.181718	2021.0
26.0	-0.314516	171.861709	12.044948	-0.007664	0.177711	2021.0
27.0	-0.314553	165.242128	14.197395	-0.004275	0.177108	2021.0
28.0	-0.313243	159.256630	14.280620	0.015305	0.174091	2021.0
29.0	-0.313989	154.192317	13.858454	0.003477	0.172974	2021.0
30.0	-0.311863	149.725080	13.335031	0.003079	0.172515	2021.0
31.0	-0.325660	154.414694	16.534351	0.010956	0.176604	2021.0
32.0	-0.335385	158.985904	12.875840	0.017671	0.182148	2021.0
33.0	-0.361766	154.640257	13.314907	0.027909	0.184621	2021.0
34.0	-0.357259	166.883390	12.838592	0.022596	0.192891	2021.0
35.0	-0.355503	162.347198	11.599117	0.034238	0.192490	2021.0
36.0	-0.355523	157.719232	12.066668	0.026351	0.192574	2021.0
37.0	-0.364341	153.647442	10.578417	0.006374	0.187733	2021.0
38.0	-0.361290	162.682233	10.919495	0.007034	0.192374	2021.0
39.0	-0.357967	170.534969	10.518244	0.026122	0.197713	2021.0
40.0	-0.353455	168.902762	9.913520	0.028055	0.198374	2021.0
41.0	-0.353114	164.664134	9.978973	0.044299	0.196727	2021.0
42.0	-0.350939	162.513423	11.640883	0.051075	0.199428	2021.0
43.0	-0.362779	159.203538	12.737700	0.052471	0.199332	2021.0
44.0	-0.377136	161.149831	12.429899	0.053939	0.203337	2021.0
45.0	-0.376220	157.745153	16.061967	0.047712	0.201914	2021.0
46.0	-0.414931	154.314938	16.739235	0.043448	0.203588	2021.0
47.0	-0.414015	152.024407	16.571391	0.045772	0.203538	2021.0
48.0	-0.425031	154.389852	17.464038	0.045859	0.204954	2021.0
49.0	-0.440513	155.331230	17.239211	0.045496	0.206206	2021.0
50.0	-0.438646	154.086244	17.601186	0.058159	0.206675	2021.0

Code

ks_table.columns

Index(['silhouette_score', 'calinski_harabasz_score', 'davies_bouldin_score',
       'path_silhouette', 'boundary_silhouette', 'time_period'],
      dtype='object')

Code

f, ax = plt.subplots(2, 3, figsize=(15, 10))
ax = ax.flatten()

for i, metric in enumerate(ks_table.columns.values):
    ks_table[metric].plot(ax=ax[i])
    ax[i].set_title(metric)

Whereas the general silhouette score (and the other general cluster metrics) decrease steadily is the number of clusters increases (because we start to split hairs into super specialized types), the geosilhouettes, especially the boundary silhouette generally increase as n_clusters increases because it becomes easier and easier to differentiate units from nearby units as you consider smaller and smaller neighborhoods.

In addition to the set of compositional attributes, a crucial input to regionalization algorithms is the spatial weights parameter (W), which defines the “geographic similarity” between units. Generally, the goal of regionalization algorithms is to define a distinct, contiguous [set of] region(s), which usually means using a contiguity-based W object (and as such, the default W is a Rook weight). But it is also possible to relax the contiguity enforcement using something like a distance-band weight, which would group nearby observations even if they are not necessarily touching each other.

Code

from libpysal.weights import DistanceBand

la_distance_band = gaz.regionalize(
    gdf=la,
    columns=columns,
    method="ward_spatial",
    n_clusters=25,
    spatial_weights=DistanceBand,
    weights_kwargs=dict(threshold=2000),
)

la_distance_band[["ward_spatial", "geometry"]].explore(
    "ward_spatial", categorical=True, tiles="CartoDB Positron"
)

/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/util.py:826: UserWarning: The weights matrix is not fully connected: 
 There are 107 disconnected components.
 There are 91 islands with ids: 10, 14, 15, 52, 828, 831, 833, 1012, 1078, 1085, 1086, 1087, 1089, 1146, 1255, 1277, 1283, 1467, 1468, 1471, 1489, 1706, 1944, 2112, 2115, 2117, 2166, 2167, 2168, 2169, 2171, 2174, 2175, 2178, 2179, 2180, 2182, 2183, 2184, 2185, 2186, 2187, 2188, 2189, 2190, 2191, 2193, 2214, 2215, 2221, 2226, 2227, 2228, 2230, 2231, 2232, 2233, 2234, 2236, 2237, 2238, 2242, 2243, 2246, 2260, 2270, 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2284, 2296, 2302, 2303, 2304, 2305, 2306, 2314, 2316, 2317, 2318, 2319, 2321, 2323, 2330, 2331, 2336, 2337.
  w = W(neighbors, weights, ids, **kwargs)
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/distance.py:844: UserWarning: The weights matrix is not fully connected: 
 There are 107 disconnected components.
 There are 91 islands with ids: 10, 14, 15, 52, 828, 831, 833, 1012, 1078, 1085, 1086, 1087, 1089, 1146, 1255, 1277, 1283, 1467, 1468, 1471, 1489, 1706, 1944, 2112, 2115, 2117, 2166, 2167, 2168, 2169, 2171, 2174, 2175, 2178, 2179, 2180, 2182, 2183, 2184, 2185, 2186, 2187, 2188, 2189, 2190, 2191, 2193, 2214, 2215, 2221, 2226, 2227, 2228, 2230, 2231, 2232, 2233, 2234, 2236, 2237, 2238, 2242, 2243, 2246, 2260, 2270, 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2284, 2296, 2302, 2303, 2304, 2305, 2306, 2314, 2316, 2317, 2318, 2319, 2321, 2323, 2330, 2331, 2336, 2337.
  W.__init__(
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['18' '18' '18' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)

Make this Notebook Trusted to load map: File -> Trust Notebook

Code

f, ax = plt.subplots(2, 3, figsize=(11, 10))
ax = ax.flatten()
la = la.to_crs(la.estimate_utm_crs())
for i, k in enumerate(cluster_types):
    df = gaz.regionalize(
        gdf=la,
        method=k,
        columns=columns,
        spatial_weights=DistanceBand,
        n_clusters=25,
        scaler=MinMaxScaler(),
        weights_kwargs=dict(use_index=True, threshold=2000, silence_warnings=True),
        region_kwargs=dict(gamma=0.25),
    )
    df.to_crs(3857).sort_values(k).plot(k, categorical=True, figsize=(7, 7), ax=ax[i])
    ax[i].axis("off")
    ax[i].set_title(k, fontsize=14)
    ctx.add_basemap(ax=ax[i], source=ctx.providers.CartoDB.Positron)
    plt.tight_layout()

/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['16' '16' '16' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['1' '1' '1' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0' '0' '0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['10' '10' '10' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/analyze/geodemo.py:429: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['0.0' '0.0' '0.0' ... nan nan nan]' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
  gdf.update(clusters)