6  Dasymetric Interpolation

Code
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import osmnx as ox
import pandas as pd
from geosnap import DataStore
from geosnap import io as gio
from tobler.area_weighted import area_interpolate
from tobler.dasymetric import extract_raster_features, masked_area_interpolate
from tobler.util import h3fy

%load_ext watermark
%watermark -a 'eli knaap'  -d -u -iv
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Author: eli knaap

Last updated: 2024-03-21

contextily: 1.5.2
matplotlib: 3.4.3
geosnap   : 0.12.1.dev9+g3a1cb0f6de61.d20240110
numpy     : 1.23.5
pandas    : 1.5.3
geopandas : 0.14.1
osmnx     : 1.7.1

“Dasymetric mapping may be defined as a kind of areal interpolation that uses ancillary (additional and related) data to aid in the areal interpolation process. Dasymetric mapping differs from choropleth mapping in that the boundaries of cartographic representation are not arbitrary but reflect the spatial distribution of the variable being mapped (Eicher and Brewer 2001).”

Mennis (2003)

To help improve accuracy in interpolation we can use auxiliary information to mask out areas we know should not be used to distribute the source variable. See Kim & Yao (2010) for a nice overview of the logic behind dasymetric mapping, and Comber & Zeng (2019) for a modern review of commonly-used techniques Jia & Gaughan (2016), Langford (2006), Langford (2007), Qiu & Cromley (2013)

One standard approach is to use additional data sourced from satellite imagery or remote sensing , and use it to mask out uninhabited areas.

For example we can use raster data like https://www.mrlc.gov/national-land-cover-database-nlcd-2016 to mask out uninhabited land uses from the source data. To do so, we need to provide a path to the raster and a list of pixel values that are considered developed.

tobler can accept any kind of raster data that can be read by rasterio, so you can provide your own, or download directly from NLCD linked above. Alternatively, you can read a compressed version of the NLCD we host in the spatialucr quilt directly over S3

Raster data from the national land cover database and stored in the geosnap quilt bucket

Code
datasets = DataStore()
Code
dc = gio.get_acs(datasets, state_fips="11", years=[2019], level="tract")
Code
dc = dc.to_crs(dc.estimate_utm_crs())
Code
hexes = h3fy(dc, resolution=8)
hexes = hexes.to_crs(dc.crs)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/pyproj/crs/crs.py:1296: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
  proj = self._crs.to_proj4(version=version)
Code
f, ax = plt.subplots(1, 2, figsize=(8, 4))

dc.plot("n_total_pop", ax=ax[0])
hexes.plot(ax=ax[1])

6.1 Simple Dasymetric

Raster data from the national land cover database and stored in the geosnap quilt bucket

Code
dc_hexes = masked_area_interpolate(
    source_df=dc,
    target_df=hexes,
    raster="https://spatial-ucr.s3.amazonaws.com/nlcd/landcover/nlcd_landcover_2019.tif",
    pixel_values=[22, 23, 24],  # low, medium, and high intensity development
    extensive_variables=["n_total_pop"],
)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
Code
dc_hexes.plot("n_total_pop")

6.1.1 Comparing Dasymetric with Area-Weighted

Code
dc_hexes_simple = area_interpolate(
    source_df=dc, target_df=hexes, extensive_variables=["n_total_pop"]
)
Code
diff = dc_hexes.n_total_pop - dc_hexes_simple.n_total_pop

diff.plot(kind="hist")

Code
dc_hexes.assign(diff=diff).plot("diff", scheme="fisher_jenks", k=10, cmap="RdBu_r")

The blue hexes have lower estimated population when we use the raster, and the red hexes have higher values. In downtown DC, the differences are small. But in certain places, like rock creek park in the northwest, and along the anacostia river, the values are lower when using the raster as a dasymetric layer (showing it’s doing the correct thing). Instead of population being allocated to this green space (and water), it is instead being distributed to the developed areas nearby. If we plot the difference map on top of a satellite image, it’s easy to see most of the time the blue hexes are parks and forests where we do not want to allocate population, and the darkest red hexes are dense neighborhoods adjacent to the blue hexes. In these cases, we are using the raster data effectively to send data to the populated places instead of the uninhabited places.

(it’s not quite perfectly centered on white with this colormap, but jenks work nicely here)

Code
dc_hexes.assign(diff=diff).explore(
    "diff", scheme="fisher_jenks", k=10, cmap="RdBu_r", tiles="USGS USImagery"
)
Make this Notebook Trusted to load map: File -> Trust Notebook

An importrant thing to remember about most of the commonly-used dasymetric approaches (e.g. binary dasymetric) is their focus on the lower part of the distribution. That is, these techniques help ensure that population, etc is not distributed to places that are probably uninhabited, but they still assume a uniform distribution within each of the masked areas. Remotely-sensed land-use data often classifies high intensity land uses like industrial sites and shopping centers in the same category as high-rise apartment buildings, so dasymetric methods may overallocate variables to developed but non-residential spaces. Again it is important to consider the variable under study, the quality of the source and target datasets, and the ancillary data used as a dasymetric mask

6.2 Advanced Dasymetric

NHGIS uses a two step process, in which they first apply a binary dasymetric filter, then apply target density weighting. Here we will follow a similar strategy, relplicating their BD filtering (albeit with slightly different open data and replicable, open source tooling) before carrying out a simpler areal interpolation.

In NHGIS’s BD model for 2000 block data, the inhabited zone consists of all areas that are at least 5% developed impervious surface (within each 30-meter square cell of NLCD 2001 data) and lie within 300 feet of a residential road center line2 but not in a water body, using road definitions and water polygons from the 2010 TIGER/Line Shapefiles.

so we’ll start by grabbing residential streets from OSM, buffering them out 300 feet, then using that geometry to select the impervious surfaces within

Code
rside_streets = ox.features_from_place(
    "Riverside County, California", tags={"highway": "residential"}
)
rside_streets = rside_streets.to_crs(2230)
rside_streets = gpd.GeoDataFrame(geometry=rside_streets.buffer(300), crs=2230)

Now we use the street buffers to extract any cells between 5 and 100% impervious surface

Code
nlcd_path = (
    "https://spatial-ucr.s3.amazonaws.com/nlcd/impervious/nlcd_impervious_2021.tif"
)
urban_rside = extract_raster_features(
    rside_streets, nlcd_path, pixel_values=list(range(5, 101)), collapse_values=True
)
urban_rside = urban_rside.to_crs(2230)
Code
fig, ax = plt.subplots(figsize=(20, 14))
urban_rside.plot(ax=ax)

This geodataframe represents the areas that meet both criteria NHGIS suggests

Now we need some census data to transfer geometries. Lets grab blockgroup-level population counts from the ACS with geosnap

Code
rside_census = gio.get_acs(
    datasets, county_fips="06065", level="bg", years=[2021], constant_dollars=False
)
rside_census = rside_census.to_crs(2230)

rside_census.plot()

And lets generate some synthetic zones to transfer this population data into using h3 hexagons

Code
rside_hex = h3fy(rside_census, resolution=8)

rside_hex.plot()
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/pyproj/crs/crs.py:1296: UserWarning: You will likely lose important projection information when converting to a PROJ string from another format. See: https://proj.org/faq.html#what-is-the-best-format-for-describing-coordinate-reference-systems
  proj = self._crs.to_proj4(version=version)

Ok, now all that’s left is to cut our input census data, “restricting” it to the locations NLCD defines as 5-100% impervious. To do so, we just need to clip the census data using our extracted urban features as a mask

Code
dasy = gpd.overlay(rside_census, urban_rside)
dasy = dasy.dissolve("geoid")
Code
dasy.plot()

Using the intersection overlay, “clips” the census geometries by the ‘urban features’ leaving the original attributes intact but reshaping the geometries. If a census tract had 2000 people in it and covered 1 sqmi, but only a quarter mile of the tract was urbanized, the new geodataframe effectively shows these 2000 people occupying the quarter mile. By passing these new data to the areal_interpolate function, we’re still assuming that population density is constant across our new geometries (a condition that may not be true in reality) but that assumption is much more plausible than when using original census geometries.

Code
interp = area_interpolate(dasy, rside_hex, extensive_variables=["n_total_pop"])
Code
interp.n_total_pop.sum()
2409330.9980494147
Code
rside_census.n_total_pop.sum()
2409331.0
Code
interp["plot"] = interp.n_total_pop.apply(np.log1p)
Code
fig, ax = plt.subplots(figsize=(20, 20))
interp.to_crs(3857).plot("n_total_pop", scheme="fisher_jenks", k=10, ax=ax, alpha=0.6)
ctx.add_basemap(source=ctx.providers.USGS.USImagery, ax=ax)
ax.axis("off")
(-13117985.779566512,
 -12720300.007852016,
 3946892.157340162,
 4044586.9064775496)

And there we have it. Riverside’s 2017 population estimated at a regular hexgrid, using the imperviousness and street proximity information (from a remotely-sensed raster image) to make the interpolation more accurate.

Code
interp.explore(
    "n_total_pop",
    tiles="USGS USImagery",
    tooltip="n_total_pop",
    scheme="fisher_jenks",
    k=10,
    style_kwds={"fill_opacity": 0.3, "opacity": 0.3},
)
Make this Notebook Trusted to load map: File -> Trust Notebook

:::