Code
import geopandas as gpd
import pandas as pd
from geosnap import DataStore
from geosnap import io as gio
= DataStore() datasets
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Spatial data comes in a rich variety of forms and corresponding file formats. At the beginning of most geocomputational workflows, one is typically reading these different formats and applying different forms of spatial data processing (or geoprocessing) methods to the data.
In this notebook we cover a subset of geoprocessing methods:
Along the way we introduce the package geopandas which provides key spatial data processing functionality. The core data structure for spatial analysis in Python is the GeoDataFrame, which provides a tabular representation and stores geographic information as a special kind of attribute.
import geopandas as gpd
import pandas as pd
from geosnap import DataStore
from geosnap import io as gio
= DataStore() datasets
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
= gio.get_ejscreen(datasets, msa_fips="42660", years=2019)
sea_ejscreen = sea_ejscreen.to_crs(4326) # convert to lat/long sea_ejscreen
We start by loading data from the The EPA’s Environmental Justic Screening Tool (EJSCREEN), which contains data on vulnerable populations, and spatial externalities that may have negative health consequences. The EPA processes these together into a composite “EJ” indicator that yields high values when vulnerable populations are exposed to high risks. The dataset also includes information for the national percentile ranking of each indicator for each blockgroup. The most useful data include the pure “risk” indicators listed below:
Variable | Description |
---|---|
DSLPM | Diesel particulate matter level in air |
CANCER | Air toxics cancer risk |
RESP | Air toxics respiratory hazard index |
PTRAF | Traffic proximity and volume |
PWDIS | Indicator for major direct dischargers to water |
PNPL | Proximity to National Priorities List (NPL) sites |
PRMP | Proximity to Risk Management Plan (RMP) facilities |
PTSDF | Proximity to Treatment Storage and Disposal (TSDF) facilities |
OZONE | Ozone level in air |
PM25 | PM2.5 level in air |
sea_ejscreen.head()
geoid | ACSTOTPOP | ACSIPOVBAS | ACSEDUCBAS | ACSTOTHH | ACSTOTHU | MINORPOP | MINORPCT | LOWINCOME | LOWINCPCT | ... | T_PM25_P2 | T_PM25_P6 | AREALAND | AREAWATER | NPL_CNT | TSDF_CNT | Shape_Length | Shape_Area | geometry | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 530330001001 | 1265 | 1235 | 1095 | 608 | 608 | 155 | 0.122530 | 136 | 0.110121 | ... | 7%ile | 12%ile | 892826.0 | 904304.0 | 0 | 0 | 8633.745188 | 3.969193e+06 | MULTIPOLYGON (((-122.28977 47.73374, -122.2884... | 2019 |
1 | 530330001002 | 1534 | 1527 | 1210 | 758 | 871 | 765 | 0.498696 | 558 | 0.365422 | ... | 54%ile | 54%ile | 288190.0 | 0.0 | 0 | 0 | 3940.705233 | 6.365748e+05 | MULTIPOLYGON (((-122.29654 47.73015, -122.2952... | 2019 |
2 | 530330001003 | 1817 | 1817 | 1263 | 724 | 738 | 731 | 0.402312 | 543 | 0.298844 | ... | 44%ile | 28%ile | 370737.0 | 0.0 | 0 | 0 | 3878.874929 | 8.187073e+05 | MULTIPOLYGON (((-122.29321 47.72291, -122.2929... | 2019 |
3 | 530330001004 | 2270 | 2270 | 1332 | 1052 | 1134 | 1622 | 0.714537 | 1283 | 0.565198 | ... | 71%ile | 67%ile | 126662.0 | 0.0 | 0 | 0 | 2128.066919 | 2.798096e+05 | MULTIPOLYGON (((-122.29656 47.73198, -122.2965... | 2019 |
4 | 530330001005 | 1077 | 1077 | 808 | 637 | 679 | 393 | 0.364903 | 314 | 0.291551 | ... | 41%ile | 41%ile | 230515.0 | 0.0 | 0 | 0 | 3376.927631 | 5.090533e+05 | MULTIPOLYGON (((-122.29642 47.72651, -122.2946... | 2019 |
5 rows × 368 columns
sea_ejscreen.shape
(2483, 368)
This dataset has 2,483 observations (rows) and 368 attributes (columns)
sea_ejscreen.columns
Index(['geoid', 'ACSTOTPOP', 'ACSIPOVBAS', 'ACSEDUCBAS', 'ACSTOTHH',
'ACSTOTHU', 'MINORPOP', 'MINORPCT', 'LOWINCOME', 'LOWINCPCT',
...
'T_PM25_P2', 'T_PM25_P6', 'AREALAND', 'AREAWATER', 'NPL_CNT',
'TSDF_CNT', 'Shape_Length', 'Shape_Area', 'geometry', 'year'],
dtype='object', length=368)
The sea_ejscreen
GeoDataFrame is a special kind of pandas DataFrame that stores information about the geometric information associated with each record in the dataset. As such, any pandas
operation will work as normal; to demonstrate, lets rename the ‘ACSTOTPOP’ column to ‘total_population’. Note that we need to re-save the dataframe back into a variable to store the change.
1] sea_ejscreen.columns[
'ACSTOTPOP'
= sea_ejscreen.rename(columns={"ACSTOTPOP": "total_population"}) sea_ejscreen
1] sea_ejscreen.columns[
'total_population'
See the second column (remember Python indexing begins at zero!) now has the updated name
Two important pieces of information distinguish a GeoDataFrame
from a simple aspatial DataFrame
: a ‘geometry’ column that defines the shape and of each feature, and a Coordinate Reference System (CRS) that stores metadata about how the shape is encoded.
sea_ejscreen.geometry
0 MULTIPOLYGON (((-122.28977 47.73374, -122.2884...
1 MULTIPOLYGON (((-122.29654 47.73015, -122.2952...
2 MULTIPOLYGON (((-122.29321 47.72291, -122.2929...
3 MULTIPOLYGON (((-122.29656 47.73198, -122.2965...
4 MULTIPOLYGON (((-122.29642 47.72651, -122.2946...
...
2478 MULTIPOLYGON (((-122.31386 48.07283, -122.3130...
2479 MULTIPOLYGON (((-122.32737 48.12333, -122.3256...
2480 MULTIPOLYGON (((-122.3687 48.12337, -122.36322...
2481 MULTIPOLYGON (((-122.44294 47.79322, -122.4429...
2482 MULTIPOLYGON (((-122.45861 48.29771, -122.4520...
Name: geometry, Length: 2483, dtype: geometry
Since the Seattle blockgroups are polygons, naturally they are represented as a shapely Polygon
(or MultiPolygon
, meaning there are multiple shapes that combine to create a single blockgroup) object. This means, essentially, that each polygon is represented as a set of coordinates that define the polygon border. The units of those coordinates are stored in the GeoDataFrame’s Coordinate Reference System attribute crs
sea_ejscreen.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In this case, the blockgroups stored in Latitude/Longitude using the well-known WGS84 datum. Latitude/Longitude is called a ‘geographic coordinate system’ because the coordinates refer to locations on a spheroid, and the data have not been projected onto a flat plane.
sea_ejscreen.crs.is_projected
False
sea_ejscreen.crs.is_geographic
True
Naturally, the CRS defined on the GeoDataFrame governs the behavior of any spatial operation performed against the dataset, like computing the area of each polygon, the area of intersection with another dataset, or the distance between observations
sea_ejscreen.area
/var/folders/79/cknfb1sx2pv16rztkpg6wzlw0000gn/T/ipykernel_81982/554462416.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
sea_ejscreen.area
0 0.000215
1 0.000035
2 0.000044
3 0.000015
4 0.000028
...
2478 0.000457
2479 0.002580
2480 0.002256
2481 0.021275
2482 0.001603
Length: 2483, dtype: float64
Note that we get a warning from GeoPandas when trying to compute the are of a polygon stored in a geographic CRS. But we can estimate an appropriate Universal Transverse Mercator (UTM) Zone for the center of the Seattle Metro dataset, then reproject the blockgroups into that system, and recompute the area.
# get the UTM zone for Chicago
= sea_ejscreen.estimate_utm_crs()
utm_crs utm_crs
<Projected CRS: EPSG:32610>
Name: WGS 84 / UTM zone 10N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK).
- bounds: (-126.0, 0.0, -120.0, 84.0)
Coordinate Operation:
- name: UTM zone 10N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
Seattle falls inside UTM Zone 10 North (apparently), and since UTM is measured in meters, the second area
calculation shows the (correct) area of each blockgroup as square kilometers.
# reproject into UTM and convert area in square meters to square kilometers
/ 1e6 sea_ejscreen.to_crs(utm_crs).area
0 1.795820
1 0.287980
2 0.370467
3 0.126570
4 0.230347
...
2478 3.781797
2479 21.355873
2480 18.671198
2481 176.442839
2482 13.218388
Length: 2483, dtype: float64
One other difference between a DataFrame and a GeoDataFrame is the plot
method has been overloaded to generate a quick map
sea_ejscreen.plot()
Inevitably, many analysts will encounter geospatial data stored in a variety of formats, such as GeoJSON, Shapefile, or GeoPackage. I recommend you avoid these formats whenever possible, and instead adopt the geoparquet file standard, recently adopted by the Open Geospatial Consortium (OGC),
# shapefile
"sea_ejscreen.shp")
sea_ejscreen.to_file(
# geopackage
"sea_ejscreen.gpkg")
sea_ejscreen.to_file(
# geoJSON
"sea_ejscreen.geojson") sea_ejscreen.to_file(
/var/folders/79/cknfb1sx2pv16rztkpg6wzlw0000gn/T/ipykernel_81982/2797242796.py:2: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile.
sea_ejscreen.to_file("sea_ejscreen.shp")
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Normalized/laundered field name: 'total_population' to 'total_popu'
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Creating a 256th field, but some DBF readers might only support 255 fields
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Normalized/laundered field name: 'Shape_Length' to 'Shape_Leng'
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 216480874.219607532 of field Shape_Area of feature 1196 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 981047400 of field AREALAND of feature 1198 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 2145686362.80968022 of field Shape_Area of feature 1198 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 216586865.899602592 of field Shape_Area of feature 1257 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 141084475 of field AREALAND of feature 1396 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 315407230.292297125 of field Shape_Area of feature 1396 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 120142050.353007436 of field Shape_Area of feature 1397 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 129895484.036698744 of field Shape_Area of feature 1401 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 453692711 of field AREALAND of feature 1410 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1016917746.02945876 of field Shape_Area of feature 1410 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 126417492 of field AREALAND of feature 1418 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 282073365.473362327 of field Shape_Area of feature 1418 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 205483365 of field AREALAND of feature 1419 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 461735764.169596612 of field Shape_Area of feature 1419 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1058491513 of field AREALAND of feature 1420 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 2377816809.94271564 of field Shape_Area of feature 1420 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 240227542 of field AREAWATER of feature 1421 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 526265974.589439332 of field Shape_Area of feature 1421 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 853256895 of field AREALAND of feature 1578 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1845911755.71119523 of field Shape_Area of feature 1578 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 799220178 of field AREALAND of feature 1579 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1728860672.61893773 of field Shape_Area of feature 1579 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 222905920 of field AREALAND of feature 1580 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 481170117.070646405 of field Shape_Area of feature 1580 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 108771654.294337884 of field Shape_Area of feature 1849 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 122681699.308987796 of field Shape_Area of feature 1851 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 224378740 of field AREALAND of feature 1861 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 488484696.789095163 of field Shape_Area of feature 1861 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 213331394.985593498 of field Shape_Area of feature 1875 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 180881880.969070137 of field Shape_Area of feature 1901 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 114018076.931013033 of field Shape_Area of feature 1918 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 472405208 of field AREALAND of feature 1919 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1016772624.23833394 of field Shape_Area of feature 1919 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 104808859.213883787 of field Shape_Area of feature 2277 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 132750392.079425231 of field Shape_Area of feature 2428 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 194732404.397688001 of field Shape_Area of feature 2439 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 144734852 of field AREALAND of feature 2441 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 326723162.962453306 of field Shape_Area of feature 2441 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 440222456 of field AREALAND of feature 2453 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 999329771.136474967 of field Shape_Area of feature 2453 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 122465996.601940334 of field Shape_Area of feature 2454 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 1377114382 of field AREALAND of feature 2461 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 3106277691.40890026 of field Shape_Area of feature 2461 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 309731183 of field AREALAND of feature 2462 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 701541038.714854598 of field Shape_Area of feature 2462 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 160682244 of field AREALAND of feature 2463 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 358903485.785643816 of field Shape_Area of feature 2463 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 107408587.184657708 of field Shape_Area of feature 2464 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 927140623 of field AREALAND of feature 2465 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 2070074284.29274464 of field Shape_Area of feature 2465 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 250697175 of field AREALAND of feature 2473 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 568802501.603362322 of field Shape_Area of feature 2473 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 176573510 of field AREAWATER of feature 2481 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.12/site-packages/pyogrio/raw.py:709: RuntimeWarning: Value 393917032.029277146 of field Shape_Area of feature 2481 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
GeoPandas is smart enough to infer the file type from the extension provided in the output file name (but this can be overridden with the driver
argument). Notice how long this cell takes to execute!
And you can read a GeoDataFrame into memory from a file on disk similarly
# re-read the exported data into a new variable
= gpd.read_file("sea_ejscreen.shp") sea_ejscreen_shp
Remember, though, that because of the inherent properties of these file types, sometimes “round-tripping” (i.e. writing a dataset to disk then re-reading it into memory) can lose information! For example writing to a shapefile will truncate any column names longer than 10 characters, and writing to GeoJSON will lose any coordinate system metadata…
1] sea_ejscreen_shp.columns[
'total_popu'
Note that in sea_ejscreen_shp
the ‘total_population’ column has been truncated back to ‘total_popu’
import os
"sea_ejscreen.parquet") sea_ejscreen.to_parquet(
for ext in ["parquet", "gpkg", "geojson"]:
= os.path.getsize(f"sea_ejscreen.{ext}")
size print(f"{ext}: {round(size / 1e6, 2)} MB")
parquet: 10.28 MB
gpkg: 15.46 MB
geojson: 43.53 MB
For this dataset, the geopackage storage is 1.5x the size the parquet file, and the GeoJSON file is more than 4x(!) the file size. The tradeoff is that GeoJSON is a simple text file that can be opened and edited with any editor; the geopackage is a special kind of SQLite database accessible by sqlite editor, and the parquet file is a compressed binary that needs an open-source driver to be decoded
Geopandas can carry out all standard GIS operations using methods implemented on a GeoDataFrame, for example
By combining these operations along with spatial predicates, we can create queries based on the topological relationships between two sets of geographic units, which is often critical for creating variables of interest.
To demonstrate, we will first collect data from OpenStreetMap (OSM), specifically highways in the Seattle metro. In OSM parlance, this means we’re querying for “highways” with the “motorway” tag (which means “the highest-performance roads within a territory. It should be used only on roads with control of access, or selected roads with limited access depending on the local context and prevailing convention. Those roads are generally referred to as motorways, freeways or expressways in English.”)
import osmnx as ox
= ox.features_from_polygon(
highways ={"highway": "motorway"}
sea_ejscreen.union_all(), tags )
This returns a new GeoDataFrame storing each highway as a line feature.
highways.head()
geometry | highway | ref | fixme | payment:good_to_go | source | name | lit | bus:lanes | hgv | ... | motorcycle:lanes:conditional | sidewalk | hov:lanes:backward | hov:lanes:forward | hazmat | start_date | sidewalk:left | sidewalk:right | bridge:movable | wikidata | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element | id | |||||||||||||||||||||
way | 4634293 | LINESTRING (-122.31862 47.64265, -122.3177 47.... | motorway | WA 520 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4644156 | LINESTRING (-122.32264 47.65701, -122.3226 47.... | motorway | I 5 | NaN | NaN | NaN | Ship Canal Bridge | yes | NaN | designated | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
4644167 | LINESTRING (-122.32242 47.6467, -122.3224 47.6... | motorway | I 5 | NaN | NaN | NaN | Ship Canal Bridge | yes | NaN | designated | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
4644173 | LINESTRING (-122.32265 47.6467, -122.32258 47.... | motorway | I 5 Express | NaN | NaN | NaN | Ship Canal Bridge | yes | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
4709507 | LINESTRING (-122.32598 47.73592, -122.32558 47... | motorway | I 5 | NaN | NaN | NaN | NaN | NaN | designated||| | designated | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 104 columns
highways.plot()
Notice in the call to features_from_polygon
above, we used the unary_union
operator on the Seattle tracts dataframe. This effectively combines all the tracts into a single polygon so we are querying anything that intersects any tract, rather than querying intersections with each tract individually. We can do the same thing on the highway GeoDataFrame to see the effect
# note in geopandas <1.0 this is `highways.unary_union`
= highways.union_all() hw_union
hw_union
Now hw_union
is a single shapely.Polygon with no attribute information
=[hw_union], crs=4326).explore(tiles='CartoDB Positron') gpd.GeoDataFrame(geometry
(Why isn’t that tiny section of State Highway 522 connected up in the Northeast? I have no idea.)
Let’s assume the role of a public health epidemiologist who is interested in equity issues surrounding exposure to highways and automobile emissions. We may be interested in who lives near the highway and whether the population nearby experiences a heightened exposure to toxic emissions.
One simple question would be, which tracts have a highway run through them? We can formalize that by asking which tracts intersect the highway system.
= sea_ejscreen[sea_ejscreen.intersects(hw_union)]
highway_blockgroups highway_blockgroups.plot()
A more complicated question is, which tracts are within 1.5km of a road? This is ‘complicated’ because it forces us to formalize an ill-defined relationship: the distance between a polygon and the nearest point on a line. What does it mean for the polygon to be ‘within’ 1.5km? Does that mean the whole tract? most of it? any part of it?
If we can define a most suitable distance measure, the technical selection is easy to execute using an intermediate geometry.
= highways.to_crs(highways.estimate_utm_crs()).buffer(1500) road_buffer
=[road_buffer.union_all()], crs=road_buffer.crs).explore(tiles='CartoDB Positron') gpd.GeoDataFrame(geometry
sea_ejscreen.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
sea_ejscreen[sea_ejscreen.intersects(road_buffer.union_all())]
geoid | total_population | ACSIPOVBAS | ACSEDUCBAS | ACSTOTHH | ACSTOTHU | MINORPOP | MINORPCT | LOWINCOME | LOWINCPCT | ... | T_PM25_P2 | T_PM25_P6 | AREALAND | AREAWATER | NPL_CNT | TSDF_CNT | Shape_Length | Shape_Area | geometry | year |
---|
0 rows × 368 columns
This gives us back nothing… There is no intersection because the EJSCREEN data is still stored in Lat/Long, but we reprojected the road buffer into UTM
= sea_ejscreen.to_crs(road_buffer.crs) sea_ejscreen
By selecting the tracts that intersect with the interstate buffer, we are codifying the tracts as ‘near the highway’ if any portion of a tract is within 1.5km. This can be an awkard choice when polygons are irregularly shaped or heteogeneously sized (Census tracts are both). This means large tracts get included as ‘near’, even when a small portion of the polygon is within the 1.5km threshold (like the tract on the far Eastern edge).
sea_ejscreen[sea_ejscreen.intersects(road_buffer.union_all())].plot()
Alternatively, we might ask, which tracts have their center within 1.5km of a highway? Or more formally, which tracts have their centroids intersect with the 1500m buffer.
sea_ejscreen[sea_ejscreen.centroid.intersects(road_buffer.union_all())].plot()
If we are happy with that definition of proximity, we can use the spatial selection to create and update a new attribute on the dataframe. Here, we will select the tracts whose centroids are within the threshold distance, then create a new column called “highway_buffer”, set to “inside” (using the indices of the spatial selection to define which rows are being set).
# get the dataframe index of the tracts intersecting the buffer
= sea_ejscreen[
inside_idx
sea_ejscreen.centroid.intersects(road_buffer.union_all()) ].index
# set the 'highway_buffer' attribute to 'inside' for the indices within
"highway_buffer"] = "inside"
sea_ejscreen.loc[inside_idx,
# fill all NaN values in the column with 'outside'
"highway_buffer"] = sea_ejscreen["highway_buffer"].fillna("outside").astype('category') sea_ejscreen[
Now ‘highway_buffer’ is a binary variable defining whether a tract is “near” a highway or not. We could have set these values to one and zero, but setting them as a categorical variable means that the geopandas plot
method uses a different kind of coloring scheme that matches the data more appropriately.
'highway_buffer', 'geometry']].explore("highway_buffer", legend=True, tiles='CartoDB Positron') sea_ejscreen[[