PySAL Turns 2.2


tl;dr:

  • We released PySAL 2.2 into the wild with a brand new (but backwards compatible) structure and it’s pretty great

One [meta]package to Rule Them All

A year ago, we released PySAL 2.0, a major refactor that broke the library apart from a single monolithic package into several smaller, more focused subpackages. That decision gave us some much-needed flexibility and allowed us to focus development on new features without other parts of the library slowing down new releases. After the 2.0 release, you could download any of PySAL’s subpackages independently, but you could also grab the entire library with conda install pysal. From the user’s perspective, though, this created some confusion, since it was possible to use PySAL’s functionality from either the subpackages or the large metapackage. Worse, the documentation lived in two places and things could get out of sync, with new features available in subpackages but not yet integrated into the metapackage. The road to a larger, more modular package has a few bumps, but we have some ideas for smoothing them over.

As of release 2.2, we’ve cleaned that up. Now, the pysal package acts as a convenient container for installing all the packages in the ecosystem. If you have legacy code that interacts with pysal, that’s fine! It will still work. But now when you install pysal you are also guaranteed to get all the subpackages. Instead of duplicating code from across the ecosystem, pysal specifies each of the subpackages as a dependency. In plain terms, that means the PySAL metapackage probably works more like people expect: when you install pysal, it also installs all of pysal’s subpackages, so you can use them from the monolithic metapackage, or a la carte. That means if you do

import segregation

or

from pysal.explore import segregation

you’re working with the exact same package. If you import the package from pysal, it’s doing nothing more than silently importing the segregation package internally.
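To make that concrete, here’s a minimal sketch of the re-export pattern using only the standard library. The toy_segregation and toy_meta modules are made-up stand-ins built in memory (they are not real packages, and this is not pysal’s actual source). The only point is that a metapackage namespace can hand back the very same module object you would get by importing the subpackage directly.

```python
import sys
import types

# Hypothetical stand-in for an independently installed subpackage.
standalone = types.ModuleType("toy_segregation")
standalone.__version__ = "1.0.0"
sys.modules["toy_segregation"] = standalone

# Hypothetical metapackage: its "explore" namespace holds no code of its own.
# It simply imports the standalone package and exposes it as an attribute.
meta = types.ModuleType("toy_meta")
meta.__path__ = []  # an empty __path__ marks the module as a package
explore = types.ModuleType("toy_meta.explore")
explore.__path__ = []

import toy_segregation  # resolved from sys.modules

explore.toy_segregation = toy_segregation
meta.explore = explore
sys.modules["toy_meta"] = meta
sys.modules["toy_meta.explore"] = explore

# The two import styles described above, applied to the toy modules.
import toy_segregation as direct
from toy_meta.explore import toy_segregation as via_meta

# Both names are bound to one and the same module object.
same_object = via_meta is direct
print(same_object)
```

Because both names point at one object, there is a single copy of the code and a single version to keep track of, which is the whole appeal of declaring subpackages as dependencies rather than vendoring them.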

But What Does PySAL Do?

Last month I was chatting about spatial data science at a conference with some colleagues. Most of them are R users, so I got a few questions about what was new in pysal-land. One person asked “… what does pysal actually do?” I wasn’t expecting the question, so I got excited. But the further I got into my answer, the more I felt like Jim Carrey.

PySAL has grown to do so much that it’s difficult to encapsulate in a few words. That means it’s well past time to provide a more structured overview of its functionality, if for no other reason than to give myself more of a roadmap next time I’m in the position to explain it again. If you want the decade-long background, you’re in luck! We just wrote a paper! But a quick overview of the library and what it’s for is also in order, so here goes[1].

The short version is: PySAL, the Python spatial analysis library, is a Python package for spatial data science. It supports the development of high-level applications for spatial analysis, such as

  • detection of spatial clusters, hot-spots, and outliers
  • construction of graphs from spatial data
  • spatial regression and statistical modeling on geographically embedded networks
  • spatial econometrics
  • exploratory spatio-temporal data analysis

The longer version is: PySAL is a family of packages (currently 16) divided into four major components: lib, model, explore, and viz. It started over a decade ago as a collaboration between Serge Rey and Luc Anselin, consolidating their work on spatial econometrics and space-time dynamics. In those days, the Python data ecosystem was nascent (and the spatial data ecosystem was nonexistent). So the early PySAL team started by building fundamental data structures for spatial analysis and econometrics on top of numpy, including things like shapefile readers, classes for building spatial weights, functions for calculating measures like Moran’s I or estimating spatial autoregressive models, and tools for plotting spatial data.

Fast forward 10 years and the Python data landscape has changed substantially. PySAL has too. With things like pandas, geopandas, scikit-learn, statsmodels and a rapidly growing interactive visualization ecosystem[2], we can offload some of the nuts and bolts to our friends and instead focus on new spatial analytics, like a new statistical measure of fit for spatially-constrained cluster models, or a computational inference framework for comparative segregation analysis. To accommodate the growing variety of spatial analytics PySAL supports, it’s provided as a family of packages organized around an ad-hoc structure.

Lib

The lib layer provides tools to solve a wide variety of computational geometry problems, including graph construction from polygonal lattices, lines, and points; construction and interactive editing of spatial weights matrices and graphs; computation of alpha shapes, spatial indices, and spatial-topological relationships; and reading and writing of sparse graph data, as well as pure Python readers of spatial vector data. Unlike other PySAL layers, these functions are exposed together as a single package.

  • libpysal : provides foundational algorithms and data structures that support the rest of the library. This includes the following modules:

    • input/output ( io ), which provides readers and writers for common geospatial file formats;
    • weights ( weights ), which provides the main class to store spatial weights matrices, as well as several utilities to manipulate and operate on them;
    • computational geometry ( cg ), with several algorithms, such as Voronoi tessellations or alpha shapes that efficiently process geometric shapes;
    • example data sets ( examples ).
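As a rough illustration of what a spatial weights object encodes, the sketch below builds rook-contiguity neighbors for a regular lattice in plain Python and then row-standardizes them. The function names rook_neighbors and row_standardize are hypothetical, and this is not libpysal’s implementation; libpysal ships its own lattice helper (libpysal.weights.lat2W) and a much richer weights class.

```python
def rook_neighbors(nrows, ncols):
    """Map each cell id to the ids of cells sharing an edge with it.

    Cells are numbered row by row, starting from 0 in the top-left corner.
    """
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            adjacent = []
            if r > 0:
                adjacent.append(i - ncols)  # cell above
            if r < nrows - 1:
                adjacent.append(i + ncols)  # cell below
            if c > 0:
                adjacent.append(i - 1)      # cell to the left
            if c < ncols - 1:
                adjacent.append(i + 1)      # cell to the right
            neighbors[i] = adjacent
    return neighbors


def row_standardize(neighbors):
    """Give each neighbor an equal weight so every row sums to one."""
    return {
        i: [1.0 / len(js)] * len(js) if js else []
        for i, js in neighbors.items()
    }


neighbors = rook_neighbors(3, 3)
weights = row_standardize(neighbors)

print(sorted(neighbors[4]))  # neighbors of the center cell
print(weights[0])            # weights for a corner cell
```

On a 3x3 lattice the center cell touches four others while each corner touches only two, so row-standardizing matters: it keeps well-connected cells from dominating any neighborhood average computed later.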

Explore

The explore layer includes packages for exploratory analysis of spatial and spatio-temporal data. These packages focus on revealing and interrogating patterns in the data and suggesting new interesting questions rather than answering existing ones. There are also methods for examining the dynamics of these distributions, such as how their composition or spatial extent changes over time.

  • esda : exploratory spatial data analysis and inference for global and local spatial autocorrelation
  • giddy : space-time analysis of distribution dynamics
  • inequality : spatiotemporal inequality analysis
  • pointpats : statistical point pattern analysis
  • segregation : single-value and comparative segregation measurement, decomposition, and inference
  • spaghetti : spatial analysis of graphs, networks, topology, and inference.
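For a flavor of what “global spatial autocorrelation” means in practice, here is a bare-bones computation of Moran’s I on a small lattice with binary rook weights. It uses the textbook formula I = (n / S0) * (sum of w_ij * z_i * z_j) / (sum of z_i^2), where z is the mean-centered variable and S0 is the sum of all weights. The helper names are hypothetical and there is no inference here; esda’s Moran class is the real tool and also handles permutation-based significance testing.

```python
def rook_neighbors(nrows, ncols):
    """Edge-sharing neighbors on a regular lattice, cells numbered row by row."""
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            adjacent = []
            if r > 0:
                adjacent.append(i - ncols)
            if r < nrows - 1:
                adjacent.append(i + ncols)
            if c > 0:
                adjacent.append(i - 1)
            if c < ncols - 1:
                adjacent.append(i + 1)
            neighbors[i] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 for neighbors, 0 otherwise)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbors.values())  # total of all weights
    cross = sum(z[i] * z[j] for i, js in neighbors.items() for j in js)
    denom = sum(zi * zi for zi in z)
    return (n / s0) * (cross / denom)


neighbors = rook_neighbors(4, 4)

# Alternating 0/1 values: every neighbor is the opposite value.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# Left half 0, right half 1: most neighbors share the same value.
two_blocks = [0 if c < 2 else 1 for r in range(4) for c in range(4)]

i_checker = morans_i(checkerboard, neighbors)
i_blocks = morans_i(two_blocks, neighbors)
print(i_checker, i_blocks)
```

The checkerboard comes out strongly negative (dissimilar values sit next to each other) and the two-block map comes out positive (similar values cluster), which is the basic contrast the statistic is designed to pick up.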

Model

The model layer focuses on confirmatory analysis. In particular, its packages focus on the estimation of spatial relationships in data with a variety of linear, generalized-linear, generalized-additive, nonlinear, multi-level, and local regression models.

  • mgwr : single- and multi-scale geographically-weighted regression modeling
  • spglm : sparse matrix generalized linear regression modeling
  • spint : gravity-type spatial interaction modeling
  • spreg : spatial econometric modeling
  • spvcm : Bayesian spatial multilevel modeling
  • tobler : areal interpolation and dasymetric mapping

Viz

The viz layer provides functionality to support the creation of geovisualisations and visual representations of outputs from a variety of spatial analyses. Visualization plays a central role in modern spatial/geographic data science. Current packages provide classification methods for choropleth mapping and a common API for linking PySAL outputs to visualization tool-kits in the Python ecosystem.

  • legendgram : legends that visualize the distribution of observations by color in a given map
  • mapclassify : Choropleth map classification algorithms
  • splot : statistical visualization for spatial analysis
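Choropleth classification boils down to choosing break points and assigning each observation to a bin. The sketch below shows the equal-interval flavor in plain Python, assuming upper-bound breaks (a value belongs to the first class whose upper bound is at least that value). The function names are hypothetical; mapclassify provides this scheme and many others (quantiles, natural breaks, and so on) with a consistent interface.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    breaks = [lo + width * (i + 1) for i in range(k - 1)]
    breaks.append(hi)  # use the exact maximum for the last bound
    return breaks


def classify(values, breaks):
    """Index of the first class whose upper bound is >= the value."""
    return [bisect_left(breaks, v) for v in values]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
breaks = equal_interval_breaks(values, 3)
bins = classify(values, breaks)
counts = [bins.count(b) for b in range(len(breaks))]

print(breaks)
print(bins)
print(counts)
```

The resulting bin ids are what a plotting library would map to colors, and the per-class counts are the kind of information a legendgram-style legend displays alongside the map.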

A Loose Translation

If you’re a spatial person coming from R, a conversion from PySAL to R might look a bit like the following[3]:

PySAL              R
-----------------  ----------------------
geopandas[4]       sp / sf
libpysal / esda    spdep
pointpats          spatstat
inequality         ineq
segregation        OasisR[5]
giddy              spMarkov / spMC
spaghetti          tidygraph
mgwr               spgwr[6]
spglm              glm
spint              spatialPosition
spreg              spatialreg / splm
spvcm              NA
tobler             areal

Spatial Data Science Gestalt

Prior to version 2.2, we combined all these packages into a single codebase distributed as the monolithic PySAL package, since spatial data science workflows necessarily include the use of several packages in tandem. But since each of the subpackages was available on its own too, people often got confused. Now when you conda install pysal, you get them all. Most analyses begin with some exploration, first by generating a few descriptive plots using mapclassify and splot (and the affiliated package contextily), before examining spatial relationships in more detail (using libpysal to create spatial weights and esda to analyze spatial autocorrelation). After an analyst has a more thorough understanding of the data, she might move on to build spatial autoregressive models with spreg or geographically-weighted regression models with mgwr, or conduct some regional comparative analyses with inequality or segregation.

Alternatively, a researcher might be interested in spatio-temporal analysis, and begin by collecting some data from the census using the PySAL affiliated package cenpy, before using tobler to convert census geographies into time-static units. With this new dataset in hand, the analyst can proceed to examine how a region evolves through space and time using giddy. These kinds of workflows are common across a vast range of social and natural sciences and we hope to keep building out the PySAL family to make it easier to integrate them alongside the rest of the pydata ecosystem.

For more information on the library, check out the paper linked above or the documentation for each of the individual packages. We’re still pulling together a full suite of tutorials that will be available as the notebooks project. But until then you can check out some of the workshop materials we’ve put together (including the latest) or Serge, Dani and Levi’s fantastic forthcoming book.




  1. Most of this is lightly repurposed from the paper :)

  2. See bokeh, panel, hvplot, altair, folium, ipyleaflet, etc.

  3. I’m not terribly picky on the capitalization or distinction between package/function here. If I got it wrong, my apologies. The goal is to provide a loose translation of functionality, not a 1:1 mapping.

  4. geopandas isn’t part of pysal, but it’s the central infrastructure for spatial analysis in Python.

  5. OasisR (which is fantastic) is the closest to matching segregation’s feature set, but PySAL’s segregation also includes a decomposition framework, additional computational inference methods, and street network-based and multiscalar methods. See here for an example.

  6. As far as I know, unlike PySAL’s mgwr, spgwr does not implement multiscalar geographically weighted regression.