18  Single and Comparative Inference

In many contexts, researchers are interested in whether some measured level of segregation is statistically different from a random process. That is, is the level of segregation we observe in place X greater than we would expect if there were no segregation at all? Sometimes that is a useful test, and the segregation package provides computational methods for conducting such a test. But it is also important to remember that the result of our inference is based on the construction of the question.

More plainly, it’s really easy to reject a silly null hypothesis; it is important that a counterfactual be plausible. In the context of residential segregation, Massey (1978) reminds us that location choice is not a random process, and that most places will surely be ‘statistically’ segregated when examined against a null hypothesis of ‘perfect integration’, or ‘evenness’, or ‘complete spatial randomness’ (depending on your disciplinary training). Unfortunately, residential segregation by race is an empirical reality in the U.S. (though it does not necessarily need to be (Ellen, 2023)), and cities will tend to be segregated through a combination of individual choice and structural constraints (Charles, 2003; Schelling, 1971). Ideally, we want to live in a world where we do not take segregation for granted (and in the education sphere, for example, there are reasons to believe the null hypothesis of zero segregation at the classroom or school level would be appropriate), so there is good reason for single-value inferential methods to exist in theory. But this is also a good reminder that our analyses should be based on a realistic understanding of the world, and in the U.S., Massey reminds us that we should adopt what I would call a ‘cynical prior’ when we evaluate residential segregation as a statistical phenomenon.

In the context of comparative inference, our null hypotheses become plausible in a much wider variety of circumstances. Is New York City less segregated today than it was 20 years ago? Is New York more segregated than Los Angeles? In these cases we are not asking whether there is a behavioral proclivity to segregate (which is not a very informative question), but whether there is a stronger sorting pattern detectable in one time period or one place versus another. These are much more plausible questions (and more interesting) than “is this place segregated or not?” but they also require a more thoughtful approach to constructing a null hypothesis of “no difference.”

The segregation package provides a framework for examining whether segregation index values are statistically significant: whether a single index is far enough from “no segregation” that it is unlikely to have arisen by chance, or whether two indices are meaningfully different from one another.

Depending on the segregation index being examined and the assumptions of the researcher, a variety of estimation techniques are available. This chapter walks through the assumptions and outcomes of each.

18.1 Single Value Inference

To carry out a statistical hypothesis test, we first need to define a null hypothesis representing our expectation under the scenario of “no segregation”. In the realm of spatial statistics, this is often referred to as ‘complete spatial randomness’ (CSR), which denotes no meaningful relationship between a unit of analysis and its neighbors. If we consider segregation as a phenomenon that occurs at the person or household level, then a CSR process is equivalent to the notion of ‘evenness’ in the segregation literature.

For conducting single-value inference, the segregation package offers several techniques for generating random population distributions that respect the characteristics of an input dataset. This section walks through the assumptions and outputs of each approach.

Code
import pandas as pd
import geopandas as gpd
import numpy as np
from geosnap import DataStore
from geosnap import io as gio
from geosnap import visualize as gvz
from segregation.singlegroup import Gini, Dissim, RelativeConcentration
from segregation.multigroup import MultiInfoTheory
from segregation.inference import simulate_evenness, simulate_person_permutation, simulate_systematic_randomization, simulate_null, SingleValueTest, TwoValueTest
import matplotlib.pyplot as plt

%load_ext watermark
%watermark -a 'eli knaap' -iv 

datasets = DataStore()
dc = gio.get_census(datasets, msa_fips="47900", years=[2010])
dc = dc.to_crs(dc.estimate_utm_crs())
Author: eli knaap

numpy      : 1.26.4
matplotlib : 3.10.0
segregation: 2.5.2.dev4+gea53e4c.d20250128
geopandas  : 1.0.1
pandas     : 2.2.3
geosnap    : 0.14.1.dev14+g0443e2a.d20250103
Code
dc.n_nonhisp_black_persons.sum()
1426445.0
Code
dc.n_total_pop.sum()
5671300.0
Code
dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
0.2515199337012678
Code
gini = Gini(dc, group_pop_var='n_nonhisp_black_persons', total_pop_var='n_total_pop')

18.1.1 Complete Spatial Randomness (CSR) for Population Sorting

To get a feel for the substance of Massey’s critique, and to work out the mechanics of single-value inference, it is useful to understand what a region would look like under purely random population sorting. The package implements three randomization schemes, which differ in where the randomness enters:

  • evenness includes group-level variation (each unit’s group counts are redrawn, so regional group totals can shift slightly)
  • systematic includes unit-level variation (each unit’s total population is redrawn, so unit totals can shift)
  • individual permutation includes only location variation (group and unit totals are both preserved)

18.1.1.1 Evenness

Evenness takes draws from the population of each unit, with the probability of choosing group X equal to its regional share (locations drawing from distributions of population groups).

Code
# the regional share of nonhispanic black people in the DC region is ~25%
dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
0.2515199337012678
Code
dc[['n_total_pop']].reset_index(drop=True).head()
   n_total_pop
0       6426.0
1       2076.0
2       3262.0
3       4472.0
4       5164.0

Taking 6,426 draws for tract 0 (one draw per person), each draw has a roughly 25% chance of selecting a Black resident.
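Conceptually (a minimal single-group sketch, not the package’s exact implementation), this amounts to one binomial draw per tract; simulate_evenness performs the draws and returns a full GeoDataFrame:

Code
import numpy as np

rng = np.random.default_rng(0)

# regional share of the group (~0.25 in DC)
p = dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()

# one draw per person in each tract: n = tract population, p = regional share
counts = rng.binomial(dc.n_total_pop.astype(int), p)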

Code
evenness = simulate_evenness(dc, group="n_nonhisp_black_persons", total="n_total_pop")
# calculate percent
evenness["pnhb"] = evenness["n_nonhisp_black_persons"] / evenness["n_total_pop"]
evenness
n_nonhisp_black_persons n_total_pop geometry pnhb
0 1617 6426 POLYGON ((325727.127 4312553.019, 325904.657 4... 0.251634
1 488 2076 POLYGON ((329869.083 4307078.613, 329855.54 43... 0.235067
2 863 3262 POLYGON ((323597.46 4315959.397, 323899.586 43... 0.264562
3 1101 4472 POLYGON ((320037.099 4312414.48, 320698.581 43... 0.246199
4 1324 5164 POLYGON ((319872.327 4312900.831, 319977.124 4... 0.256390
... ... ... ... ...
1355 1034 4003 POLYGON ((336209.349 4328432.396, 336301.465 4... 0.258306
1356 390 1458 POLYGON ((317434.028 4320902.735, 317696.001 4... 0.267490
1357 1349 5559 POLYGON ((323897.744 4330418.354, 324124.997 4... 0.242670
1358 966 3958 POLYGON ((306054.298 4367605.376, 306101.675 4... 0.244063
1359 591 2402 POLYGON ((288163.792 4344458.12, 288401.686 43... 0.246045

1360 rows × 4 columns

A formal test of evenness would ask whether the map on the left is significantly different from the one on the right.

Code
f, ax = plt.subplots(1, 2, figsize=(9, 4))

dc.plot("p_nonhisp_black_persons", scheme="quantiles", ax=ax[0])
evenness.plot("pnhb", scheme="quantiles", ax=ax[1])

ax[0].set_title("Observed non-Hisp Black Population")
ax[1].set_title("Counterfactual non-Hisp Black Population")

Code
evenness.n_nonhisp_black_persons.sum()
1427934
Code
evenness.n_nonhisp_black_persons.sum() == dc.n_nonhisp_black_persons.sum()
False
Code
evenness.n_nonhisp_black_persons.sum() / evenness.n_total_pop.sum()
0.2517824837338882
Code
dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
0.2515199337012678
Code
evenness.n_nonhisp_black_persons.sum() / evenness.n_total_pop.sum() == dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
False
Code
evenness.n_total_pop.sum() == dc.n_total_pop.sum()
True
Code
evenness.n_total_pop.values == dc.n_total_pop.values
array([ True,  True,  True, ...,  True,  True,  True])

We haven’t changed the total population of each unit or of the region, but we have marginally changed the regional count of group X.

18.1.1.2 Systematic Randomization

The systematic approach takes draws from the regional population of each group, with the probability of choosing geographic unit X equal to the share of the region’s population that currently lives there (people drawing from a distribution of locations).

Code
dc.n_nonhisp_black_persons.sum()
1426445.0
Code
(dc.n_total_pop / dc.n_total_pop.sum()).reset_index(drop=True).head()
0    0.001133
1    0.000366
2    0.000575
3    0.000789
4    0.000911
Name: n_total_pop, dtype: float64

Out of the 1,426,445 non-Hispanic Black residents of the DC region, each has a 0.1133% chance of being assigned to tract 0.
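Conceptually (a sketch under the same assumptions, not the package’s exact implementation), this is a single multinomial draw allocating the group’s regional total across tracts:

Code
import numpy as np

rng = np.random.default_rng(0)

# each tract's probability is its share of the regional population
probs = (dc.n_total_pop / dc.n_total_pop.sum()).values

# allocate the group's regional total across tracts in one multinomial draw
black_counts = rng.multinomial(int(dc.n_nonhisp_black_persons.sum()), probs)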

Code
systematic = simulate_systematic_randomization(dc, group='n_nonhisp_black_persons', total='n_total_pop')
systematic
index geometry n_nonhisp_black_persons other_group_pop n_total_pop
0 0 POLYGON ((325727.127 4312553.019, 325904.657 4... 1566 4890 6456
1 1 POLYGON ((329869.083 4307078.613, 329855.54 43... 576 1530 2106
2 2 POLYGON ((323597.46 4315959.397, 323899.586 43... 835 2456 3291
3 3 POLYGON ((320037.099 4312414.48, 320698.581 43... 1108 3281 4389
4 4 POLYGON ((319872.327 4312900.831, 319977.124 4... 1313 3802 5115
... ... ... ... ... ...
1355 1355 POLYGON ((336209.349 4328432.396, 336301.465 4... 977 3002 3979
1356 1356 POLYGON ((317434.028 4320902.735, 317696.001 4... 353 1035 1388
1357 1357 POLYGON ((323897.744 4330418.354, 324124.997 4... 1430 4213 5643
1358 1358 POLYGON ((306054.298 4367605.376, 306101.675 4... 972 2995 3967
1359 1359 POLYGON ((288163.792 4344458.12, 288401.686 43... 592 1797 2389

1360 rows × 5 columns

Code
systematic.n_nonhisp_black_persons.sum()
1426445
Code
systematic.n_nonhisp_black_persons.sum() == dc.n_nonhisp_black_persons.sum()
True
Code
systematic.n_nonhisp_black_persons.sum() / systematic.n_total_pop.sum() == dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
True
Code
systematic.n_total_pop.values == dc.n_total_pop.values
array([False, False, False, ..., False, False, False])

We haven’t changed the total number of people in each group, but we have changed the total number of people in each unit.

18.1.1.3 Individual-level Permutation

Individual-level permutation doesn’t take draws from a probability distribution; instead, it randomizes which unit each person lives in.
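Conceptually (a minimal sketch, not the package’s exact implementation), this amounts to shuffling a person-level list of group labels and dealing it back out so each tract keeps its total population:

Code
import numpy as np

rng = np.random.default_rng(0)

# one label per person: 1 = group member, 0 = everyone else
n_group = int(dc.n_nonhisp_black_persons.sum())
n_other = int(dc.n_total_pop.sum()) - n_group
labels = np.concatenate([np.ones(n_group, dtype=int), np.zeros(n_other, dtype=int)])

# shuffle people, then deal them back out in tract-sized chunks
rng.shuffle(labels)
sizes = dc.n_total_pop.astype(int).values
counts = np.array([chunk.sum() for chunk in np.split(labels, np.cumsum(sizes)[:-1])])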

Code
permutation = simulate_person_permutation(dc, group='n_nonhisp_black_persons', total='n_total_pop')

permutation.n_nonhisp_black_persons.sum()
1426445.0
Code
permutation.n_nonhisp_black_persons.sum() == dc.n_nonhisp_black_persons.sum()
True
Code
permutation.n_nonhisp_black_persons.sum() / permutation.n_total_pop.sum() == dc.n_nonhisp_black_persons.sum() / dc.n_total_pop.sum()
True
Code
permutation.n_total_pop.values == dc.n_total_pop.values
array([ True,  True,  True, ...,  True,  True,  True])

We haven’t changed the total number of people in any group, or the total population of any unit; we’ve only randomized which unit each person lives in.

18.1.2 Single-Value Null Distributions

The simulate_null function generates a series of simulated segregation statistics (in parallel) using the randomization functions described above. Those simulated values can then serve as a reference distribution against which to test the hypothesis of “no segregation”.
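In serial form, the logic looks roughly like the following (a hypothetical helper, null_distribution, written against the single-group API shown above; the package parallelizes this loop):

Code
import pandas as pd

def null_distribution(gdf, index_class, sim_func, group, total, iterations=500):
    """Recompute a segregation index on `iterations` randomized datasets."""
    stats = []
    for _ in range(iterations):
        simulated = sim_func(gdf, group=group, total=total)
        stats.append(
            index_class(simulated, group_pop_var=group, total_pop_var=total).statistic
        )
    return pd.Series(stats)

# e.g. null_distribution(dc, Gini, simulate_evenness,
#                        'n_nonhisp_black_persons', 'n_total_pop')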

Code
groups = ['n_nonhisp_black_persons',
 'n_nonhisp_white_persons',
 'n_asian_persons',
 'n_hispanic_persons']

G = Gini(dc, group_pop_var='n_nonhisp_black_persons', total_pop_var='n_total_pop')

H = MultiInfoTheory(dc, groups=groups)
Code
G.statistic
0.7343348331433938
Code
H.statistic
0.27203005459792423

18.1.2.1 Single Group

Code
G_even = simulate_null(seg_class=G, sim_func=simulate_evenness)

G_systematic = simulate_null(seg_class=G, sim_func=simulate_systematic_randomization)

G_permuted = simulate_null(seg_class=G, sim_func=simulate_person_permutation)

fig, ax = plt.subplots(figsize=(8,8))
G_permuted.name='permuted'
G_permuted.plot(kind='kde', ax=ax, legend=True)
G_systematic.name='systematic'
G_systematic.plot(kind='kde', ax=ax, legend=True)
G_even.name='evenness'
G_even.plot(kind='kde', ax=ax, legend=True)

18.1.2.2 Multi Group
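The multigroup simulations follow the same pattern as the single-group calls above (this cell is a sketch reconstructed from that pattern):

Code
H_even = simulate_null(seg_class=H, sim_func=simulate_evenness)

H_systematic = simulate_null(seg_class=H, sim_func=simulate_systematic_randomization)

H_permuted = simulate_null(seg_class=H, sim_func=simulate_person_permutation)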

Code
fig, ax = plt.subplots(figsize=(8,8))
H_permuted.name='permuted'
H_permuted.plot(kind='kde', ax=ax, legend=True)
H_systematic.name='systematic'
H_systematic.plot(kind='kde', ax=ax, legend=True)
H_even.name='evenness'
H_even.plot(kind='kde', ax=ax, legend=True)

Despite their different methods, all three approaches simulate similar distributions, though they differ with respect to how and in which dimensions the randomization occurs. As Boisso et al. (1994) found, the distribution is not centered on zero (though it is quite close in the multigroup case). In other cases, such as when minority populations are small or highly unbalanced among multiple groups, it is possible that the different randomization methods could diverge and simulate different distributions.

18.1.3 Inferential Methods

Code
# riverside MSA in 2010
rside = gio.get_census(datasets, years=[2010], msa_fips='40140')
Code
fig, ax = plt.subplots(figsize=(8,8))
rside.plot('p_hispanic_persons', scheme='quantiles', cmap='Blues', ax=ax)
ax.axis('off')
ax.set_title('% Hispanic/Latino in Riverside MSA')

For single-value inference, the segregation package tests whether the observed segregation index differs from the expected value of the index under the null hypothesis of no segregation. As Boisso et al. (1994) show, the expected value under “no segregation” is not necessarily an index value of 0. The SingleValueTest class offers computational inference via a variety of methods for simulating observations under different randomization schemes.

18.1.3.1 Evenness

Code
D_rside = Dissim(rside, group_pop_var='n_hispanic_persons', total_pop_var='n_total_pop')
D_rside.statistic
0.3703533440946156

The dissimilarity statistic for Riverside in 2010 is 0.370.

The dissimilarity index is a measure of evenness, so it is reasonable to use the evenness null approach in the SingleValueTest class. For more information on the different randomization procedures used in the single-value test, see the discussion earlier in this chapter.

Code
rside_test = SingleValueTest(D_rside, null_approach='evenness')

rside_test.p_value
0.0

The p-value for the test is essentially 0. The plot method of the class shows the simulated null distribution in blue and the observed value of the segregation statistic in red.

Code
rside_test.plot()

Code
rside_test.est_sim.mean()
0.0108310497807968

The est_sim attribute on the SingleValueTest class contains the segregation index values calculated for the synthetic datasets. Here we can see that the mean Dissimilarity index under the assumption of perfect evenness in Riverside is 0.011.
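Given the simulated values stored in est_sim, the empirical p-value can also be computed by hand as the share of simulated statistics at least as extreme as the observed one (a sketch; the package’s exact formula, e.g. one- versus two-sided, may differ):

Code
# share of simulated index values at least as large as the observed statistic
p_hand = (rside_test.est_sim >= D_rside.statistic).mean()
p_hand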

Here the plot shows that even if Riverside’s population were perfectly evenly distributed across geographic units, the unequal sizes of the population groups would still produce a small positive Dissimilarity value (just barely above 0). But the null distribution is tightly concentrated around that near-zero level, whereas the observed Dissimilarity in Riverside is 0.37, so we reject the null of “no segregation” in favor of the hypothesis that the Hispanic/Latino population in Riverside is significantly segregated.

18.1.3.2 Bootstrap

As an alternative to simulating a null distribution, another reasonable test for the Dissimilarity index is a bootstrap approach: resampling is used to simulate the distribution of the Dissimilarity index itself, and a given value for “no segregation” can then be tested against this reference distribution. In practical terms, that means the bootstrapped index value can be tested against 0, or against the value given by a null distribution such as evenness above. This is the approach given by Allen et al. (2015).
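One common variant of this idea (a sketch, not necessarily the package’s implementation, which may resample at the person level instead) resamples geographic units with replacement and recomputes the index on each pseudo-sample:

Code
import pandas as pd

boot = pd.Series(
    [
        Dissim(
            rside.sample(frac=1, replace=True, random_state=i).reset_index(drop=True),
            group_pop_var='n_hispanic_persons',
            total_pop_var='n_total_pop',
        ).statistic
        for i in range(100)
    ]
)
boot.describe()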

Code
# standard test against D==0

rside_test_bootstrap = SingleValueTest(D_rside, null_approach='bootstrap')

rside_test_bootstrap.plot()

Plotting a SingleValueTest with the bootstrap method now shows the bootstrapped distribution of the segregation index versus the point estimate of no segregation (rather than the point estimate of the segregation index versus the simulated null distribution, as in the evenness approach above).

Code
# test against the mean of the evenness null distribution estimated above

rside_test_bootstrap2 = SingleValueTest(D_rside, null_approach='bootstrap', null_value=rside_test.est_sim.mean())

rside_test_bootstrap2.plot()

Whether we test against 0 or the simulated value from evenness, our inference is the same: we reject the null; Riverside is clearly segregated according to this test.

18.1.3.3 Random Geographic Permutation

Alternatively, we might have examined a different segregation index, such as the Relative Concentration index, a spatial measure for which a different test would be appropriate.

The random geographic permutation test shuffles the values of tracts in space to create a spatially random distribution. This test leaves the population of each group in each geographic unit unchanged, but randomizes where each unit exists in space.
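Conceptually (a minimal sketch, not the package’s exact implementation), one permutation keeps each row’s population attributes fixed but shuffles which geometry the row is attached to:

Code
# shuffle which geometry (location) each row of attributes is attached to
permuted = rside.copy()
permuted['geometry'] = rside.geometry.sample(frac=1, random_state=0).values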

Code
# for spatial analysis, we need to make sure the rside data are in a projected CRS
rside = rside.to_crs(rside.estimate_utm_crs())
Code
RCO_rside = RelativeConcentration(rside, group_pop_var='n_hispanic_persons', total_pop_var='n_total_pop')

RCO_rside.statistic
0.4795051531831221
Code
rside_test_permutation = SingleValueTest(RCO_rside, null_approach='geographic_permutation')

rside_test_permutation.p_value
0.02
Code
rside_test_permutation.plot()

18.1.3.4 Evenness Geographic Permutation

It is also possible to combine the two previous approaches: first generate a simulated population under the assumption of evenness, then geographically permute the simulated data.

Code
rside_test_evenpermutation = SingleValueTest(RCO_rside, null_approach='even_permutation')

rside_test_evenpermutation.plot()

Here the evidence against the null is even stronger, and again we reject it.

18.2 Comparative Inference

Comparative inference is particularly useful in studying residential segregation because it facilitates both temporal and spatial comparisons, allowing researchers to ask whether one place is more segregated than another, or whether a given place has become more/less segregated over time.

As with single-value inference, the TwoValueTest class offers several techniques for conducting the analysis.

Code
la = gio.get_census(datasets, years=[2010], msa_fips='31080')
la90 = gio.get_census(datasets, years=[1990], msa_fips='31080')
p = gpd.GeoDataFrame(pd.concat([la, la90]))
gvz.plot_timeseries(p, 'p_hispanic_persons', scheme='quantiles', cmap='Blues', nrows=1,ncols=2)

Code
D_la = Dissim(la, group_pop_var='n_hispanic_persons', total_pop_var='n_total_pop')
D_la.statistic
0.5180472134918201

The dissimilarity statistic for LA in 2010 is 0.518.

Code
D_la90 = Dissim(la90, group_pop_var="n_hispanic_persons", total_pop_var="n_total_pop")
D_la90.statistic
0.507180915697462

The dissimilarity statistic for LA in 1990 is 0.507.

So in 20 years, Hispanic/Latino segregation in LA increased from 0.507 to 0.518. Is that increase significantly different from what might occur at random?

Again, there are several methods available for testing differences between segregation values. Random labeling, based on Rey & Sastré-Gutiérrez (2010), creates a set of synthetic observations by shuffling geographic units between the two datasets, calculating segregation statistics on these synthetic datasets, and taking the difference between the statistics. This process yields a distribution of differences (under the null that there is no difference between the two datasets), and we test the observed difference against this distribution.
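Conceptually (a sketch of a single iteration, not the package’s exact implementation), random labeling looks like this:

Code
import numpy as np

rng = np.random.default_rng(0)

# stack both datasets and shuffle which dataset each unit belongs to
stacked = gpd.GeoDataFrame(pd.concat([la, la90], ignore_index=True))
labels = rng.permutation(np.repeat(['a', 'b'], [len(la), len(la90)]))

# difference in Dissimilarity under the null that labels are exchangeable
d_null = (
    Dissim(stacked[labels == 'a'], group_pop_var='n_hispanic_persons',
           total_pop_var='n_total_pop').statistic
    - Dissim(stacked[labels == 'b'], group_pop_var='n_hispanic_persons',
             total_pop_var='n_total_pop').statistic
)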

18.2.1 Random Labeling

18.2.1.1 LA over time

Code
la_test_label = TwoValueTest(D_la, D_la90, null_approach='random_label')
la_test_label.p_value
0.232

There’s a 23.2% chance of obtaining a difference this large at random, so we fail to reject the null hypothesis that there is no difference in segregation between the two time periods.

Code
la_test_label.plot()

Plotting the class shows the distribution of simulated differences in blue as well as the estimated difference in red. Here it is clear that the observed difference falls well within the simulated distribution.

18.2.1.2 LA vs Riverside

Code
la_rside_test_label = TwoValueTest(D_rside, D_la, null_approach='random_label')

la_rside_test_label.plot()

The results show that LA is significantly more segregated than Riverside, but LA is not significantly more segregated in 2010 than it was in 1990.

18.2.2 Bootstrap

The bootstrap test based on Davidson (2009) uses bootstrap resampling to estimate a distribution of the segregation index for each city, which provides an estimate of each index’s variance. We then perform a means test to see whether the means of the two distributions differ significantly from one another.
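In miniature, this amounts to a normal-approximation test on the difference in bootstrap means (a sketch with hypothetical stand-in arrays, not the package’s exact formula):

Code
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# hypothetical stand-ins for the two bootstrap distributions of D
boot_la = rng.normal(0.518, 0.004, 500)
boot_la90 = rng.normal(0.507, 0.004, 500)

# z-test on the difference in means
diff = boot_la.mean() - boot_la90.mean()
se = np.sqrt(boot_la.var(ddof=1) + boot_la90.var(ddof=1))
z = diff / se
p = 2 * (1 - stats.norm.cdf(abs(z)))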

18.2.2.1 LA over time

Code
la_test_bootstrap = TwoValueTest(D_la, D_la90, null_approach='bootstrap')

la_test_bootstrap.p_value
0.20536236178667372
Code
la_test_bootstrap.plot()

Plotting a TwoValueTest class with the bootstrap method shows the bootstrapped distributions for both segregation indices. Here we can clearly see that the distributions overlap substantially.

18.2.2.2 LA vs Riverside

Code
la_rside_test_bootstrap = TwoValueTest(D_rside, D_la, null_approach='bootstrap')

la_rside_test_bootstrap.p_value
2.390599572912539e-37

The test is highly significant.

Code
la_rside_test_bootstrap.plot()

Plotting the class shows the wide berth between distributions, explaining why the p-value is so low.

Again, the results show a significant difference between Riverside and LA, but not for LA over time.