21  Urban Factor Ecology

Code
import contextily as ctx
import geopandas as gpd
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
import seaborn as sns
from esda import Moran, Moran_Local
from factor_analyzer import FactorAnalyzer
from geosnap import DataStore
from geosnap import io as gio
from libpysal.weights import Rook
from networkx.drawing.nx_agraph import graphviz_layout
from scipy.stats import zscore
from splot.esda import lisa_cluster, plot_local_autocorrelation

%load_ext watermark
%watermark -a 'eli knaap' -iv -u -d
Author: eli knaap

Last updated: 2025-02-04

scipy          : 1.14.1
seaborn        : 0.13.2
networkx       : 3.4.2
factor_analyzer: 0.5.1
esda           : 2.6.0
contextily     : 1.6.2
geosnap        : 0.14.1.dev14+g0443e2a.d20250103
pandas         : 2.2.3
libpysal       : 4.12.1
splot          : 1.1.7
matplotlib     : 3.10.0
geopandas      : 1.0.1

Combining the scholarship on spatial structure and segregation, urban scholars have long recognized the value of studying multi-dimensional segregation, i.e. how cities partition into smaller communities along the lines of race, ethnicity, socioeconomic status, family structure, etc. Work in this tradition develops and applies methods to identify patterns like Burgess’s concentric zones model, and asks (a) which “dimensions” are most important for understanding residential sorting, and (b) whether cities in different regions, countries, or political/economic systems follow similar patterns.

Before the Geography of Opportunity (Galster & Killen, 1995), there was the Ecology of Inequality (Massey & Eggers, 1990), and prior to that, a massive literature on ‘factor ecology’ which adopts exploratory factor analysis to analyze residential differentiation. An important vestige of this tradition is the focus on neighborhood differentiation as an outcome of social processes. Thus, the goal is to collect neighborhood data and conduct factor analysis to understand which neighborhood variables seem to measure the same “underlying construct”. That is, while the ultimate result is a dimensionality-reduction technique, the ‘factors’ uncovered by the researchers are viewed as [partial] measurements of different segregating processes.
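For reference, the exploratory factor model behind this tradition can be written compactly. The notation is mine, not the source's: x is the vector of p standardized neighborhood variables, f holds the k ≪ p latent factors with correlation matrix Φ (the identity under an orthogonal rotation), Λ is the p × k matrix of loadings, and ε holds the unique parts, with diagonal covariance Ψ and assumed uncorrelated with f.

```latex
\mathbf{x} = \Lambda \mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\Sigma = \operatorname{Cov}(\mathbf{x}) = \Lambda \Phi \Lambda^{\top} + \Psi
```

Variables with large loadings in the same column of Λ are the ones read as measuring the same "underlying construct".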

In this section we explore comparative factor ecology (and extend it) using the canonical examples of Los Angeles and Chicago. These two cities (metropolitan regions, actually) serve as a useful comparison because they are two large and well-known cities, but also because the conceptual research design and empirical examples were developed in these places (Shevky & Williams, 1949). Further, some have argued that L.A. and Chicago represent two distinct traditions of urban research, both focused on community scholarship albeit with different lineages (Dear, 2002).

21.1 Social Area Analysis and Factorial Ecology

Much attention is now being given to the construction of models of urban structure of a type which are statistical and computerised and which are contributing the first wave of generalised urban models since those produced by the American school of human ecology in the 1920’s and 1930’s. This new interest in model-making is however, paralleled by a series of attempts at formulating new methods of analysis for the study of urban structure. These again have their analogies in the interwar Chicago school, where the production of the general ecological models was preceded by extensive social investigation and the mapping of social data.

Herbert (1967)

The original concept behind social area analysis (SAA) and factorial ecology (FE) is to summarize urban data along its primary axes, then classify areas according to these axes. Although both SAA and FE drew considerable criticism for being atheoretical, the formalization of the method was intended directly to address several hypotheses about social and spatial structure (Arsdol et al., 1958; Bell, 1955; Bell & Greer, 1962; Schmid et al., 1958; Van Arsdol et al., 1961, 1962). This predates the inception of confirmatory factor analysis, so the hypothesis testing was less stringent, but the hypotheses were explicit nonetheless.

The first testable hypothesis is that American cities divide themselves along three principal axes related to economic status, family status, and ethnic status, which together provide the foundation for location choice and multidimensional segregation (Bell, 1955). The second set of hypotheses focuses on the relationship between the revealed dimensions and social behaviors for populations in different areas. This is an early forerunner of neighborhood effects research (Green, 1971; Greer, 1960; Johnston et al., 2004).

“Shevky, Williams, and Bell [301, 3022] argued that most of the social differentiation and stratification of the population in the United States can be summarized in three primary social ‘dimensions’:

  • an index of social rank measuring socioeconomic status,
  • an index of urbanization measuring family status,
  • and an index of segregation measuring ethnic status”

Salins (1971) (bullets added)
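To make the idea of an "index" concrete: each index combined a few census variables into a single score per tract. The sketch below is a simplified stand-in, assuming each component is min-max rescaled to 0-100 within the sample and then averaged; the original procedure used its own standardization, which this does not reproduce, and the variable names and values are hypothetical.

```python
import numpy as np


def rescale_0_100(values, reverse=False):
    """Min-max rescale to 0-100 within the sample.

    reverse=True flips the scale for components where a higher raw value
    should mean a lower index score.
    """
    x = np.asarray(values, dtype=float)
    scaled = 100 * (x - x.min()) / (x.max() - x.min())
    return 100 - scaled if reverse else scaled


def composite_index(components):
    """Average already-rescaled components into a single index score."""
    return np.mean(np.vstack(components), axis=0)


# hypothetical tract-level shares, for illustration only
pct_college = [0.10, 0.35, 0.60]
pct_no_diploma = [0.40, 0.20, 0.05]

social_rank = composite_index([
    rescale_0_100(pct_college),
    rescale_0_100(pct_no_diploma, reverse=True),  # more non-graduates -> lower rank
])
print(social_rank.round(1))
```

Rescaling before averaging keeps a component with a wide raw range from dominating the index.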

And as described above, these measures of social differentiation were viewed as outcomes of unobservable social processes (e.g., segregation by age and family size). Subsequently, scholars used these variables to test other hypotheses, such as whether having a larger family resulted in different community-level behaviors like voting turnout or civic participation.

“The dimensions are social rank, segregation, and urbanization. The last largely measures differences in family structure, and, it is assumed, indicates corollary differences in behavior. Thus, when social rank and segregation are controlled, differences in the index of urbanization for specific tract populations should indicate consistent variations in social behavior. One purpose of the present research was to determine the nature of such corollary differences, and particularly differences in social participation.” (Greer, 1956, p. 19)
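Greer's logic ("when social rank and segregation are controlled…") is what we would now write as a regression with control variables. A minimal sketch on simulated data: every variable and coefficient below is made up for illustration, and the point is only that the urbanization coefficient is estimated while holding the other two indices fixed.

```python
import numpy as np


def ols(y, *regressors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 500
# simulated, standardized tract indices (hypothetical)
social_rank = rng.normal(size=n)
segregation = 0.4 * social_rank + rng.normal(size=n)  # indices may be correlated
urbanization = -0.3 * social_rank + rng.normal(size=n)
# simulated participation rate with a true urbanization effect of -0.7
participation = (
    0.5 * social_rank
    - 0.2 * segregation
    - 0.7 * urbanization
    + rng.normal(scale=0.5, size=n)
)

coef = ols(participation, social_rank, segregation, urbanization)
urbanization_effect = coef[3]
print(coef.round(2))
```

Dropping the two controls (`ols(participation, urbanization)`) would fold part of the social-rank effect into the urbanization slope, which is exactly what "controlling" guards against.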

Factorial ecology and social area analysis endured a great deal of criticism before being essentially abandoned by the 1990s; however, these two hypotheses (especially the first) are probably among the most replicated findings in empirical urban research.

“accepting the systemic assumption, factorial ecology asks the question ‘how does the system cohere and pattern?’ The answer is sought by trying to identify repetitive sequences of spatial variation present in many observable attributes of area”

Berry (1971)

For an overview of the method, see Rees (1971).

Code
datasets = DataStore()

la = gio.get_acs(datasets, msa_fips="31080", years=[2021])
la = la[la.n_total_pop > 0]
la = la.set_index("geoid")
la = la.to_crs(la.estimate_utm_crs())

chi = gio.get_acs(datasets, msa_fips="16980", years=[2021])
chi = chi[chi.n_total_pop > 0]
chi = chi.set_index("geoid")
chi = chi.to_crs(chi.estimate_utm_crs())

chi["pop_density"] = chi.n_total_pop / chi.area
la["pop_density"] = la.n_total_pop / la.area

cols = la.columns[~la.columns.str.startswith("n_")].tolist()
cols.remove('geometry')

chi.p_housing_units_multiunit_structures = (
    chi.p_housing_units_multiunit_structures.fillna(0)
)
la.p_housing_units_multiunit_structures = (
    la.p_housing_units_multiunit_structures.fillna(0)
)


chi[cols] = chi[cols].fillna(chi[cols].median())
la[cols] = la[cols].fillna(la[cols].median())

chi_corr = chi[cols].corr(numeric_only=True).dropna(how="all").drop(columns=["year"])
la_corr = la[cols].corr(numeric_only=True).dropna(how="all").drop(columns=["year"])
Code
sns.clustermap(
    chi_corr,
    cmap="RdBu_r",
    annot=True,
    fmt=".2f",
    figsize=(10, 10),
    annot_kws={"size": 6},
)
plt.suptitle("Correlation Structure in Chicago Region", fontsize=20)
# plt.tight_layout()

Code
sns.clustermap(
    la_corr,
    cmap="RdBu_r",
    annot=True,
    fmt=".2f",
    figsize=(10,10),
    annot_kws={"size": 6},
)
plt.suptitle("Correlation Structure in LA Region", fontsize=20)

These tell different stories: whereas racial segregation is more important in Chicago, ethnic segregation is more obvious in LA.
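Before fitting, it can help to list variable pairs whose correlations are close to ±1, since near-duplicate variables distort a factor solution. A small helper, assuming a square correlation DataFrame with matching row and column labels; the 0.9 threshold is an arbitrary choice of mine and not necessarily how the two columns dropped in the next cell were chosen.

```python
import pandas as pd


def collinear_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """List variable pairs whose absolute correlation is at least `threshold`.

    Assumes a square correlation matrix with matching row and column labels.
    Only pairs above the diagonal are checked, so each pair appears once.
    """
    names = list(corr.columns)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:  # NaN correlations fail this test and are skipped
                rows.append((a, b, r))
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "r"])
    return pairs.sort_values("r", key=abs, ascending=False, ignore_index=True)


toy = pd.DataFrame(
    [[1.0, 0.95, 0.10], [0.95, 1.0, -0.20], [0.10, -0.20, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
print(collinear_pairs(toy))
```

Applied to a full matrix such as `chi[cols].corr(numeric_only=True)`, it returns candidates for pruning; which member of each pair to keep remains a substantive choice.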

Code
# collinear
la_corr = la_corr.drop(columns=["p_asian_indian_persons", "p_vacant_housing_units"])
chi_corr = chi_corr.drop(columns=["p_asian_indian_persons", "p_vacant_housing_units"])

cols = chi_corr.columns

# create and fit factor analysis on z-standardized data
# using as many factors as there are variables
fa_la = FactorAnalyzer(rotation="oblimin", n_factors=la_corr.shape[1])
fa_chi = FactorAnalyzer(rotation="oblimin", n_factors=chi_corr.shape[1])
fa_la.fit(la[cols].apply(zscore))
fa_chi.fit(chi[cols].apply(zscore))

# collect the eigenvalues and store them as pandas Series
evla, _ = fa_la.get_eigenvalues()
evchi, _ = fa_chi.get_eigenvalues()
evla = pd.Series(evla)
evchi = pd.Series(evchi)

# scree plot for each region
f, ax = plt.subplots(1, 2, figsize=(9, 4))

evla.iloc[:10].plot(grid=True, style=".-", ax=ax[0])
ax[0].set_title("LA")

evchi.iloc[:10].plot(grid=True, style=".-", ax=ax[1])
ax[1].set_title("Chicago")

The scree plots show a clear elbow at 4 factors in LA, but at 5 in Chicago. Since the original work focuses on 3 factors, we will fit 4 in both cases.
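Reading an elbow off a scree plot is a judgment call. A common, more mechanical companion heuristic is the Kaiser criterion (a different rule from the visual elbow used above), which keeps factors whose correlation-matrix eigenvalue exceeds 1. Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue's share of that sum is the share of total standardized variance along that axis. A sketch on simulated data with two built-in latent constructs (all values hypothetical):

```python
import numpy as np


def kaiser_count(eigenvalues):
    """Number of eigenvalues greater than 1 (the Kaiser criterion)."""
    return int(np.sum(np.asarray(eigenvalues) > 1.0))


def variance_shares(eigenvalues):
    """Share of total standardized variance associated with each eigenvalue."""
    ev = np.asarray(eigenvalues, dtype=float)
    return ev / ev.sum()


rng = np.random.default_rng(1)
n = 1000
status, family = rng.normal(size=(2, n))  # two simulated latent constructs
indicators = np.column_stack(
    [0.8 * status + 0.6 * rng.normal(size=n) for _ in range(3)]
    + [0.8 * family + 0.6 * rng.normal(size=n) for _ in range(3)]
)
corr = np.corrcoef(indicators, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # largest first, as in a scree plot

n_keep = kaiser_count(eigenvalues)
shares = variance_shares(eigenvalues)
print(n_keep, shares.round(2))
```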

Code
# re-fit the four factor solution
fala = FactorAnalyzer(n_factors=4)
fala.fit(la[cols].apply(zscore).fillna(0))

fachi = FactorAnalyzer(n_factors=4)
fachi.fit(chi[cols].apply(zscore))

# create a dataframe of the factor loadings for each region 
factors_la = pd.DataFrame.from_records(
    fala.loadings_, index=cols, columns=["F1", "F2", "F3", "F4"]
)
factors_chi = pd.DataFrame.from_records(
    fachi.loadings_, index=cols, columns=["F1", "F2", "F3", "F4"]
)

Loadings below 0.1 are usually considered unimportant (R and other software suppress them when printing). Revelle suggests ignoring loadings below 0.3, which is the cutoff used below.

We depart a bit from the standard conventions by using a model with 4 factors rather than 3.

Code
factors_la = factors_la.mask(abs(factors_la) < 0.3)
factors_chi = factors_chi.mask(abs(factors_chi) < 0.3)

factors_la.dropna(how="all")
                                             F1         F2         F3         F4
median_home_value                      0.712581        NaN        NaN        NaN
median_contract_rent                   0.545823        NaN        NaN        NaN
median_household_income                0.576181   0.467821        NaN        NaN
per_capita_income                      0.843196        NaN        NaN        NaN
p_owner_occupied_units                -0.379125   0.303193        NaN        NaN
p_housing_units_multiunit_structures        NaN  -0.917374        NaN        NaN
p_persons_under_18                    -0.340608        NaN  -0.410277        NaN
p_persons_over_60                           NaN   0.303233   0.871076        NaN
p_persons_over_75                           NaN        NaN   0.742417        NaN
p_married                                   NaN   0.665731        NaN        NaN
p_widowed_divorced                          NaN        NaN   0.553001        NaN
p_nonhisp_white_persons                0.828199        NaN        NaN        NaN
p_hispanic_persons                    -0.840664        NaN        NaN        NaN
p_asian_persons                             NaN        NaN        NaN   0.966477
p_edu_hs_less                         -0.779623        NaN        NaN        NaN
p_edu_college_greater                  0.981352        NaN        NaN        NaN
p_veterans                                  NaN        NaN   0.433790        NaN
pop_density                                 NaN  -0.513999        NaN        NaN
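With an oblique solution it is worth flagging variables that load saliently on more than one factor, such as median_household_income in the LA table above. A small pandas helper; it should be given unmasked loadings (the cell above overwrites factors_la with a masked copy, so one would rebuild a frame from `fala.loadings_`). The toy frame copies a few LA values from the table and fills the suppressed entries with 0, which is an assumption for illustration only.

```python
import pandas as pd


def summarize_loadings(loadings: pd.DataFrame, cutoff: float = 0.3) -> pd.DataFrame:
    """For each variable, report its dominant factor and whether it cross-loads."""
    magnitude = loadings.abs()
    n_salient = (magnitude >= cutoff).sum(axis=1)
    return pd.DataFrame(
        {
            "primary_factor": magnitude.idxmax(axis=1),  # factor with the largest |loading|
            "n_salient": n_salient,
            "cross_loading": n_salient > 1,
        }
    )


# a few LA loadings from the table above; suppressed entries set to 0 (assumption)
toy = pd.DataFrame(
    {
        "F1": [0.576181, 0.843196, 0.0, 0.0],
        "F2": [0.467821, 0.0, 0.303233, 0.0],
        "F3": [0.0, 0.0, 0.871076, 0.0],
        "F4": [0.0, 0.0, 0.0, 0.966477],
    },
    index=["median_household_income", "per_capita_income", "p_persons_over_60", "p_asian_persons"],
)
print(summarize_loadings(toy))
```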
Code
Gla = nx.from_pandas_edgelist(
    factors_la.T.stack().rename("weight").reset_index().round(3),
    source="level_0",
    target="level_1",
    edge_attr="weight",
    edge_key="weight",
    create_using=nx.DiGraph,
)

f, ax = plt.subplots(figsize=(8, 11))

pos = graphviz_layout(Gla, prog="dot", args='-Grankdir="LR"')
nx.draw_networkx(
    Gla,
    pos=pos,
    with_labels=True,
    ax=ax,
    edge_cmap=plt.cm.RdBu_r,
    edge_color=factors_la.T.stack().values,
    node_size=500,
    width=3,
    arrowsize=14,
)
labels = nx.get_edge_attributes(Gla, "weight")
nx.draw_networkx_edge_labels(Gla, pos, edge_labels=labels)
ax.margins(0.2, None)  # add some horizontal space to fit labels
ax.axis("off")
plt.suptitle("Factor Loadings in LA", fontsize=18)
plt.tight_layout()

Code
pd.DataFrame(
    fala.phi_,
    columns=[f"F{i+1}" for i in range(4)],
    index=[f"F{i+1}" for i in range(4)],
)
          F1        F2        F3        F4
F1  1.000000  0.260805  0.380428  0.074312
F2  0.260805  1.000000 -0.036529 -0.003217
F3  0.380428 -0.036529  1.000000  0.084671
F4  0.074312 -0.003217  0.084671  1.000000
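Because the rotation is oblique, the factors are allowed to correlate, and this factor correlation matrix (Φ) links two kinds of loadings. Assuming the loadings printed earlier are pattern coefficients P (each factor's unique contribution to a variable, as is usual for oblique rotations), the structure coefficients S (plain variable-factor correlations) follow from S = PΦ. A numpy sketch using the LA factor correlations from the table above and a hypothetical pattern row:

```python
import numpy as np


def structure_matrix(pattern, phi):
    """Structure coefficients S = P @ Phi for an oblique factor solution."""
    return np.asarray(pattern, dtype=float) @ np.asarray(phi, dtype=float)


# LA factor correlations, copied from the table above
phi_la = np.array([
    [1.000000, 0.260805, 0.380428, 0.074312],
    [0.260805, 1.000000, -0.036529, -0.003217],
    [0.380428, -0.036529, 1.000000, 0.084671],
    [0.074312, -0.003217, 0.084671, 1.000000],
])
# hypothetical variable that loads only on F1 in the pattern matrix
pattern = np.array([[0.8, 0.0, 0.0, 0.0]])
print(structure_matrix(pattern, phi_la).round(3))
```

Even with a zero pattern loading on F3, this variable still correlates about 0.30 with F3, because F1 and F3 are themselves correlated at 0.38 in LA.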
Code
factors_chi.dropna(how="all")
                                             F1         F2         F3         F4
median_home_value                      0.719764        NaN        NaN        NaN
median_contract_rent                   0.505525        NaN        NaN        NaN
median_household_income                0.695099        NaN        NaN        NaN
per_capita_income                      0.850860        NaN        NaN        NaN
p_owner_occupied_units                      NaN        NaN   0.440959        NaN
p_housing_units_multiunit_structures        NaN        NaN        NaN   0.840963
p_persons_under_18                          NaN  -0.575334        NaN  -0.352355
p_persons_over_60                           NaN   0.886817        NaN        NaN
p_persons_over_75                           NaN   0.791416        NaN        NaN
p_married                                   NaN        NaN   0.563778  -0.478257
p_widowed_divorced                          NaN   0.495951        NaN        NaN
p_female_headed_families                    NaN        NaN  -0.468703        NaN
p_nonhisp_white_persons                0.522263        NaN   0.327412        NaN
p_nonhisp_black_persons                     NaN        NaN  -0.890260        NaN
p_hispanic_persons                    -0.732092        NaN   0.608918        NaN
p_edu_hs_less                         -0.732998        NaN        NaN        NaN
p_edu_college_greater                  0.958498        NaN        NaN        NaN
p_veterans                                  NaN   0.407699        NaN        NaN
pop_density                                 NaN        NaN        NaN   0.630527
Code
Gchi = nx.from_pandas_edgelist(
    factors_chi.T.stack().rename("weight").reset_index().round(3),
    source="level_0",
    target="level_1",
    edge_attr="weight",
    edge_key="weight",
    create_using=nx.DiGraph,
)

f, ax = plt.subplots(figsize=(8, 11))

pos = graphviz_layout(Gchi, prog="dot", args='-Grankdir="LR"')
nx.draw_networkx(
    Gchi,
    pos=pos,
    with_labels=True,
    ax=ax,
    edge_cmap=plt.cm.RdBu_r,
    edge_color=factors_chi.T.stack().values,
    node_size=500,
    width=3,
    arrowsize=14,
)
labels = nx.get_edge_attributes(Gchi, "weight")
nx.draw_networkx_edge_labels(Gchi, pos, edge_labels=labels)
ax.margins(0.15, None)  # add some horizontal space to fit labels
ax.axis("off")
plt.suptitle("Factor Loadings in Chicago", fontsize=18)
plt.tight_layout()

Code
pd.DataFrame(
    fachi.phi_,
    columns=[f"F{i+1}" for i in range(4)],
    index=[f"F{i+1}" for i in range(4)],
)
          F1        F2        F3        F4
F1  1.000000  0.122424 -0.172076  0.320474
F2  0.122424  1.000000 -0.215499 -0.260023
F3 -0.172076 -0.215499  1.000000  0.006572
F4  0.320474 -0.260023  0.006572  1.000000

This is not exactly the factor structure postulated by Shevky and Bell, but it is very similar to the large body of replication work that followed in the 1960s and 1970s. The factors are ordered slightly differently between the two regions, but follow the same general structure:

  • the ‘social rank’ factor is dominated by income, education, and, to an extent, land value (rent/home value)

  • the ‘family structure’ factor is dominated by variables related to age and marital status

  • the ‘urbanization’ factor seems to capture density and morphology

  • the ‘segregation’ factor captures race in both cases; interestingly, in LA it loads exclusively on the Asian population share

That last point is really important. In Los Angeles, racial inequality is so deeply intertwined with socioeconomic inequality (at the neighborhood level) that they can't be separated into distinct factors. More bluntly, when you are talking about a predominantly Black and/or brown neighborhood, you are almost inevitably talking about a predominantly poor neighborhood. When we think about segregation and neighborhood effects, the idea that race and inequality are essentially synonymous in LA is sobering, to say the least. If we want to make a dent in inequality, then it means intentionally shooting for more integration… But the Chicago result, showing the converse, is also important. There, Black and Hispanic segregation load on a different factor than socioeconomic status, which is striking in a different way. While the separation of these factors means that there are a good number of high-SES minority neighborhoods (so ‘SES’ and ‘[racial] segregation’ measure distinct concepts), it also means that you cannot understand Chicago’s social geography without considering Black and Hispanic segregation (the race factor is more important in Chicago than LA, in that it explains a greater share of the covariance).

Put differently, the Black/Hispanic composition of the neighborhood is one of the most salient factors guiding location choice in Chicago. In LA, any preference for white neighborhoods is masked by a preference for high-SES neighborhoods.

These results are very similar to Anderson & Bean (1961), who find that the original Shevky/Bell urbanization factor is better split into two concepts, one demographic and one morphological: “Factor A is almost equivalent to the percent of dwellings in multiunit structures (loading .971)” (Anderson & Bean, 1961). As a generic categorization, these factors map fairly well onto the original SAA factors (as long as you have them in mind), though obviously the loading structure can be quite different across cities, even if the general factors are similar. For example, what defines the segregation dimension in LA is the Asian population, whereas in Chicago the factor is defined by its share of Black and Hispanic/Latino residents. Put differently, if you were trying to define the most important social dimensions of American cities, then race and ethnicity would certainly comprise one of the dimensions, but the particular makeup of the factor depends on the history and demography of the city under study.

Code
chi_map = dict(
    F1="socioeconomic status",
    F2="family structure",
    F3="segregation",
    F4="urbanization",
)

la_map = dict(
    F1="socioeconomic status",
    F2="urbanization",
    F3="family structure",
    F4="segregation",
)

Unlike the dimensions of segregation explored in the segregation chapter, these latent variables are more useful than their constituent parts. Thus we can use the factor model to estimate the latent variables for each observation and then map or analyze them further.
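One common way to estimate factor scores is the regression (Thurstone) method: standardize the data to Z, then weight it by W = R⁻¹S, where R is the correlation matrix of the observed variables and S holds the structure loadings. I have not checked which estimator factor_analyzer's `transform` uses internally, so treat this as an illustration of the idea rather than a reimplementation. On simulated one-factor data with known loadings:

```python
import numpy as np


def regression_factor_scores(X, structure):
    """Thurstone regression-method factor scores.

    X: (n, p) raw data; structure: (p, k) variable-factor correlations.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(X, rowvar=False)
    weights = np.linalg.solve(R, structure)  # R^{-1} S without forming the inverse
    return Z @ weights


rng = np.random.default_rng(2)
n = 2000
latent = rng.normal(size=n)  # simulated "true" factor
X = np.column_stack([0.8 * latent + 0.6 * rng.normal(size=n) for _ in range(4)])
structure = np.full((4, 1), 0.8)  # loadings are known in this simulation

scores = regression_factor_scores(X, structure)[:, 0]
validity = np.corrcoef(scores, latent)[0, 1]
print(round(validity, 3))
```

The estimated scores track the simulated factor closely but not perfectly; with only four indicators some measurement error always remains in a score.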

Code
chi_scores = pd.DataFrame(
    fachi.transform(chi[cols].dropna(subset=cols).apply(zscore).values),
    columns=list(chi_map.values()),
)

la_scores = pd.DataFrame(
    fala.transform(la[cols].dropna(subset=cols).apply(zscore).values),
    columns=list(la_map.values()),
)

for col in chi_scores.columns:
    chi[col] = chi_scores[col].values
for col in la_scores.columns:
    la[col] = la_scores[col].values
Code
f, ax = plt.subplots(2, 2, figsize=(9, 12))
ax = ax.flatten()

for i, col in enumerate(chi_scores.columns):
    chi.plot(col, scheme="quantiles", cmap="RdBu_r", ax=ax[i], alpha=0.5)

    ax[i].axis("off")
    ax[i].set_title(col.title(), fontsize=16)
    ctx.add_basemap(ax[i], source=ctx.providers.CartoDB.Positron, crs=chi.crs)

plt.tight_layout()

Wow, the segregation dimension is captured far better than I would have imagined from the loadings alone. It makes more sense if you reverse the score’s sign in this case, since the share of Black residents loads strongly and negatively (-0.89) on this factor.

For the sake of illustration, I will plot the SES factor along with the location of the apartment I lived in when I was a visiting assistant professor at UIC–and just for context, we’re talking 2017; I was 29 years old, my salary was $35k, and my student loans were in repayment (after accumulating three degrees’ worth of compound interest at Great Recession rates…)

Code
m = chi[["socioeconomic status", "geometry"]].assign(geometry=chi.geometry.simplify(100)).explore(
    "socioeconomic status",
    scheme="quantiles",
    cmap="RdBu_r",
    style_kwds=dict(fill_opacity=0.4, weight=0.3),
    tiles="CartoDB Positron",
)

# this is my old place in chicago
gpd.tools.geocode("3658 armitage ave, chicago, il").explore(
    color="black",
    marker_kwds=dict(radius=12),
    style_kwds=dict(fill_opacity=1),
    m=m,
)

By just about any estimate, this map makes me a gentrifier. While my income was probably at or below the median for the neighborhood, I was younger and more educated than most of my neighbors, and a white guy living in a predominantly minority neighborhood. If you are not familiar with Chicago’s social geography, this is near the already-gentrified Bucktown, Wicker Park, and Logan Square neighborhoods, all of which have more shopping and nightlife amenities (and are closer to the L), but none of which I could afford.

When I would describe my neighborhood to colleagues, they would almost inevitably describe that region of the city as ‘block to block’ in terms of ‘desirability’. In uncoded terms, that part of the city is widely perceived as the gentrifying frontier, and if you zoom into the circle, it is easy to understand why: it sits at the gateway of high-high and low-low SES local clusters, and in a few hundred meters, incomes change very fast. This brings a lot of diverse groups into contact with one another in a small space. Some spaces get defended (Kadowaki, 2019; Suttles, 1972), others get invaded (London, 1980; Park et al., 1925), but this is the region where the dividing lines are being drawn actively.

Parenthetically, the diverse set of residents did seem strongly united in their political views and party affiliation.

Lew¹ with the local artwork in Logan Square

(Obviously, I didn’t deface the sidewalk myself, but I couldn’t resist the opportunity to document the local context in my new neighborhood at the time). Political graffiti, especially on publicly-owned property, is a clear indicator of a specific form of collective efficacy (Alvarado, 2016; Carbone & McMillin, 2019; Cohen et al., 2008; Feinberg & Sturm, 2019; Hipp, 2016; Sampson et al., 1999). I left the location metadata in the image for enterprising readers desperate to know where the photo was taken (though it’s only a few blocks from the geocoded point above).

Code
f, ax = plt.subplots(2, 2, figsize=(9, 12))
ax = ax.flatten()

for i, col in enumerate(la_scores.columns):
    la.plot(col, scheme="quantiles", cmap="RdBu_r", ax=ax[i], alpha=0.6)

    ax[i].axis("off")
    ax[i].set_title(col.title(), fontsize=16)
    ctx.add_basemap(ax[i], source=ctx.providers.CartoDB.Positron, crs=la.crs)

plt.tight_layout()

Code
la[["socioeconomic status", "geometry"]].assign(geometry=la.geometry.simplify(100)).explore(
    "socioeconomic status",
    scheme="quantiles",
    cmap="RdBu_r",
    style_kwds=dict(fill_opacity=0.4, weight=0.3),
    tiles="CartoDB Positron",
)

“Another charge is that all of our comparative studies are guilty of ethnocentrism and ideological bias, particularly in studies that presume the material and structural superiority of the complex industrial structures of industrial societies in general and Western democracies in particular. The presumption can be overtly stated, or implied by the transfer of conceptual and methodological schemes. Undoubtedly, this charge has much truth to it. Probably 90 percent of all cross-national research is initiated in the United States and has its dimensions of comparison defined in U. S. terms”

Berry (1971)

“to quote the author of an important recent textbook, ‘by far the major finding is that residential differentiation in the great majority of cities is dominated by a socio-economic dimension, with a second dimension characterised by family status/life cycle characteristics and a third dimension relating to segregation along ethnic divisions’ (Knox 1981, p. 81). This is a common, but rather misleading, conclusion. Socio-economic status is generally the pre-eminent factor, which frequently accounts for over a third of the total variance. However, this reflects the spatial congruence of an important group of structural parameters - occupation, income, number of years schooling, employment status among them - rather than the degree to which any one parameter which is indicative of socio-economic status is correlated with area of residence. Other important structural parameters, such as racial and ethnic status, do not co-vary spatially and are consequently represented by separate, relatively unimportant factors (for example, Rees 1970)”

Morgan (1984)

This is almost exactly what the results still show using 2021 ACS data.

The conceptual argument for factorial ecology is that residential areas can be characterized by lots of different data points (hundreds of Census indicators, if so desired), but after considering all these measures, the differentiation between neighborhoods can be characterized, almost entirely, by a small handful of representative dimensions. Despite some arguments over interpretation and application, dozens of replication studies agree with this basic premise, and as the scree plots above show, four or so factors capture nearly all the covariation in the block group attributes. The empirical findings of factorial ecology are more or less uncontested.

The major critique of factor ecology is its inescapable rooting in human ecology, which, for all of its contributions, views urban dynamics as ‘natural’ elements akin to biology. Obviously, our understanding of structural inequality today regards that view as flippant, recognizing that many of the spatial patterns we observe in cities today are the direct result of institutionalized racism and intentionally designed public policies.

The kernel of genius in factorial ecology and our contemporary understanding of structural inequality are perfectly compatible, as long as we reorient the interpretation of the latent variables to represent “sorting factors” guided by political, economic, and social forces, rather than outcomes from some natural ecological (or utility-maximizing) process (Quillian, 2015). Class, race, age, and the characteristics of the built environment remain the primary ways that cities are organized, through a mixture of individual choices, market forces, and public policies.

21.2 Spatial Structure in Social Dimensions

In another classic example, Anderson & Egeland (1961) argue that the factorial dimensions identified above should demonstrate different spatial layouts.

“the results indicate clearly that Burgess’ concentric zone hypothesis is essentially supported with respect to urbanization but not with respect to social rank (or prestige value as this dimension is termed in this paper), while Hoyt’s sector hypothesis is supported with respect to social rank (prestige value) but not with respect to urbanization”
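One way to make the zone-versus-sector comparison concrete is to group observations into distance rings and angular sectors around a center and ask which grouping explains more of a variable's variance (eta-squared, the between-group share of the total sum of squares). This is my sketch of the general idea, not Anderson & Egeland's procedure; the center, the number of rings and sectors, and the simulated points are all assumptions.

```python
import numpy as np


def eta_squared(values, groups):
    """Share of total variance explained by group means (between-group SS / total SS)."""
    values = np.asarray(values, dtype=float)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = sum(
        np.sum(groups == g) * (values[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total


def rings_and_sectors(x, y, center=(0.0, 0.0), n_rings=4, n_sectors=8):
    """Label points by equal-count distance ring and by angular sector around a center."""
    dx, dy = x - center[0], y - center[1]
    dist = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)  # radians in [-pi, pi]
    inner_breaks = np.quantile(dist, np.linspace(0, 1, n_rings + 1)[1:-1])
    rings = np.digitize(dist, inner_breaks)
    sectors = np.floor((angle + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    return rings, sectors


rng = np.random.default_rng(3)
x, y = rng.uniform(-1, 1, size=(2, 2000))
rings, sectors = rings_and_sectors(x, y)

# simulated attributes: one varies with distance from the center, the other with direction
concentric = np.hypot(x, y) + rng.normal(scale=0.1, size=x.size)
sectoral = np.cos(np.arctan2(y, x)) + rng.normal(scale=0.1, size=x.size)

results = {
    name: (eta_squared(v, rings), eta_squared(v, sectors))
    for name, v in [("concentric", concentric), ("sectoral", sectoral)]
}
print(results)
```

A concentric (Burgess-like) variable is explained mostly by rings, a sectoral (Hoyt-like) one mostly by sectors.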

An important difference between today’s urban analytics and yesterday’s factor ecology is the ability to conduct formal tests of spatial structure. Techniques like the LISA were not developed until the 1990s, by which time factor ecology had been all but abandoned. This yields a unique opportunity to examine spatial structure in social structure. That is, do we find a strong spatial signal in the different social dimensions?

  • and is the signal the same across dimensions?
  • across places?
  • over time?

Savitz & Raudenbush (2009) give a compelling argument that spatial effects can be used to develop better estimates of the latent factors, but I am not aware of anyone who has examined spatial dependence in the resulting latent dimensions.

Code
w_chi = Rook.from_dataframe(chi, use_index=False)  # positional ids, matching .values below

ds = []
for col in chi_scores.columns:
    m = Moran(chi[col].values, w_chi)
    d = pd.Series({"I": m.I, "p-val": m.p_sim}, name=col)
    ds.append(d)
pd.DataFrame(ds)
                             I  p-val
socioeconomic status  0.778226  0.001
family structure      0.261456  0.001
segregation           0.740742  0.001
urbanization          0.750752  0.001

In Chicago, all the dimensions show significant spatial autocorrelation, with SES, urbanization, and segregation all having Moran’s I values greater than 0.7, which is remarkably high. This is a unique finding… the exploratory factor analysis suggests a 4-factor fit (which loosely comports with the general factor ecology results), and the obliquely rotated factors are only weakly correlated (the largest absolute factor correlation is about 0.32), yet all four are significantly spatially patterned, three of them very strongly. That suggests four social dimensions with four spatial signatures…
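Global Moran's I compares each observation's deviation from the mean with the weighted deviations of its neighbors: I = (n / S0) · z′Wz / z′z, where S0 is the sum of all weights. esda judges significance by permutation; as I understand it, with its default of 999 permutations the smallest attainable pseudo p-value is 1/(999 + 1) = 0.001, so 0.001 is a floor, and every entry in the table above sits at it. A numpy sketch on a hypothetical 10 × 10 grid, with a folded one-sided pseudo p-value written to mirror (to my understanding) esda's p_sim:

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook contiguity matrix for cells of a regular grid (row-major order)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(y, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)


def moran_pseudo_p(y, W, permutations=999, seed=0):
    """Observed I and a folded one-sided permutation pseudo p-value."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, W)
    simulated = np.array([morans_i(rng.permutation(y), W) for _ in range(permutations)])
    larger = np.sum(simulated >= observed)
    larger = min(larger, permutations - larger)  # count in the tail nearer the observed value
    return observed, (larger + 1) / (permutations + 1)


# hypothetical grid whose values rise smoothly from one corner to the other
grid = np.add.outer(np.arange(10), np.arange(10)).astype(float).ravel()
I, p_sim = moran_pseudo_p(grid, rook_weights(10, 10))
print(round(I, 3), p_sim)
```

A smooth gradient is never matched by any random shuffle, so its pseudo p-value lands exactly on the 0.001 floor; reporting I itself is therefore more informative than the p-value here.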

Code
f, ax = plt.subplots(2, 2, figsize=(9, 12))
ax = ax.flatten()

for i, col in enumerate(chi_scores.columns):
    mloc = Moran_Local(chi[col].values, w_chi)
    lisa_cluster(mloc, chi, ax=ax[i])
    ctx.add_basemap(ax[i], source=ctx.providers.CartoDB.Positron, crs=chi.crs)
    ax[i].set_title(col.replace("_", " ").title(), fontsize=16)
plt.tight_layout()

Chicago’s regional geography is much more interesting than its strictly municipal geography, if you ask me, especially when you consider the traditional story of spatial structure in these places… Chicago is the definition of the monocentric city, but when we look for clusters of its socioeconomic status, a much larger polycentric structure emerges (especially the familiar north-south divide).

But the urbanization factor essentially recovers the rent gradient! (wow!). There are the concentric rings.

The family structure variable in Chicago is basically a map of where the young people live… Low-Low clusters are the young places: hip and trendy, but not very career-established, so not particularly high-income. The really hip and trendy places here are the “High-Low” observations like Wicker Park and Mount Pleasant, where older established families (who can afford the relatively expensive places) live in the core of the cool neighborhood.
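The High-High, Low-Low, High-Low and Low-High labels come from comparing each observation's standardized value with its spatial lag (the weighted average of its neighbors): the local Moran statistic is I_i = (z_i / m2) · Σ_j w_ij z_j, with m2 = Σ z_i² / n. A numpy sketch of the statistic and the quadrant labels on a hypothetical 4 × 4 grid; it deliberately skips the permutation-based significance filter that the cluster maps use, so it labels every cell.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook contiguity matrix for cells of a regular grid (row-major order)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def local_moran(y, W):
    """Local Moran's I and LISA quadrant labels (no significance testing)."""
    z = y - y.mean()
    lag = W @ z  # neighbors' average deviation
    m2 = np.sum(z ** 2) / len(y)
    local_i = z * lag / m2
    labels = np.where(
        z > 0,
        np.where(lag > 0, "HH", "HL"),
        np.where(lag > 0, "LH", "LL"),
    )
    return local_i, labels


# hypothetical grid with a block of high values in the top-left corner
grid = np.zeros((4, 4))
grid[:2, :2] = 10.0
local_i, labels = local_moran(grid.ravel(), rook_weights(4, 4))
print(labels.reshape(4, 4))
```

Low cells bordering the high block come out as Low-High: the "gateway" positions described for the gentrifying frontier earlier in the chapter.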

And the segregation variable is a similarly familiar depiction of the Black and Hispanic neighborhoods in the region, whose hypersegregation (especially in Chicago) has been long studied.

This is the appeal of factor ecology… intuitively, these factors ‘make sense’ as principal axes of the region’s social geography.

Code
miloc_chi = Moran_Local(chi['socioeconomic status'], w_chi)
miloc_chi.explore(chi.assign(geometry=chi.geometry.simplify(150)), tiles='CartoDB Positron')

This is not a picture of monocentrism.

And look at the distinct transition between zones moving westward from the lake along I-290!

By contrast, the infamously polycentric Los Angeles turns into an egg yolk, with nearly concentric rings telling a story about city vs. suburbs.

Code
w = Rook.from_dataframe(la, use_index=False)  # positional ids, matching .values below

ds = []
for col in la_scores.columns:
    m = Moran(la[col].values, w)
    d = pd.Series({"I": m.I, "p-val": m.p_sim}, name=col)
    ds.append(d)
pd.DataFrame(ds)
/Users/knaaptime/miniforge3/envs/urban_analysis/lib/python3.12/site-packages/libpysal/weights/contiguity.py:61: UserWarning: The weights matrix is not fully connected: 
 There are 4 disconnected components.
 There is 1 island with id: 5441.
  W.__init__(self, neighbors, ids=ids, **kw)
('WARNING: ', 5441, ' is an island (no neighbors)')
                             I  p-val
socioeconomic status  0.844877  0.001
urbanization          0.550889  0.001
family structure      0.289145  0.001
segregation           0.439830  0.001

In LA, all the dimensions are significantly spatially patterned as well, though with a bit more nuance than in Chicago. Moran’s I for SES is strikingly high, at 0.84.

Code
f, ax = plt.subplots(2, 2, figsize=(9, 12))
ax = ax.flatten()

for i, col in enumerate(la_scores.columns):
    mloc = Moran_Local(la[col].values, w)
    lisa_cluster(mloc, la, ax=ax[i])
    ctx.add_basemap(ax[i], source=ctx.providers.CartoDB.Positron, crs=la.crs)
    ax[i].set_title(col.replace("_", " ").title(), fontsize=16)
plt.tight_layout()

Code
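# local Moran for LA's socioeconomic status factor; uncomment the explore line for an interactive map
miloc_la = Moran_Local(la["socioeconomic status"], w)
# miloc_la.explore(la.assign(geometry=la.geometry.simplify(150)), tiles="CartoDB Positron")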

These results add nuance to Anderson & Egeland (1961): the findings are similar in Chicago but differ in LA. In LA, ‘social rank’ follows a concentric pattern whereas urbanization is sectoral; the converse is true in Chicago, where urbanization is monocentric but social rank is polycentric.

Despite its many criticisms, a useful contribution of factorial ecology is its ability to summarize the features that dominate the partitioning of population groups in space. While the resulting factors are imperfect, they do allow us to understand whether the axes of differentiation are changing over time or across places (Berry & Rees, 1969).

“Social status” still seems to be the dominant mode by which American cities are organized. The socioeconomic component explains the largest share of covariance, by far, and it has the strongest spatial signal in both Chicago and LA (demonstrated by an enormously high Moran’s I). In that light, recent work showing a modest decline in racial segregation alongside an increase in income segregation is probably just a continuation of a decades-long pattern (Bischoff & Owens, 2019; Bischoff & Reardon, 2013; Intrator et al., 2016; Logan et al., 2018; Reardon et al., 2018; Reardon & Bischoff, 2011).

21.3 Recovering Social Areas

Technically, the last step of SAA is to typologize the observations according to the revealed factors. The original SAA typologies were simple (only three variables in their case, after all), but why not throw the factor scores at a clustering algorithm?
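For contrast with clustering, the older approach cross-classified areas by cutting the indices into a few ordered classes. A rough numpy sketch of that idea, assuming sample-quartile cut points and two scores; the original cut points and its handling of the segregation index differ from this, and the score arrays are hypothetical stand-ins for columns like "socioeconomic status" and "urbanization".

```python
import numpy as np


def quartile_class(scores):
    """Assign each score to a quartile class 1-4 using sample quartiles."""
    scores = np.asarray(scores, dtype=float)
    breaks = np.quantile(scores, [0.25, 0.5, 0.75])
    return np.digitize(scores, breaks) + 1


def social_area_labels(rank_scores, urban_scores):
    """Cross-classify two scores into at most 16 cells, labelled like '4-1'."""
    rank_q = quartile_class(rank_scores)
    urban_q = quartile_class(urban_scores)
    return np.array([f"{r}-{u}" for r, u in zip(rank_q, urban_q)])


rng = np.random.default_rng(4)
ses = rng.normal(size=200)  # stand-in for a socioeconomic status score
urban = rng.normal(size=200)  # stand-in for an urbanization score
labels = social_area_labels(ses, urban)
print(np.unique(labels).size)
```

The grid of cells is fixed in advance, whereas k-means lets the data decide where the group boundaries fall.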

Code
from geosnap.analyze import cluster

la_types = cluster(
    gdf=la.reset_index(),
    columns=la_scores.columns.tolist(),
    method="kmeans",
    n_clusters=4,
)
Code
chi_types = cluster(
    gdf=chi.reset_index(),
    columns=chi_scores.columns.tolist(),
    method="kmeans",
    n_clusters=4
)
Code
la_types[["kmeans", "geometry"]].plot(
    "kmeans", categorical=True, cmap="Accent_r", 
)

Code
chi_types[["kmeans", "geometry"]].plot(
    "kmeans", categorical=True, cmap="Accent_r", 
)
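For readers curious what the k-means step does with the four factor scores, here is a bare-bones Lloyd's algorithm in numpy with a simple farthest-first initialization (a cruder cousin of k-means++). It is not geosnap's implementation, which I have not inspected, and it lacks the multiple restarts a library would use; the two-group data are simulated.

```python
import numpy as np


def farthest_first_init(X, k, rng):
    """Start from a random point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(X, k, rng)
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


rng = np.random.default_rng(5)
# two well-separated simulated groups in a 4-dimensional "factor score" space
X = np.vstack([rng.normal(-3, 1, size=(100, 4)), rng.normal(3, 1, size=(100, 4))])
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels))
```

Because k-means minimizes squared Euclidean distance, it implicitly weights the four factor scores equally, which is one reason to cluster on standardized scores rather than raw variables.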


  1. Lew is named after urban planning theorist Lew Hopkins