Combining the scholarship on spatial structure and segregation, urban scholars have long recognized the value of studying multi-dimensional segregation, i.e. how cities partition into smaller communities along the lines of race, ethnicity, socioeconomic status, family structure, etc. Work in this tradition develops and applies methods to identify patterns like Burgess’s concentric zones model, and asks (a) which “dimensions” are most important for understanding residential sorting, and (b) whether cities in different regions, countries, or political/economic systems follow similar patterns.
Before the Geography of Opportunity (Galster & Killen, 1995), there was the Ecology of Inequality (Massey & Eggers, 1990), and prior to that, a massive literature on ‘factor ecology’ which adopts exploratory factor analysis to analyze residential differentiation. An important vestige of this tradition is the focus on neighborhood differentiation as an outcome of social processes. Thus, the goal is to collect neighborhood data and conduct factor analysis to understand which neighborhood variables seem to measure the same “underlying construct”. That is, while the ultimate result is a dimensionality-reduction technique, the ‘factors’ uncovered by the researchers are viewed as [partial] measurements of different segregating processes.
In this section we explore comparative factor ecology (and extend it) using the canonical examples of Los Angeles and Chicago. These two cities (metropolitan regions, actually) serve as a useful comparison because they are two large and well known cities, but also because the conceptual research design and empirical examples were developed in these places (Shevky & Williams, 1949). Further, some have argued that L.A. and Chicago represent two distinct traditions of urban research, both focused on community scholarship albeit with different lineage (Dear, 2002).
21.1 Social Area Analysis and Factorial Ecology
Much attention is now being given to the construction of models of urban structure of a type which are statistical and computerised and which are contributing the first wave of generalised urban models since those produced by the American school of human ecology in the 1920’s and 1930’s. This new interest in model-making is however, paralleled by a series of attempts at formulating new methods of analysis for the study of urban structure. These again have their analogies in the interwar Chicago school, where the production of the general ecological models was preceded by extensive social investigation and the mapping of social data.
The original concept behind social area analysis (SAA) and factorial ecology (FE) is to summarize urban data along its primary axes, then classify areas according to these axes. Although both SAA and FE drew considerable criticism for being atheoretical, the formalization of the method was intended to directly address several hypotheses about social and spatial structure (Arsdol et al., 1958; Bell, 1955; Bell & Greer, 1962; Schmid et al., 1958; Van Arsdol et al., 1961, 1962). This work predates the inception of confirmatory factor analysis, so the hypothesis testing was less stringent, but the hypotheses were explicit nonetheless.
The first testable hypothesis is that American cities divide themselves along three principal axes related to economic status, family status, and ethnic status, which together provide the foundation for location choice and multidimensional segregation (Bell, 1955). The second set of hypotheses focuses on the relationship between the revealed dimensions and social behaviors for populations in different areas. This is an early forerunner to neighborhood effects research (Green, 1971; Greer, 1960; Johnston et al., 2004).
“Shevky, Williams, and Bell [301, 3022] argued that most of the social differentiation and stratification of the population in the United States can be summarized in three primary social “dimensions”:
an index of social rank measuring socioeconomic status,
an index of urbanization measuring family status,
and an index of segregation measuring ethnic status”
And as described above, these measures of social differentiation were viewed as outcomes of unobservable social processes (e.g. segregation by age and family size). Following this, scholars used these variables to test other hypotheses, such as whether having a larger family resulted in different community-level behaviors like voting turnout or civic participation.
“The dimensions are social rank, segregation, and urbanization. The last largely measures differences in family structure, and, it is assumed, indicates corollary differences in behavior. Thus, when social rank and segregation are controlled, differences in the index of urbanization for specific tract populations should indicate consistent variations in social behavior. One purpose of the present research was to determine the nature of such corollary differences, and particularly differences in social participation.” (Greer, 1956, p. 19)
Factorial ecology and social area analysis endured a great deal of criticism before being essentially abandoned by the 1990s. However, these two hypotheses, especially the first, are probably among the most replicated findings in empirical urban research.
“accepting the systemic assumption, factorial ecology asks the question “how does the system cohere and pattern?” The answer is sought by trying to identify repetitive sequences of spatial variation present in many observable attributes of area”
Code
sns.clustermap(
    chi_corr,
    cmap="RdBu_r",
    annot=True,
    fmt=".2f",
    figsize=(10, 10),
    annot_kws={"size": 6},
)
plt.suptitle("Correlation Structure in Chicago Region", fontsize=20)
# plt.tight_layout()
[Figure: clustered correlation heatmap titled "Correlation Structure in Chicago Region"]
Code
sns.clustermap(
    la_corr,
    cmap="RdBu_r",
    annot=True,
    fmt=".2f",
    figsize=(10, 10),
    annot_kws={"size": 6},
)
plt.suptitle("Correlation Structure in LA Region", fontsize=20)
[Figure: clustered correlation heatmap titled "Correlation Structure in LA Region"]
These two correlation structures tell different stories: whereas racial segregation is more important in Chicago, ethnic segregation is more obvious in LA.
Code
# drop collinear variables
la_corr = la_corr.drop(columns=["p_asian_indian_persons", "p_vacant_housing_units"])
chi_corr = chi_corr.drop(columns=["p_asian_indian_persons", "p_vacant_housing_units"])
cols = chi_corr.columns

# create and fit factor analysis on z-standardized data
# using as many factors as there are variables
fa_la = FactorAnalyzer(rotation="oblimin", n_factors=la_corr.shape[1])
fa_chi = FactorAnalyzer(rotation="oblimin", n_factors=chi_corr.shape[1])
fa_la.fit(la[cols].apply(zscore))
fa_chi.fit(chi[cols].apply(zscore))

# collect the eigenvalues and store them as pandas Series
evla, _ = fa_la.get_eigenvalues()
evchi, _ = fa_chi.get_eigenvalues()
evla = pd.Series(evla)
evchi = pd.Series(evchi)

# scree plot for each region
f, ax = plt.subplots(1, 2, figsize=(9, 4))
evla.iloc[:10].plot(grid=True, style=".-", ax=ax[0])
ax[0].set_title("LA")
evchi.iloc[:10].plot(grid=True, style=".-", ax=ax[1])
ax[1].set_title("Chicago")
[Figure: scree plots of the first ten eigenvalues, in panels titled "LA" and "Chicago"]
The scree plots show a clear elbow at four factors in LA, but at five in Chicago. Since the original work focuses on three factors, we will fit four here in both cases.
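The elbow reading can be cross-checked numerically. What follows is a minimal sketch on synthetic data, not the chapter's actual workflow: the names `latent`, `loadings`, and `X` are illustrative assumptions, and the data are generated from two known dimensions. It shows that the values on a scree plot are the eigenvalues of the correlation matrix sorted from largest to smallest, and that a common rule of thumb (Kaiser's criterion, which keeps eigenvalues above 1) gives a second opinion on where the elbow sits.

```python
import numpy as np

# Synthetic stand-in for neighborhood data: 500 observations of 8 variables
# driven by 2 latent dimensions plus noise (illustrative only, not the ACS data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.85, 0.0], [0.7, 0.0],
    [0.0, 0.9], [0.0, 0.8], [0.0, 0.75], [0.0, 0.7],
])
X = latent @ loadings.T + 0.5 * rng.normal(size=(500, 8))

# Eigenvalues of the correlation matrix, largest first: these are the scree values.
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Kaiser's rule of thumb keeps factors whose eigenvalue exceeds 1.
n_kaiser = int((eigenvalues > 1).sum())

# Share of total variance associated with each successive eigenvalue.
share = eigenvalues / eigenvalues.sum()
print(n_kaiser, share.round(3))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the `share` vector reads directly as a proportion of total variance. Rules of thumb like this one are a complement to, not a substitute for, inspecting the plot and the interpretability of the resulting factors.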
Code
# re-fit the four factor solution
fala = FactorAnalyzer(n_factors=4)
fala.fit(la[cols].apply(zscore).fillna(0))
fachi = FactorAnalyzer(n_factors=4)
fachi.fit(chi[cols].apply(zscore))

# create a dataframe of the factor loadings for each region
factors_la = pd.DataFrame.from_records(
    fala.loadings_, index=la_corr.columns, columns=["F1", "F2", "F3", "F4"]
)
factors_chi = pd.DataFrame.from_records(
    fachi.loadings_, index=cols, columns=["F1", "F2", "F3", "F4"]
)
Loadings with an absolute value below 0.1 are considered unimportant (R and other packages suppress them), and Revelle recommends ignoring anything below 0.3.
We will depart a bit from the standard conventions and fit a model with 4 factors rather than 3.
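Suppressing small loadings is a one-line mask in pandas. The sketch below uses a made-up loadings table (the values and the `cutoff` name are assumptions for illustration, not the fitted results) and applies the 0.3 threshold attributed to Revelle above. The NaN entries in the Chicago loadings table shown later are consistent with a mask of this kind, although the cutoff used there is not stated.

```python
import numpy as np
import pandas as pd

# Hypothetical loadings for three variables on two factors (not the fitted values).
loadings = pd.DataFrame(
    {"F1": [0.72, -0.05, 0.28], "F2": [0.08, 0.89, -0.41]},
    index=["median_home_value", "p_persons_over_60", "p_married"],
)

# Keep a loading only when its absolute value reaches the cutoff; others become NaN.
cutoff = 0.3
salient = loadings.where(loadings.abs() >= cutoff)

# Drop variables that have no salient loading on any factor.
salient = salient.dropna(how="all")
print(salient)
```

Masking on the absolute value matters because strongly negative loadings are just as informative as strongly positive ones.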
Gla = nx.from_pandas_edgelist(
    factors_la.T.stack().rename("weight").reset_index().round(3),
    source="level_0",
    target="level_1",
    edge_attr="weight",
    edge_key="weight",
    create_using=nx.DiGraph,
)
f, ax = plt.subplots(figsize=(8, 11))
pos = graphviz_layout(Gla, prog="dot", args='-Grankdir="LR"')
nx.draw_networkx(
    Gla,
    pos=pos,
    with_labels=True,
    ax=ax,
    edge_cmap=plt.cm.RdBu_r,
    edge_color=factors_la.T.stack().values,
    node_size=500,
    width=3,
    arrowsize=14,
)
labels = nx.get_edge_attributes(Gla, "weight")
nx.draw_networkx_edge_labels(Gla, pos, edge_labels=labels)
ax.margins(0.2, None)  # add some horizontal space to fit labels
ax.axis("off")
plt.suptitle("Factor Loadings in LA", fontsize=18)
plt.tight_layout()
Code
# correlations among the obliquely rotated factors (LA)
pd.DataFrame(
    fala.phi_,
    columns=[f"F{i+1}" for i in range(4)],
    index=[f"F{i+1}" for i in range(4)],
)
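|    | F1       | F2        | F3        | F4        |
|----|----------|-----------|-----------|-----------|
| F1 | 1.000000 | 0.260805  | 0.380428  | 0.074312  |
| F2 | 0.260805 | 1.000000  | -0.036529 | -0.003217 |
| F3 | 0.380428 | -0.036529 | 1.000000  | 0.084671  |
| F4 | 0.074312 | -0.003217 | 0.084671  | 1.000000  |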
Code
factors_chi.dropna(how="all")
|                                      | F1        | F2        | F3        | F4        |
|--------------------------------------|-----------|-----------|-----------|-----------|
| median_home_value                    | 0.719764  | NaN       | NaN       | NaN       |
| median_contract_rent                 | 0.505525  | NaN       | NaN       | NaN       |
| median_household_income              | 0.695099  | NaN       | NaN       | NaN       |
| per_capita_income                    | 0.850860  | NaN       | NaN       | NaN       |
| p_owner_occupied_units               | NaN       | NaN       | 0.440959  | NaN       |
| p_housing_units_multiunit_structures | NaN       | NaN       | NaN       | 0.840963  |
| p_persons_under_18                   | NaN       | -0.575334 | NaN       | -0.352355 |
| p_persons_over_60                    | NaN       | 0.886817  | NaN       | NaN       |
| p_persons_over_75                    | NaN       | 0.791416  | NaN       | NaN       |
| p_married                            | NaN       | NaN       | 0.563778  | -0.478257 |
| p_widowed_divorced                   | NaN       | 0.495951  | NaN       | NaN       |
| p_female_headed_families             | NaN       | NaN       | -0.468703 | NaN       |
| p_nonhisp_white_persons              | 0.522263  | NaN       | 0.327412  | NaN       |
| p_nonhisp_black_persons              | NaN       | NaN       | -0.890260 | NaN       |
| p_hispanic_persons                   | -0.732092 | NaN       | 0.608918  | NaN       |
| p_edu_hs_less                        | -0.732998 | NaN       | NaN       | NaN       |
| p_edu_college_greater                | 0.958498  | NaN       | NaN       | NaN       |
| p_veterans                           | NaN       | 0.407699  | NaN       | NaN       |
| pop_density                          | NaN       | NaN       | NaN       | 0.630527  |
Code
Gchi = nx.from_pandas_edgelist(
    factors_chi.T.stack().rename("weight").reset_index().round(3),
    source="level_0",
    target="level_1",
    edge_attr="weight",
    edge_key="weight",
    create_using=nx.DiGraph,
)
f, ax = plt.subplots(figsize=(8, 11))
pos = graphviz_layout(Gchi, prog="dot", args='-Grankdir="LR"')
nx.draw_networkx(
    Gchi,
    pos=pos,
    with_labels=True,
    ax=ax,
    edge_cmap=plt.cm.RdBu_r,
    edge_color=factors_chi.T.stack().values,
    node_size=500,
    width=3,
    arrowsize=14,
)
labels = nx.get_edge_attributes(Gchi, "weight")
nx.draw_networkx_edge_labels(Gchi, pos, edge_labels=labels)
ax.margins(0.15, None)  # add some horizontal space to fit labels
ax.axis("off")
plt.suptitle("Factor Loadings in Chicago", fontsize=18)
plt.tight_layout()
Code
# correlations among the obliquely rotated factors (Chicago)
pd.DataFrame(
    fachi.phi_,
    columns=[f"F{i+1}" for i in range(4)],
    index=[f"F{i+1}" for i in range(4)],
)
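|    | F1        | F2        | F3        | F4        |
|----|-----------|-----------|-----------|-----------|
| F1 | 1.000000  | 0.122424  | -0.172076 | 0.320474  |
| F2 | 0.122424  | 1.000000  | -0.215499 | -0.260023 |
| F3 | -0.172076 | -0.215499 | 1.000000  | 0.006572  |
| F4 | 0.320474  | -0.260023 | 0.006572  | 1.000000  |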
This is not exactly the factor structure postulated by Shevky and Bell, but it is very similar to the large body of replication work that followed in the 60s and 70s. The factors are ordered slightly differently between the two regions, but follow the same general structure:
the ‘social rank’ factor is dominated by income, education, and, to an extent, land value (rent/home value);
the ‘family structure’ factor is dominated by variables related to age and marital status;
the ‘urbanization’ factor seems to capture density and morphology;
the ‘segregation’ factor captures race in both cases. Interestingly, it loads exclusively on Asian population in LA.
That last point is really important. In Los Angeles, racial inequality is so deeply intertwined with socioeconomic inequality (at the neighborhood level) that they can't be separated into distinct factors. More bluntly, when you are talking about a predominantly Black and/or brown neighborhood, you are almost inevitably talking about a predominantly poor neighborhood. When we think about segregation and neighborhood effects, the idea that race and inequality are essentially synonymous in LA is sobering, to say the least. If we want to make a dent in inequality, then it means intentionally shooting for more integration… But the Chicago result, showing the converse, is also important. There, Black and Hispanic segregation load on a different factor than socioeconomic status, which is striking in a different way. While the separation of these factors means that there are a good number of high-SES minority neighborhoods (so ‘SES’ and ‘[racial] segregation’ measure distinct concepts), it also means that you cannot understand Chicago’s social geography without considering Black and Hispanic segregation (the race factor is more important in Chicago than LA, in that it explains a greater share of the covariance).
Put differently, the Black/Hispanic composition of the neighborhood is one of the most salient factors guiding location choice in Chicago. In LA, any preference for white neighborhoods is masked by a preference for high-SES neighborhoods.
These results are very similar to Anderson & Bean (1961), who find that the original Shevky/Bell urbanization factor is better split into two concepts, one demographic and one morphological: “Factor A is almost equivalent to the percent of dwellings in multiunit structures (loading .971)” (Anderson & Bean, 1961). As a generic categorization, these factors map fairly well onto the original SAA factors (as long as you have them in mind), though obviously the loading structure can be quite different across cities, even if the general factors are similar. For example, what defines the segregation dimension in LA is the Asian population, whereas in Chicago the factor is defined by its share of Black and Hispanic/Latino residents. Put differently, if you were trying to define the most important social dimensions to define American cities, then race and ethnicity would certainly comprise one of the dimensions, but the particular makeup of the factor depends on the history and demography of the city under study.
Unlike the dimensions of segregation explored in the segregation chapter, these latent variables are more useful than their constituent parts. Thus we can use the factor model to estimate the latent variables for each observation to map or analyze them further.
Code
chi_scores = pd.DataFrame(
    fachi.transform(chi[cols].dropna(subset=cols).apply(zscore).values),
    columns=list(chi_map.values()),
)
la_scores = pd.DataFrame(
    fala.transform(la[cols].dropna(subset=cols).apply(zscore).values),
    columns=list(la_map.values()),
)
for col in chi_scores.columns:
    chi[col] = chi_scores[col].values
for col in la_scores.columns:
    la[col] = la_scores[col].values
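One caveat about writing scores back onto the original table: assigning by position only works when no rows were dropped for missing data. The following sketch, built on a toy frame with hypothetical column names and a stand-in for the fitted model's `transform` step, shows an index-aligned alternative in which incomplete rows simply receive NaN.

```python
import numpy as np
import pandas as pd

# Toy frame with one incomplete row (hypothetical column names).
df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, 4.0], "b": [2.0, 1.0, 3.0, 5.0]},
    index=["t1", "t2", "t3", "t4"],
)
cols = ["a", "b"]

# Standardize only the complete rows, keeping their original index labels.
complete = df[cols].dropna()
z = (complete - complete.mean()) / complete.std(ddof=0)

# Stand-in for a fitted model's transform(): here just the row mean of z-scores.
scores = pd.DataFrame({"score": z.mean(axis=1).to_numpy()}, index=complete.index)

# join() aligns on the index, so the dropped row receives NaN instead of
# raising a length-mismatch error or silently shifting values.
df = df.join(scores)
print(df)
```

If every row is complete, this produces the same result as positional assignment, so it is a safe default either way.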
Wow, the segregation dimension is captured far better than I would have imagined from the loadings alone. It makes more sense if you reverse the score’s sign in this case.
For the sake of illustration, I will plot the SES factor along with the location of the apartment I lived in when I was a visiting assistant professor at UIC–and just for context, we’re talking 2017; I was 29 years old, my salary was $35k, and my student loans were in repayment (after accumulating three degrees’ worth of compound interest at Great Recession rates…)
Code
m = (
    chi[["socioeconomic status", "geometry"]]
    .assign(geometry=chi.geometry.simplify(100))
    .explore(
        "socioeconomic status",
        scheme="quantiles",
        cmap="RdBu_r",
        style_kwds=dict(fill_opacity=0.4, weight=0.3),
        tiles="CartoDB Positron",
    )
)
# this is my old place in chicago
gpd.tools.geocode("3658 armitage ave, chicago, il").explore(
    color="black",
    marker_kwds=dict(radius=12),
    style_kwds=dict(fill_opacity=1),
    m=m,
)
[Interactive map: socioeconomic status factor scores in the Chicago region, with the geocoded address marked in black]
By just about any estimate, this map makes me a gentrifier. While my income was probably at or below the median for the neighborhood, I was younger and more educated than most of my neighbors, and a white guy living in a predominantly minority neighborhood. If you are not familiar with Chicago’s social geography, this is near the already-gentrified Bucktown, Wicker Park, and Logan Square neighborhoods, all of which have more shopping and nightlife amenities (and are closer to the L), but none of which I could afford.
When I would describe my neighborhood to colleagues, they would almost inevitably describe that region of the city as ‘block to block’ in terms of ‘desirability’. In uncoded terms, that part of the city is widely perceived as the gentrifying frontier, and if you zoom into the circle, it is easy to understand why: it sits at the gateway of high-high and low-low SES local clusters, and in a few hundred meters, incomes change very fast. This brings a lot of diverse groups into contact with one another in a small space. Some spaces get defended (Kadowaki, 2019; Suttles, 1972), others get invaded (London, 1980; Park et al., 1925), but this is the region where the dividing lines are being drawn actively.
Parenthetically, the diverse set of residents did seem strongly united in their political views and party affiliation.
(Obviously, I didn’t deface the sidewalk myself, but I couldn’t resist the opportunity to document the local context in my new neighborhood at the time). Political graffiti, especially on publicly-owned property, is a clear indicator of a specific form of collective efficacy (Alvarado, 2016; Carbone & McMillin, 2019; Cohen et al., 2008; Feinberg & Sturm, 2019; Hipp, 2016; Sampson et al., 1999). I left the location metadata in the image for enterprising readers desperate to know where the photo was taken (though it’s only a few blocks from the geocoded point above).
“Another charge is that all of our comparative studies are guilty of ethnocentrism and ideological bias, particularly in studies that presume the material and structural superiority of the complex industrial structures of industrial societies in general and Western democracies in particular. The presumption can be overtly stated, or implied by the transfer of conceptual and methodological schemes. Undoubtedly, this charge has much truth to it. Probably 90 percent of all cross-national research is initiated in the United States and has its dimensions of comparison defined in U. S. terms”
“to quote the author of an important recent textbook, “by far the major finding is that residential differentiation in the great majority of cities is dominated by a socio-economic dimension, with a second dimension characterised by family status/life cycle characteristics and a third dimension relating to segregation along ethnic divisions” (Knox 1981, p. 81). This is a common, but rather misleading, conclusion. Socio-economic status is generally the pre-eminent factor, which frequently accounts for over a third of the total variance. However, this reflects the spatial congruence of an important group of structural parameters - occupation, income, number of years schooling, employment status among them - rather than the degree to which any one parameter which is indicative of socio-economic status is correlated with area of residence. Other important structural parameters, such as racial and ethnic status, do not co-vary spatially and are consequently represented by separate, relatively unimportant factors (for example, Rees 1970)”
This is almost exactly what the results still show using 2021 ACS Data.
The conceptual argument for factorial ecology is that residential areas can be characterized by lots of different data points (hundreds of Census indicators if so desired), but after considering all these measures, the differentiation between neighborhoods can be characterized, almost entirely, by a small handful of representative dimensions. Despite some arguments over interpretation and application, dozens of replication studies agree with this basic premise, and as the scree plots above show, four or so factors capture nearly all the covariation in the blockgroup attributes. The empirical findings of factorial ecology are more or less uncontested.
The major critique of factor ecology is its inescapable rooting in human ecology, which for all of its contributions, views urban dynamics as ‘natural’ elements akin to biology. Obviously, our understanding of structural inequality today regards that view as flippant, recognizing that many of the spatial patterns we observe in cities today are the direct result of institutionalized racism and intentionally designed public policies.
The kernel of genius apparent in factorial ecology, and our contemporary understanding of structural inequality are perfectly compatible, as long as we reorient the interpretation of the latent variables to represent “sorting factors” guided by political, economic and social forces, rather than outcomes from some natural ecological (or utility maximizing) process (Quillian, 2015). Class, race, age, and the characteristics of the built environment remain the primary ways that cities are organized, through a mixture of individual choices, market forces, and public policies.
21.2 Spatial Structure in Social Dimensions
In another classic example, Anderson & Egeland (1961) argue that the factorial dimensions identified above should demonstrate different spatial layouts.
“the results indicate clearly that Burgess’ concentric zone hypothesis is essentially supported with respect to urbanization but not with respect to social rank (or prestige value as this dimension is termed in this paper), while Hoyt’s sector hypothesis is supported with respect to social rank (prestige value) but not with respect to urbanization”
An important difference between today’s urban analytics and yesterday’s factor ecology is the ability to conduct formal tests of spatial structure. Techniques like the LISA were not developed until the 90s, by which time factor ecology had been all but abandoned. This yields a unique opportunity to examine spatial structure in social structure. That is, do we find a strong spatial signal in the different social dimensions? And is the signal the same across dimensions, across places, and over time?
A compelling argument is given by Savitz & Raudenbush (2009), that spatial effects can be used to develop better estimates of the latent factors, but I am not aware of anyone who has examined spatial dependence in the resulting latent dimensions.
Code
w_chi = Rook.from_dataframe(chi)
ds = []
for col in chi_scores.columns:
    m = Moran(chi[col].values, w_chi)
    d = pd.Series({"I": m.I, "p-val": m.p_sim}, name=col)
    ds.append(d)
pd.DataFrame(ds)
|                      | I        | p-val |
|----------------------|----------|-------|
| socioeconomic status | 0.778226 | 0.001 |
| family structure     | 0.261456 | 0.001 |
| segregation          | 0.740742 | 0.001 |
| urbanization         | 0.750752 | 0.001 |
In Chicago, all the dimensions show significant spatial autocorrelation, with SES, urbanization, and segregation all having Moran’s I values greater than 0.7–which is remarkably high. This is a unique finding… the exploratory factor analysis suggests a 4 factor fit (which loosely comports with the general factor ecology results), and the obliquely-rotated factors are essentially independent–but all have very strong spatial patterning. That suggests four social dimensions with four spatial signatures…
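To give a sense of scale for these values, the sketch below computes Moran's I from scratch on a small regular grid. It is an illustration under stated assumptions, not the code used for the table: the helper names `rook_weights` and `morans_i` are hypothetical, the grid stands in for real tract geometry, and the weights are row-standardized. It contrasts a smooth gradient with independent noise and derives a permutation-based pseudo p-value.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Rook-contiguity matrix for a regular grid, row-standardized."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:  # neighbor below
                W[i, i + ncols] = W[i + ncols, i] = 1
            if c + 1 < ncols:  # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1
    return W / W.sum(axis=1, keepdims=True)

def morans_i(y, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z the deviations from the mean."""
    z = y - y.mean()
    return (len(y) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
W = rook_weights(10, 10)

# A smooth west-to-east gradient is strongly autocorrelated ...
gradient = np.tile(np.arange(10.0), 10)
# ... while independent noise is not.
noise = rng.normal(size=100)

I_gradient = morans_i(gradient, W)
I_noise = morans_i(noise, W)

# Pseudo p-value from random permutations of the values across locations.
perms = np.array([morans_i(rng.permutation(gradient), W) for _ in range(999)])
p_sim = ((perms >= I_gradient).sum() + 1) / (999 + 1)
print(round(I_gradient, 3), round(I_noise, 3), p_sim)
```

Two things follow from this construction. A nearly perfect gradient yields a value close to 1, so values above 0.7 on real, irregular geography indicate very strong patterning. And with 999 permutations the smallest attainable pseudo p-value is 0.001, which is why every row in the table reports exactly that figure if the tests were run with that many permutations.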
Chicago’s regional geography is much more interesting than its strictly-municipal geography, if you ask me–especially when you consider the traditional story of spatial structure in these places… Chicago is the definition of the monocentric city, but when we look for clusters of its socioeconomic status, a much larger polycentric structure emerges (especially the familiar north-south divide).
But the urbanization factor essentially recovers the rent gradient! (wow!). There are the concentric rings.
The family structure variable in Chicago is basically a map of where the young people live… Low-Low clusters are the young places: hip and trendy, but not very career-established, so not particularly high-income. The really hip and trendy places here are the “High-Low” observations like Wicker Park and Mount Pleasant, where older established families (who can afford the relatively expensive places) live in the core of the cool neighborhood.
And the segregation variable is a similarly familiar depiction of the Black and Hispanic neighborhoods in the region, whose hypersegregation (especially in Chicago) has been long studied.
This is the appeal of factor ecology… intuitively, these factors ‘make sense’ as principal axes of the region’s social geography.
This is not a picture of monocentrism:
And look at the distinct transition between zones moving westward from the lake along I-290!
By contrast, the infamously polycentric Los Angeles turns into an egg yolk, with nearly concentric rings telling a story about city vs suburbs.
Code
w = Rook.from_dataframe(la)
ds = []
for col in la_scores.columns:
    m = Moran(la[col].values, w)
    d = pd.Series({"I": m.I, "p-val": m.p_sim}, name=col)
    ds.append(d)
pd.DataFrame(ds)
|                      | I        | p-val |
|----------------------|----------|-------|
| socioeconomic status | 0.844877 | 0.001 |
| urbanization         | 0.550889 | 0.001 |
| family structure     | 0.289145 | 0.001 |
| segregation          | 0.439830 | 0.001 |
In LA, all the dimensions are significantly spatially patterned as well, though with a bit more nuance than Chicago. The Moran value for SES is dramatically high at 0.84.
These results show nuance compared to Anderson & Egeland (1961); specifically, the results are similar in Chicago, but differ in LA. ‘Social rank’ follows a concentric pattern in LA whereas urbanization is sectoral; the converse is true in Chicago, where urbanization is monocentric but social rank is polycentric.
Despite its many criticisms, a useful contribution of factorial ecology is its ability to summarize the features that dominate the partitioning of population groups in space. While the resulting factors are imperfect, they do allow us to understand whether the axes of differentiation are changing over time or across places (Berry & Rees, 1969).
“Social status” still seems to be the dominant mode by which American cities are organized. The social status component explains the largest share of covariance, by far, and it has the strongest spatial signal in both Chicago and LA (demonstrated by an enormously high Moran’s I). In that light, recent work showing a modest decline in racial segregation alongside an increase in income segregation is probably just a continuation of a decades-long pattern (Bischoff & Owens, 2019; Bischoff & Reardon, 2013; Intrator et al., 2016; Logan et al., 2018; Reardon et al., 2018; Reardon & Bischoff, 2011).
21.3 Recovering Social Areas
Technically, the last step of SAA is to typologize the observations according to the revealed factors. The original SAA typologies were simple (only three variables in their case, after all), but why not throw them at a clustering algorithm?
Code
from geosnap.analyze import cluster

la_types = cluster(
    gdf=la.reset_index(),
    columns=la_scores.columns.tolist(),
    method="kmeans",
    n_clusters=4,
)
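For readers who want to see what a k-means typology amounts to, here is a minimal sketch of the textbook procedure (Lloyd's algorithm) in plain NumPy. It is not geosnap's implementation, and the `kmeans` helper, the synthetic `scores` array, and its four columns (standing in for the four factor scores) are all assumptions for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every observation to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic "factor scores": four groups of 50 observations, each high on one dimension.
rng = np.random.default_rng(1)
group_centers = 2.0 * np.eye(4)
scores = np.vstack([c + 0.3 * rng.normal(size=(50, 4)) for c in group_centers])

labels, centers = kmeans(scores, k=4)
print(np.bincount(labels, minlength=4))
```

Because the result depends on the random starting centers, production implementations typically run several initializations and keep the best one. The cluster centers are the useful part for interpretation: each row of `centers` describes a neighborhood type in terms of its average position on the social dimensions.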
Alvarado, S. E. (2016). Neighborhood disadvantage and obesity across childhood and adolescence: Evidence from the NLSY children and young adults cohort (1986-2010). Social Science Research, 57, 80–98. https://doi.org/10.1016/j.ssresearch.2016.01.008
Anderson, T. R., & Bean, L. L. (1961). The Shevky-Bell Social Areas: Confirmation of Results and a Reinterpretation. Social Forces, 40(2), 119–124. https://doi.org/10.2307/2574289
Anderson, T. R., & Egeland, J. A. (1961). Spatial Aspects of Social Area Analysis. American Sociological Review, 26(3), 392. https://doi.org/10.2307/2090666
Arsdol, M. D., Camilleri, S. F., & Schmid, C. F. (1958). An Application of the Shevky Social Area Indexes to a Model of Urban Society. Social Forces, 37(1), 26–32. https://doi.org/10.2307/2573775
Bell, W. (1955). Economic, Family, and Ethnic Status: An Empirical Test. American Sociological Review, 20(1), 45–52. https://doi.org/10.2307/2088199
Bell, W., & Greer, S. (1962). Social Area Analysis and Its Critics. The Pacific Sociological Review, 5(1), 3–9. https://doi.org/10.2307/1388270
Berry, B. J. L. (1971). Introduction: The Logic and Limitations of Comparative Factorial Ecology. Economic Geography, 47(4), 209. https://doi.org/10.2307/143204
Berry, B. J. L., & Rees, P. H. (1969). The Factorial Ecology of Calcutta. American Journal of Sociology, 74(5), 445–491. https://doi.org/10.1086/224681
Bischoff, K., & Owens, A. (2019). The Segregation of Opportunity: Social and Financial Resources in the Educational Contexts of Lower- and Higher-Income Children, 1990–2014. Demography, 1990–2014. https://doi.org/10.1007/s13524-019-00817-y
Bischoff, K., & Reardon, S. F. (2013). Residential Segregation by Income, 1970-2009. The Lost Decade? Social Change in the U.S. After 2000, 44.
Carbone, J. T., & McMillin, S. E. (2019). Reconsidering Collective Efficacy: The Roles of Perceptions of Community and Strong Social Ties. City & Community, 1–18. https://doi.org/10.1111/cico.12413
Galster, G. C., & Killen, S. P. (1995). The geography of metropolitan opportunity: A reconnaissance and conceptual framework. Housing Policy Debate, 6(1), 7–43. https://doi.org/10.1080/10511482.1995.9521180
Greer, S. (1956). Urbanism Reconsidered: A Comparative Study of Local Areas in a Metropolis. American Sociological Review, 21(1), 19. https://doi.org/10.2307/2089335
Greer, S. (1960). The Social Structure and Political Process of Suburbia. American Sociological Review, 25(4), 514–526. https://doi.org/10.2307/2092936
Hipp, J. R. (2016). Collective efficacy: How is it conceptualized, how is it measured, and does it really matter for understanding perceived neighborhood crime and disorder? Journal of Criminal Justice, 46(46), 32–44. https://doi.org/10.1016/j.jcrimjus.2016.02.016
Intrator, J., Tannen, J., & Massey, D. S. (2016). Segregation by race and income in the United States 1970–2010. Social Science Research, 60, 45–60. https://doi.org/10.1016/j.ssresearch.2016.08.003
Johnston, R., Jones, K., Burgess, S. M., Propper, C., Sarker, R., & Bolster, A. (2004). Scale, Factor Analyses, and Neighborhood Effects. Geographical Analysis, 36(4), 350–368. https://doi.org/10.1353/geo.2004.0016
Kadowaki, J. (2019). The Contemporary Defended Neighborhood: Maintaining Stability and Diversity through Processes of Community Defense. City & Community, 18(4), 1220–1239. https://doi.org/10.1111/cico.12471
Logan, J. R., Foster, A., Ke, J., & Li, F. (2018). The uptick in income segregation: Real trend or random sampling variation? American Journal of Sociology, 124(1), 185–222. https://doi.org/10.1086/697528
London, B. (1980). Gentrification as Urban Reinvasion: Some Preliminary Definitional and Theoretical Considerations. In Back to the city: Issues in neighborhood renovation.
Massey, D. S., & Eggers, M. L. (1990). The Ecology of Inequality: Minorities and the Concentration of Poverty, 1970-1980. American Journal of Sociology, 95(5), 1153–1188. https://doi.org/10.1086/229425
Quillian, L. (2015). A Comparison of Traditional and Discrete-Choice Approaches to the Analysis of Residential Mobility and Locational Attainment. The ANNALS of the American Academy of Political and Social Science, 660(1), 240–260. https://doi.org/10.1177/0002716215577770
Reardon, S. F., & Bischoff, K. (2011). Income Inequality and Income Segregation. American Journal of Sociology, 116(4), 1092–1153. https://doi.org/10.1086/657114
Reardon, S. F., Bischoff, K., Owens, A., & Townsend, J. B. (2018). Has Income Segregation Really Increased? Bias and Bias Correction in Sample-Based Segregation Estimates. Demography. https://doi.org/10.1007/s13524-018-0721-4
Rees, P. H. (1971). Factorial Ecology: An Extended Definition, Survey, and Critique of the Field. Economic Geography, 47(4), 220. https://doi.org/10.2307/143205
Salins, P. D. (1971). Household Location Patterns in American Metropolitan Areas. Economic Geography, 47(1), 234. https://doi.org/10.2307/143206
Sampson, R. J., Morenoff, J. D., & Earls, F. (1999). Beyond Social Capital: Spatial Dynamics of Collective Efficacy for Children. American Sociological Review, 64(5), 633–660. https://doi.org/10.2307/2657367
Savitz, N. V., & Raudenbush, S. W. (2009). Exploiting Spatial Dependence to Improve Measurement of Neighborhood Social Processes. Sociological Methodology, 39(1), 151–183. https://doi.org/10.1111/j.1467-9531.2009.01221.x
Schmid, C. F., MacCannell, E. H., & van Arsdol Jr., M. D. (1958). The Ecology of the American City: Further Comparison and Validation of Generalizations. American Sociological Review, 23(4), 392–401. https://doi.org/10.2307/2088802
Shevky, E., & Williams, M. (1949). The social areas of Los Angeles: Analysis and typology. Published for the John Randolph Haynes and Dora Haynes Foundation by the University of California Press. https://books.google.com/books?id=nrUPAQAAMAAJ
Van Arsdol, M. D., Camilleri, S. F., & Schmid, C. F. (1961). An Investigation of the Utility of Urban Typology. The Pacific Sociological Review, 4(1), 26–32. https://doi.org/10.2307/1388484
Van Arsdol, M. D., Camilleri, S. F., & Schmid, C. F. (1962). Further Comments on the Utility of Urban Typology. The Pacific Sociological Review, 5(1), 9–13. https://doi.org/10.2307/1388271
Lew is named after urban planning theorist Lew Hopkins.
21.1 Social Area Analysis and Factorial Ecology
The original concept behind social area analysis (SAA) and factorial ecology (FE) is to summarize urban data along its primary axes, then classify areas according to these axes. Although both SAA and FE drew considerable criticism for being atheoretical, the formalization of the method was intended to directly address several hypotheses about social and spatial structure (Arsdol et al., 1958; Bell, 1955; Bell & Greer, 1962; Schmid et al., 1958; Van Arsdol et al., 1961, 1962). This work predates the inception of confirmatory factor analysis, so the hypothesis testing was less stringent, but the hypotheses were explicit nonetheless.
The first testable hypothesis is that American cities divide themselves along three principal axes related to economic status, family status, and ethnic status, which together provide the foundation for location choice and multidimensional segregation (Bell, 1955). The second set of hypotheses focuses on the relationship between the revealed dimensions and social behaviors for populations in different areas. This is an early forerunner to neighborhood effects research (Green, 1971; Greer, 1960; Johnston et al., 2004).
As described above, these measures of social differentiation were viewed as outcomes of unobservable social processes (e.g. segregation by age and family size). Scholars subsequently used these variables to test other hypotheses, such as whether having a larger family resulted in different community-level behaviors like voter turnout or civic participation.
Factorial ecology and social area analysis endured a great deal of criticism before being essentially abandoned by the 1990s. However, these two hypotheses, especially the first, are probably among the most replicated findings in empirical urban research.
For an overview of the method, see Rees (1971).
These tell different stories. Whereas racial segregation is more important in Chicago, ethnic segregation is more obvious in LA.
The scree plots show a clear elbow at 4 factors in LA, but at 5 in Chicago. Since the original work focuses on 3 factors, we will fit 4 here in both cases.
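As a rough illustration of where a scree plot's values come from, the sketch below computes the eigenvalues of a correlation matrix and the share of variance each one accounts for. The data are synthetic (three hypothetical latent factors driving twelve made-up indicators), not the Census variables used in this chapter, and the variable names are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an area-by-variable table: 500 areas, 12 indicators
# driven by 3 hypothetical latent factors plus noise (not the chapter's data).
n_obs, n_vars, n_latent = 500, 12, 3
latent = rng.normal(size=(n_obs, n_latent))
weights = rng.normal(size=(n_latent, n_vars))
X = latent @ weights + 0.5 * rng.normal(size=(n_obs, n_vars))

# Eigenvalues of the correlation matrix, sorted largest first, are the
# heights plotted in a scree plot.
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Share of the total variance associated with each eigenvalue.
explained = eigenvalues / eigenvalues.sum()

for k, (ev, share) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"component {k}: eigenvalue={ev:.2f}, share={share:.1%}")
```

The "elbow" is the point after which the eigenvalues flatten out; picking it is a judgment call, which is why the two regions can reasonably suggest different numbers of factors.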
Loadings with an absolute value less than .1 are generally considered unimportant (R and other software suppress them by default). Revelle recommends ignoring anything less than .3.
We will depart a bit from the standard conventions and fit a model with 4 factors rather than 3.
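A minimal sketch of this step, assuming scikit-learn's `FactorAnalysis` with a varimax rotation as a stand-in estimator (the chapter's own code may use a different library and rotation). The indicator names and data below are synthetic and purely hypothetical. The last lines blank out loadings under the .3 cutoff so the table is easier to read.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical indicators generated from 4 latent factors plus noise.
columns = [f"var_{i}" for i in range(10)]
latent = rng.normal(size=(400, 4))
weights = rng.normal(size=(4, 10))
data = pd.DataFrame(
    latent @ weights + 0.5 * rng.normal(size=(400, 10)), columns=columns
)

# Standardize so the model works on the correlation structure.
Z = StandardScaler().fit_transform(data)

# Fit a 4-factor model with an orthogonal (varimax) rotation.
fa = FactorAnalysis(n_components=4, rotation="varimax", random_state=0)
fa.fit(Z)

# components_ is (factors x variables); transpose to the usual loading table.
loadings = pd.DataFrame(
    fa.components_.T,
    index=columns,
    columns=[f"factor_{k + 1}" for k in range(4)],
)

# Suppress small loadings for display only; the fitted model is unchanged.
cutoff = 0.3
display_table = loadings.where(loadings.abs() >= cutoff)
print(display_table.round(2))
```

Suppression is purely a presentation choice: the hidden loadings still contribute to the fitted model and to any scores derived from it.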
This is not exactly the factor structure postulated by Shevky and Bell, but it is very similar to the large body of replication work that followed in the 1960s and 1970s. The factors are ordered slightly differently between the two regions, but they follow the same general structure:
The ‘social rank’ factor is dominated by income, education, and, to an extent, land value (rent/home value).
The ‘family structure’ factor is dominated by variables related to age and marital status.
The ‘urbanization’ factor seems to capture density and morphology.
The ‘segregation’ factor captures race in both cases. Interestingly, it loads exclusively on the Asian population in LA.
That last point is really important. In Los Angeles, racial inequality is so deeply intertwined with socioeconomic inequality (at the neighborhood level) that they cannot be separated into distinct factors. More bluntly, when you are talking about a predominantly Black and/or brown neighborhood, you are almost inevitably talking about a predominantly poor neighborhood. When we think about segregation and neighborhood effects, the idea that race and inequality are essentially synonymous in LA is sobering, to say the least. If we want to make a dent in inequality, then it means intentionally shooting for more integration.
But the Chicago result, showing the converse, is also important. There, Black and Hispanic segregation load on a different factor than socioeconomic status, which is striking in a different way. While the separation of these factors means that there are a good number of high-SES minority neighborhoods (so ‘SES’ and ‘[racial] segregation’ measure distinct concepts), it also means that you cannot understand Chicago’s social geography without considering Black and Hispanic segregation (the race factor is more important in Chicago than in LA, in that it explains a greater share of the covariance).
Put differently, the Black/Hispanic composition of the neighborhood is one of the most salient factors guiding location choice in Chicago. In LA, any preference for white neighborhoods is masked by a preference for high-SES neighborhoods.
These results are very similar to Anderson & Bean (1961), who find that the original Shevky/Bell urbanization factor is better split into two concepts, one demographic and one morphological: “Factor A is almost equivalent to the percent of dwellings in multiunit structures (loading .971)” (Anderson & Bean, 1961). As a generic categorization, these factors map fairly well onto the original SAA factors (as long as you have them in mind), though obviously the loading structure can be quite different across cities, even if the general factors are similar. For example, what defines the segregation dimension in LA is the Asian population, whereas in Chicago the factor is defined by its share of Black and Hispanic/Latino residents. Put differently, if you were trying to identify the most important social dimensions that define American cities, then race and ethnicity would certainly comprise one of the dimensions, but the particular makeup of the factor depends on the history and demography of the city under study.
Unlike the dimensions of segregation explored in the segregation chapter, these latent variables are more useful than their constituent parts. Thus we can use the factor model to estimate the latent variables for each observation, then map or analyze them further.
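To make the idea of estimating latent variables for each observation concrete, here is a small sketch using scikit-learn's `FactorAnalysis.transform`, which returns the estimated factor scores for each row. Everything here is an assumption for illustration: the data are synthetic, the two-factor structure is made up, and the `geoid` index is a hypothetical area identifier.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Two made-up latent dimensions, each measured by three noisy indicators.
n_areas = 300
true_factors = rng.normal(size=(n_areas, 2))
weights = np.array(
    [[0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.9, 0.8, 0.7]]
)
observed = true_factors @ weights + 0.4 * rng.normal(size=(n_areas, 6))
areas = pd.DataFrame(
    observed,
    columns=[f"indicator_{i}" for i in range(6)],
    index=pd.Index([f"area_{i}" for i in range(n_areas)], name="geoid"),
)

# Fit the factor model on standardized indicators.
Z = StandardScaler().fit_transform(areas)
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(Z)

# One row of estimated latent scores per area, keyed by the same index,
# so the table can be joined back to geometries for mapping.
scores = pd.DataFrame(
    fa.transform(Z), index=areas.index, columns=["factor_1", "factor_2"]
)
print(scores.head())
```

The sign of a factor is arbitrary, so multiplying a score column by -1 changes nothing about the model and can make a map easier to read.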
Wow, the segregation dimension is captured far better than I would have imagined from the loadings alone. It makes more sense if you reverse the score’s sign in this case.
For the sake of illustration, I will plot the SES factor along with the location of the apartment I lived in when I was a visiting assistant professor at UIC–and just for context, we’re talking 2017; I was 29 years old, my salary was $35k, and my student loans were in repayment (after accumulating three degrees’ worth of compound interest at Great Recession rates…)
By just about any estimate, this map makes me a gentrifier. While my income was probably at or below the median for the neighborhood, I was younger and more educated than most of my neighbors, and a white guy living in a predominantly minority neighborhood. If you are not familiar with Chicago’s social geography, this is near the already-gentrified Bucktown, Wicker Park, and Logan Square neighborhoods, all of which have more shopping and nightlife amenities (and are closer to the L), but none of which I could afford.
When I would describe my neighborhood to colleagues, they would almost inevitably describe that region of the city as ‘block to block’ in terms of ‘desirability’. In uncoded terms, that part of the city is widely perceived as the gentrifying frontier, and if you zoom into the circle, it is easy to understand why: it sits at the gateway of high-high and low-low SES local clusters, and in a few hundred meters, incomes change very fast. This brings a lot of diverse groups into contact with one another in a small space. Some spaces get defended (Kadowaki, 2019; Suttles, 1972), others get invaded (London, 1980; Park et al., 1925), but this is the region where the dividing lines are being drawn actively.
Parenthetically, the diverse set of residents did seem strongly united in their political views and party affiliation.
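For readers unfamiliar with the "high-high" and "low-low" labels, the sketch below shows how they arise from a local Moran statistic: each area's standardized score is compared with the average score of its neighbors. This is a toy example on a synthetic grid with rook (edge-sharing) neighbors, written in plain NumPy. It is an assumption-laden illustration, not the chapter's analysis, and it omits the permutation-based significance test that a real cluster map would apply.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy 10x10 grid of SES scores: high in the left half, low in the right half.
side = 10
pattern = np.where(np.arange(side) < side // 2, 1.0, -1.0)[None, :]
ses = pattern + 0.2 * rng.normal(size=(side, side))

# Standardize (population std) so the mean squared z-score equals 1.
z = (ses - ses.mean()) / ses.std()

# Spatial lag: average z-score of each cell's rook neighbors.
# Padding with NaN lets edge cells average over the neighbors they have.
padded = np.pad(z, 1, constant_values=np.nan)
neighbors = np.stack(
    [padded[:-2, 1:-1], padded[2:, 1:-1], padded[1:-1, :-2], padded[1:-1, 2:]]
)
lag = np.nanmean(neighbors, axis=0)

# Local Moran's I with row-standardized weights: own value times the lag.
local_i = z * lag

# Quadrant labels: own value first, neighbors' value second.
quadrant = np.full(z.shape, "", dtype="<U2")
quadrant[(z > 0) & (lag > 0)] = "HH"
quadrant[(z < 0) & (lag < 0)] = "LL"
quadrant[(z > 0) & (lag < 0)] = "HL"
quadrant[(z < 0) & (lag > 0)] = "LH"

print(quadrant)
```

In this toy grid the interesting cells are the middle columns, where the high-high block meets the low-low block and values change sharply over a short distance.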
(Obviously, I didn’t deface the sidewalk myself, but I couldn’t resist the opportunity to document the local context in my new neighborhood at the time). Political graffiti, especially on publicly-owned property, is a clear indicator of a specific form of collective efficacy (Alvarado, 2016; Carbone & McMillin, 2019; Cohen et al., 2008; Feinberg & Sturm, 2019; Hipp, 2016; Sampson et al., 1999). I left the location metadata in the image for enterprising readers desperate to know where the photo was taken (though it’s only a few blocks from the geocoded point above).
This is almost exactly what the results still show using 2021 ACS data.
The conceptual argument for factorial ecology is that residential areas can be characterized by lots of different data points (hundreds of Census indicators if so desired), but after considering all these measures, the differentiation between neighborhoods can be characterized, almost entirely, by a small handful of representative dimensions. Despite some arguments over interpretation and application, dozens of replication studies agree with this basic premise, and as the scree plots above show, four or so factors capture nearly all the covariation in the block group attributes. The empirical findings of factorial ecology are more or less uncontested.
The major critique of factor ecology is its inescapable rooting in human ecology, which, for all of its contributions, views urban dynamics as ‘natural’ elements akin to biology. Our understanding of structural inequality today regards that view as flippant, and recognizes that many of the spatial patterns we observe in cities are the direct result of institutionalized racism and intentionally designed public policies.
The kernel of genius apparent in factorial ecology and our contemporary understanding of structural inequality are perfectly compatible, as long as we reorient the interpretation of the latent variables to represent “sorting factors” guided by political, economic, and social forces, rather than outcomes from some natural ecological (or utility-maximizing) process (Quillian, 2015). Class, race, age, and the characteristics of the built environment remain the primary ways that cities are organized, through a mixture of individual choices, market forces, and public policies.