23  Latent Variable Models

uncovering the social bases of urban differentiation. Factor analysis in urban studies comes from a long tradition grounded in sociology and geography. In this context, it is used as a method to study the underlying social structure; it is not viewed as a technique for feature engineering. Rather, it is an attempt to understand whether social systems differ in similar ways across different spatial contexts (e.g. comparing factorial ecologies), or to test whether a theoretical social construct is associated with an array of outcomes (ecometrics and structural equation modeling).

Following the introduction of BCZ, empirical models of spatial structure went largely undeveloped for a significant period. In the 1950s, however, social shifts yet again brought emphasis on the significance of place. During the postwar period of suburbanization, white flight, and social unrest, there was a significant push to understand the nature of “community;” the Chicago School focused strongly on how urban spatial segmentation leads to social behaviors like territorialism and consciously created “ideological communities” (Hawley, 1950; Hunter, 1975; Suttles, 1972).

To understand how these communities were created, scholars turned to newly developed statistical methods to identify the essential elements of “urbanism” that structure modern life, and the field of social area analysis was born. This turn is often viewed as the beginning of an age of urban empiricism, but it is important to emphasize that a critical component of Chicago School analysis is empirical work that is grounded firmly in social theory (Sampson2002?). As researchers adopted new statistical techniques, therefore, they attempted to operationalize “the complicated phenomena of urbanism,” described by (Wirth1938?) as “a system of social organization involving a characteristic social structure, a series of social institutions, and a typical pattern of social relations”.

Seeking to formalize and operationalize the ideas of human ecology, neighborhoods, and social areas, researchers set out to define neighborhoods as a set of spatially structured social interactions. A neighborhood or “natural area” could then be identified as meeting the following criteria:

  1. a geographic area physically distinguishable from other adjacent areas;
  2. a population with unique social, demographic, or ethnic composition;
  3. a social system with rules, norms, and regularly recurring patterns of social interaction that function as mechanisms of social control; and
  4. aggregate emergent behaviors or ways of life that distinguish the area from others around it (Schwirian, 1983, p. 84)

These ideas set the stage for an entire generation of researchers focused on discovering the latent spatial structure in social relations. What is particularly important about the framework Schwirian articulates is encapsulated in the last two criteria. The Chicago School and its devotees maintained a focus on human behavior, cultural norms, and assimilation, which they viewed as having a reflexive relationship with residential arrangements. Early approaches were therefore driven by a focus on identifying, isolating, and quantifying the social processes that led to territorialism, “defended communities”, and segregation, among other emergent behaviors (Suttles, 1972).

One of the earliest innovations in neighborhood empirical work was the notion that social processes, which are difficult to observe, could be treated as latent variables and modeled using easily obtained Census data, similar to the ways that psychologists were beginning to model unobservable personality traits in individuals. The first studies deploying this technique were known as Social Area Analyses (SAA), and were developed to help understand the shifting patterns of segregation and urbanization that began in the 1950s. Although social area analysis has been long studied and its lineage is well-known, it is important to remember that its early emphasis on natural science and social psychology generated an empirical search for the fundamental axes of community differentiation: the laws of social physics that described how segregation and city living restructured the life course.

23.0.1 Social Area Analysis

Social area analysis was first devised by Shevky & Williams (1949) and uses factor analysis to isolate and measure what the authors conceived as three essential dimensions of urban spatial structure:

  1. urbanization - measured by manifest variables fertility, women in the labor force, and single-family dwelling units
  2. social rank - measured by manifest variables occupation, educational attainment, and rent
  3. segregation - measured by an “index of isolation for selected ethnic and foreign-born groups” (Bell, 1953)

Together, Shevky and Bell postulated, these three constructs accounted for the majority of the differences between population groups living in the city. The Shevky-Bell hypothesis, as it is now known, holds that urbanization, segregation, and social rank are the defining forces that structure urban life and influence a variety of behaviors like household formation and participation in formal organizations (Bell, 1953; Spielman & Thill, 2008). Implicit in SAA is that these factors have theoretical connections to behavior. Urbanization, for example, may lead to declining birth rates as women stop bearing children and join the workforce, and segregation may lead to predictable residential patterns as immigrants and ethnic minorities form enclaves for mutual benefit[^1].

Following their identification, the latent factors serve as input to a cluster analysis that groups neighborhoods into similar types. Describing Shevky’s original conceptual framework for SAA, Herbert (1967, p. 42) articulates a case that modern urban industrialism is characterized by unavoidable “changes in the distribution of skills, changes in the organisation of productive activity, and changes in the composition of population. Associated with these three main trends are the expressions of social differentiation which become more marked over time.”

Thus, Shevky & Williams’s SAA has three goals. First, it specifies a quantitative framework for capturing these three essential dimensions of social transformation. Second, it finds groups of spatial units (neighborhoods, in theory) that are similar along each of the three dimensions; by grouping the neighborhoods into categories, Shevky hoped to capture nonlinear dynamics that might result from the interaction of the three components. Third, it uses the resulting “social area” categories as lenses and explanatory variables for other urban inquiries (Brindley & Raine, 1979).
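The first two goals can be illustrated with a minimal sketch, assuming scikit-learn is available. Everything in it is an assumption made for illustration: the tract data are synthetic, the seven column names are hypothetical stand-ins for the manifest variables listed above, and k-means with four clusters is only a convenient stand-in for the grouping step, not the procedure used in the original studies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_tracts = 300

# Synthetic latent dimensions standing in for urbanization, social rank,
# and segregation (purely illustrative, not real Census data).
latent = rng.normal(size=(n_tracts, 3))

# Hypothetical manifest variables (columns), in order:
# fertility, women_in_labor_force, single_family_units,
# occupation, education, rent, isolation_index
true_loadings = np.array([
    [0.8, 0.7, 0.6, 0.0, 0.0, 0.0, 0.0],  # urbanization
    [0.0, 0.0, 0.0, 0.8, 0.7, 0.6, 0.0],  # social rank
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.9],  # segregation
])
X = latent @ true_loadings + rng.normal(scale=0.5, size=(n_tracts, 7))

# Goal 1: summarize the manifest variables with three latent dimensions.
X_std = StandardScaler().fit_transform(X)
fa = FactorAnalysis(n_components=3, rotation="varimax", random_state=0)
scores = fa.fit_transform(X_std)  # one row of factor scores per tract

# Goal 2: group tracts that are similar along the three dimensions.
km = KMeans(n_clusters=4, n_init=10, random_state=0)
social_area = km.fit_predict(scores)  # one "social area" label per tract

print(scores.shape, np.bincount(social_area))
```

The resulting `social_area` labels are what the third goal would carry forward as a categorical explanatory variable in a later analysis.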

Shevky and Williams’s initial work focused on Los Angeles, and soon after it was published, Bell (1953) reimplemented SAA in San Francisco, using his results first to examine the generalizability of the original L.A. study (Bell1955?) and later to study spatially stratified participation in organizations and informal social relations in different neighborhood types (Bell & Boat, 1957; Bell & Force, 1956). A number of replications were also performed to test the stability of the Shevky-Bell hypothesis and whether the same general structure appeared in other American cities, which it often did (Arsdol et al., 1958b; Arsdol et al., 1958a; Greer, 1956; Schmid, 1950; Schmid et al., 1958).

Despite some converging results from different cities, the replication studies were often contentious. Some objected to the use of SAA, arguing that it lacked foundations in social theory and, apart from interesting patterns, provided little insight into the causes and consequences of urban ills (Hawley & Duncan, 1957; Van Arsdol et al., 1961, 1962). In a particularly pointed critique, Hawley & Duncan (1957) question whether urbanization, segregation, and social rank are the defining characteristics of cities, and whether the social area analysis of these variables provides valuable insight into urban life. Put bluntly, they argue that the ‘social area’ lacks scientific rigor because it “has provided no theory that explains why areas tend to be homogeneous or otherwise, or that predicts the degree of homogeneity to be observed” (p. 339).

Apart from methodological issues, some replications called into question whether the three-factor structure was a sufficiently robust model to describe American cities. While there was general support for the SAA model, some results were mixed, particularly with respect to the strength and orthogonality of the three factors, leading Anderson & Bean (1961) to question whether the same factor solution would emerge using alternative variables and whether the factors would remain static in number and interpretation.

Anderson & Egeland (1961) probe the question in more depth, finding support for Burgess’s concentric zones theory in terms of urbanization but not with respect to social rank, and Udry (1964) argues that the factor solution is sensitive to the size of the spatial units. In the ongoing debate, even Bell & Greer (1962) conceded that while “there is clearly emergent and presumptive evidence of verified theoretical structure in the Shevky schema of urban analysis, additional specifications, elaboration, and formulation are necessary,” and as more human ecologists heeded their call, exploratory investigations of the factor structure of urban areas blossomed into their own subfield called “factorial ecology.”

23.0.2 Factorial Ecology

Through the 1970s a staggering number of Factorial Ecology (FE) studies appeared in the literature. Unlike its predecessor SAA, which tried to derive three theoretically meaningful constructs using factor analysis and then performed a cluster analysis on those axes to understand urban segmentation, FE is typically more open-ended: it is an inductive approach that leverages exploratory factor analysis and eschews clustering. Like Anderson & Bean (1961), factorial ecology researchers are interested in how the urban social structure might be modeled if a more diverse set of variables were factored. Accordingly, in FE many social variables are provided to a factor model and components emerge from the data. By examining which variables load strongly on which factors, researchers can intuit the conceptual interpretation of each component.
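Reading factors off a loading matrix can be sketched as follows, assuming scikit-learn's `FactorAnalysis` as the exploratory model. The data are synthetic and the six variable names are hypothetical; the point is only to show how each variable is assigned to the factor on which it loads most strongly, which is the step the analyst then has to interpret.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_tracts = 500

# Hypothetical tract-level variables (names are illustrative only).
variables = [
    "median_income", "pct_college", "median_rent",
    "pct_children", "household_size", "pct_married",
]

# Synthetic data: two independent latent dimensions, each driving three variables.
latent = rng.normal(size=(n_tracts, 2))
true_loadings = np.array([
    [0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.9, 0.8, 0.7],
])
X = latent @ true_loadings + rng.normal(scale=0.4, size=(n_tracts, 6))
X_std = StandardScaler().fit_transform(X)

# Exploratory fit: no variable is assigned to a factor in advance.
efa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X_std)
loadings = efa.components_.T  # shape: (n_variables, n_factors)

# For each variable, find the factor with the largest absolute loading.
dominant = np.abs(loadings).argmax(axis=1)
for name, row, k in zip(variables, loadings, dominant):
    cells = "  ".join(f"{v:+.2f}" for v in row)
    print(f"{name:>15}  {cells}  -> factor {k}")
```

In this toy setup the first three variables should group on one factor and the last three on the other; naming those factors (say, "status" and "family life") is the post-hoc interpretive step, and it is the analyst's choice rather than an output of the model.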

Conceptually, factorial ecology borrows from psychological personality research and psychometrics, modifying “factors of the mind” into factors of the neighborhood (Palm & Caruso, 1972). Berry (1971) and Rees (1971) develop “factor models” that borrow from psychological personality research and factor labeling in factorial ecology. Factor models are sensitive to the choice of rotation (oblique or orthogonal) and the estimation procedure used (Hunter, 1972).

Accordingly, many of the results that have emerged from studies of factorial ecology should be treated with skepticism until it can be shown that they are robust to the choice of factor model used (Newton & Johnston, 1976; Perle, 1979; Taylor & Parkes, 1975; Salins1971?). Others have criticized factorial ecology for lacking theory, arguing that it amounts to quantitative fishing, since any derived factor structure can be explained ex post facto.
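The sensitivity to rotation can be seen in a small sketch, assuming scikit-learn's `FactorAnalysis` and synthetic data with deliberate cross-loadings. It compares an unrotated solution with a varimax (orthogonal) rotation only; oblique rotations are not offered by this estimator and would need another package, which is not shown here.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_tracts = 500

# Synthetic data in which every variable loads on both latent dimensions.
latent = rng.normal(size=(n_tracts, 2))
true_loadings = np.array([
    [0.8, 0.7, 0.6, 0.3, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.6, 0.7, 0.8],
])
X = latent @ true_loadings + rng.normal(scale=0.4, size=(n_tracts, 6))
X_std = StandardScaler().fit_transform(X)


def fit_loadings(rotation):
    """Fit a two-factor model and return the (n_variables, n_factors) loadings."""
    fa = FactorAnalysis(n_components=2, rotation=rotation, random_state=0)
    return fa.fit(X_std).components_.T


unrotated = fit_loadings(None)
varimax = fit_loadings("varimax")

# Communality: the share of each variable's variance captured by the factors.
communality_unrotated = (unrotated ** 2).sum(axis=1)
communality_varimax = (varimax ** 2).sum(axis=1)

print("unrotated loadings:\n", np.round(unrotated, 2))
print("varimax loadings:\n", np.round(varimax, 2))
print("same communalities:", np.allclose(communality_unrotated, communality_varimax))
```

The two loading matrices differ, so a researcher labeling factors from one would tell a different story than a researcher labeling them from the other, even though an orthogonal rotation leaves each variable's communality, and hence the overall fit, unchanged.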

Despite these criticisms, FE studies have been undertaken all over the world and its diverse applications were even featured in a special issue of Economic Geography (Berry, 1971). Scholars have canvassed a wide variety of continents, cultures, and class-systems, including Ireland (Parker, 1975), Sweden (Janson, 1971), the Middle East (Landay, 1971), India (Berry, 1971; Rees1969?), Canada (Bourne & Barber, 1971; Murdie, 1969), and Brazil (Morris1971?), in addition to Los Angeles, Chicago (Hunter, 1971), a wide variety of other American cities (Palm & Caruso, 1972), and more.

Despite this broad applicability, or perhaps because of it, Landay (1971) raises the issue of generalizability and cultural sensitivity when applying factorial ecology in different contexts. Of particular interest is the method by which variables are chosen. If the focus is on a “contextual mode of inquiry,” how much hyperlocal context needs to be embedded in the data, and how “standardized” can the results be? Perfect contextual data is infinitely nuanced by definition, and any attempt to distill a set of “standardized” results is, therefore, off the mark. Thus “if the goal is to make broad descriptive statements, factor analysis may be the appropriate technique, but if the goal is to make statements concerning relationships among specific variables of theoretical interest, correlation and regression methods would appear to be more appropriate” (Berry, 1971, p. 214).

23.0.3 Ecometrics

Following the initial excitement in factorial ecology during the 1960s and 1970s, the practice quickly fell out of vogue and lay mostly dormant through the 1980s and 1990s, presumably in part due to its inability to address methodological critiques. After this lull, however, explorations into the factor structure of communities were revived by Raudenbush & Sampson (1999) in a seminal article introducing a newly minted sub-field they term “Ecometrics.”

To overcome the problems of factorial ecology in the earlier generation, Raudenbush & Sampson propose several improvements to the methodology. First, they argue that while large scale databases like the US Census contain a variety of useful data, they typically fail to capture many of the most important ecological properties of neighborhoods. Instead, Raudenbush & Sampson advocate the use of item-response models tailored specifically to collect information about community structures.

In conjunction with the proposed data collection devices, they encourage the use of confirmatory factor analysis (CFA), as opposed to the exploratory factor analyses (EFA) employed by factorial ecologists. Confirmatory factor analysis is a special case of structural equation modeling which, unlike its exploratory cousin, allows social scientists to test a priori theories about factor relationships by specifying a measurement model that describes how particular variables should load onto designated theoretical latent constructs; in so doing, CFA provides an inferential framework for testing whether the social construct under consideration is supported by the data.
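The idea of a measurement model can be written compactly. As a hedged illustration, assume two hypothetical latent constructs $\xi_1$ and $\xi_2$, each measured by three observed items $x_1, \dots, x_6$; the number of constructs and the assignment of items to them are assumptions made for exposition and are not taken from any of the studies cited here.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\,\boldsymbol{\xi} + \boldsymbol{\delta},
\qquad
\boldsymbol{\Lambda} =
\begin{pmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{pmatrix}, \\[4pt]
\boldsymbol{\Sigma}(\theta) &= \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}_{\delta}
\end{aligned}
```

Here $\boldsymbol{\Phi}$ is the covariance matrix of the latent constructs and $\boldsymbol{\Theta}_{\delta}$ the covariance matrix of the measurement errors. The zeros in $\boldsymbol{\Lambda}$ are fixed in advance by theory, whereas an exploratory model would leave every entry free to be estimated. Testing fit then amounts to asking how closely the model-implied covariance $\boldsymbol{\Sigma}(\theta)$ reproduces the sample covariance of the observed items.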

In this way ecometrics is a marked departure from factorial ecology (and arguably a return to Chicago School ideals); whereas the latter is concerned with exploratory urban research, using diverse datasets to examine emergent factors and developing post-hoc interpretations of them, the former is concerned with deductive research. A theory about why certain variables are presumed to load into semantically meaningful factors is stated formally, arguments justifying this are made, and then statistical tests of fit are performed to interrogate these claims (Mujahid et al., 2007).

Ecometrics is still a fledgling methodology, but it has already been shown capable of developing valid and reliable measures of social constructs like collective efficacy, physical disorder, and social disorder, which have important implications for human behavior (Raudenbush & Sampson, 1999; Sampson & Raudenbush, 1999; Sampson2002b?; sampson2012great?). The predominant barrier to adoption in ecometric research has been the costly requirement of systematic social observation (Sampson & Raudenbush, 1999).

Part of the push for ecometrics was that large-scale administrative data (e.g. the Census) often lack information about the most important ecological properties of communities, and thus novel (and expensive) data collection strategies are necessary. Recently, however, researchers have attempted to make ecometric research more accessible by incorporating “big data” sources and “virtual audits” (O’Brien et al., 2015; Sampson, 2017; Bader2015a?; Bader2017?). It seems likely that ecometric analysis will continue to grow in popularity, particularly as additional datasets and new applications materialize.

Meanwhile, however, ecometric studies are relatively rare, and the more common practice, by far, is the development of neighborhood classifications and typologies. Instead of factor analysis, these studies employ cluster analysis to identify groups of neighborhoods (i.e. census tracts) whose racial, economic, physical and other attributes are internally homogeneous. This approach is discussed in detail in the following section.

23.1 Factor Analysis and Latent Constructs

“Unfortunately, many of the theoretical distinctions between factor analysis and components analysis are lost when one of the major commercial packages claims to be doing factor analysis by doing principal components. Basic concepts of reliability and structural equation modeling are built upon the logic of factor analysis and most discussions of theoretical constructs are discussions of factors rather than components. There is a reason for this. The pragmatic similarity between the three sets of models should not be used as an excuse for theoretical sloppiness.”

Barrett (2007), Finch (2020), Gore & Widiger (2013), Hengartner et al. (2014), Irwing (2018), Lawley & Maxwell (1962), Meredith & Teresi (2006), O’Connor (2002), Osborne (2019), Revelle & Wilt (2013), Revelle (n.d.), Velicer & Jackson (1990)
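One practical difference between the two families of models can be shown with a minimal sketch, assuming scikit-learn's `FactorAnalysis` and `PCA` and synthetic data. A single latent factor drives five hypothetical indicators equally, but each indicator carries a different amount of measurement noise. The factor model estimates a separate unique variance for every indicator, while principal components has no such term and its first component tilts toward the noisiest indicator.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
n_obs = 5000

# One latent factor with equal loadings on five indicators (synthetic).
factor = rng.normal(size=(n_obs, 1))
loadings = np.ones((1, 5))

# Indicator-specific noise, increasing from the first to the last column.
noise_sd = np.array([0.3, 0.6, 0.9, 1.2, 1.5])
X = factor @ loadings + rng.normal(size=(n_obs, 5)) * noise_sd

fa = FactorAnalysis(n_components=1, random_state=0).fit(X)
pca = PCA(n_components=1).fit(X)

# Factor analysis: common loadings plus a unique variance per indicator.
print("FA loadings:          ", np.round(np.abs(fa.components_[0]), 2))
print("FA unique variances:  ", np.round(fa.noise_variance_, 2))
print("true unique variances:", np.round(noise_sd ** 2, 2))

# PCA: a single direction of maximal total variance, noise included.
print("PCA first component:  ", np.round(np.abs(pca.components_[0]), 2))
```

Because the data are left unstandardized, the estimated unique variances can be compared directly with the noise variances used to simulate them. The sketch illustrates only this one contrast and is not a full account of the distinctions drawn in the quotation.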

23.2 Factorial Ecology (Unsupervised/EFA)

Berry (1971), Hunter (1972), Murdie (1969), Newton & Johnston (1976), Palm & Caruso (1972), Parker (1975), Perle (1979), Berry & Rees (1969), Rees (1971), Taylor & Parkes (1975), Timms (1970)

Clark et al. (1974), Johnston et al. (2004), Newton & Johnston (1976), Silber (1989), Steiger (2009), Wirth & Edwards (2007)

Abu-Lughod (1969), Bell (1954), Berry & Spodek (1971), Bourne & Barber (1971), Golden & Earp (2012), Goldstein (2018), Hawley (1950), Landay (1971), Park (1952), Perle (1979), Perle (1983), Schmid (1950), Schmid et al. (1958)

23.3 Ecometrics (Supervised/CFA)

Mujahid et al. (2007), O’Brien & Wilson (2011), O’Brien & Kauffman (2013), O’Brien et al. (2015), Raudenbush & Sampson (1999), Sampson (2017)

Hipp (2010), Kuipers et al. (2012), de Leon et al. (2009), Ross & Mirowsky (2001), Sampson & Raudenbush (1999), Sampson & Raudenbush (2004), Sampson et al. (2019), Stockdale et al. (2007)

23.4 Software

  • geopandas
  • scikit-learn
  • factor_analyzer
  • geosnap
