26  Social Areas & Discrete Categories

This chapter is about identifying “neighborhoods” and neighborhood types. The focus in this section is not on what the algorithms are doing, but on what the fitted solution tells us. We look at how to get a good summary of an urban area, and how to use that summary as input to further analyses.

While he doesn’t use the term geodemographics, Morgan (1984) gives an excellent description of the relationship between social structure and spatial structure.

“Pure” geodemographics includes only aspatial data like sociodemographics. Nevertheless, because of the underlying multivariate spatial autocorrelation, geodemographic types tend to cluster together in space. From an urban economic perspective, this tells us that these groups (identified by the cluster model) tend to consume the same bundles of housing and neighborhood attributes. In a perfect spatial market, Tieboutian theory says these clusterings represent people’s neighborhood choices, and the fact that they cluster in space tells us that “birds of a feather flock together,” as geodemographic practitioners like to say.

A more critical perspective recognizes that these patterns reflect constrained choices, limited by factors like housing market discrimination, subgroup differences in purchasing power or market information, the unequal distribution of housing supply (e.g. the best school districts rarely have high-rise apartments that drive costs down), and the hoarding of opportunity (e.g. by exclusionary zoning that intentionally prohibits the construction of that very apartment tower). The resulting geodemographic map, then, is emergent from the interconnected social, cultural, economic, and political systems in the region under study.


Whereas the early spatial scientists were interested in explaining the relationship between social processes and spatial structure, another pervasive interest has been describing the emergence of a particular spatial structure when considering certain socioeconomic, demographic, or behavioral variables. Indeed, “in geographic knowledge discovery the aim is, more often than not, to explore and let spatial patterns surface rather than develop predictive models” (Henriques2012?).

This approach represents an important conceptual shift from the factor analytic approaches discussed in Section 2.2.2. From a procedural standpoint, both factor analysis and cluster analysis are data reduction techniques sometimes described as “unsupervised machine learning”: both create composite indices that retain as much of the information or variance in the input data as possible. Thus, factor analysis and geodemographics require a ‘speculative synthesis’ when determining the meaning of the latent variables. For factor analysis, this requires determining the meaning of loadings; for geodemographics, this requires identifying a meaningful demographic profile for each geodemographic classification (Spielman & Thill, 2008). As a result, meaning is synthesized from statistical profiles, instead of generated at the outset through a theory or hypothesis about urban social structure.
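As a minimal sketch of what building a “demographic profile” can look like in practice, the snippet below summarizes each cluster by how far its mean on each variable sits from the overall mean. The tract values, variable names, and cluster labels are all hypothetical, and this difference-from-mean summary is only one of many ways a profile might be constructed.

```python
from statistics import mean

# Hypothetical tract-level attributes (shares between 0 and 1) and
# hypothetical cluster labels, e.g. as produced by a cluster model.
tracts = [
    {"pct_college": 0.62, "pct_renter": 0.71, "pct_over_65": 0.08},
    {"pct_college": 0.58, "pct_renter": 0.66, "pct_over_65": 0.10},
    {"pct_college": 0.21, "pct_renter": 0.30, "pct_over_65": 0.24},
    {"pct_college": 0.25, "pct_renter": 0.26, "pct_over_65": 0.28},
]
labels = [0, 0, 1, 1]


def cluster_profiles(rows, labels):
    """Return {label: {variable: cluster mean minus overall mean}}."""
    variables = list(rows[0])
    overall = {v: mean(r[v] for r in rows) for v in variables}
    profiles = {}
    for k in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == k]
        profiles[k] = {
            v: mean(r[v] for r in members) - overall[v] for v in variables
        }
    return profiles


profiles = cluster_profiles(tracts, labels)
for k, profile in profiles.items():
    # Positive values mean the cluster is above the regional average.
    print(k, {v: round(d, 3) for v, d in profile.items()})
```

Reading the signs of these differences (for example, one cluster above average on college attainment and renting, the other above average on older residents) is the interpretive step; the numbers themselves do not name the type.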

Thus, some expressive analysis methods are deductive, seeking to develop theory and test hypotheses about urban ecological processes (e.g. ecometrics). Others focus on inductive analysis, exploring the multitude of ways that urban segregation manifests without specifying the axes of differentiation (e.g. geodemographics). As ecometrics modernizes factorial ecology, so too does geodemographics modernize social area analysis. In this sense, geodemographics reorients social area analysis away from sociological models of spatial structure to geographic ones.

The sociological line of inquiry is arguably about location choice: why do different groups of people come to inhabit discrete parts of the city? It is also concerned with social process: how do spatial contexts (and the social systems that develop within them) influence collective behaviors like altruism or crime? In addressing these questions, urban sociologists wanted to know if the same factor structure emerged in different places and different societies–if so, it would represent anthropological evidence that human social and political behavior is influenced by some kind of natural laws.

By contrast, the geographic line of inquiry is arguably about “location intelligence”: how different are any two neighborhoods, based on the estimated profiles of their residents? What can we learn about cities by studying how residential areas are split by different classifications? In both cases, geography is an expression or manifestation of the underlying fundamental process at hand, rather than an object of study itself.

Over the last several decades, the practice of geodemographics has become a common avenue of academic study and a lucrative enterprise for private industry market research. Indeed, “the analysis of people by where they live” (Petersen et al., 2011, p. 174) has become the dominant approach for characterizing how socio-spatial structure is expressed in urban areas. Much like BCZ, geodemographic approaches “organize areas into categories sharing similarities across multiple socioeconomic attributes” (Singleton & Spielman, 2014, p. 558). The distinguishing difference between SAA and geodemographics is that the latter does not employ factor analysis prior to clustering.

Thus, rather than describing the essential components of urbanism, geodemographic classifications are themselves “small area indicators of the social, economic and demographic conditions prevailing in small areas, or ‘neighbourhoods’.” As statistics themselves, these classifications can flexibly incorporate any kind of urban data (Singleton & Longley, 2009b, p. 289). This flexibility means that geodemographic approaches can be tailored to a wide variety of purposes, but also raises the challenge in “substantiating that they reflect real divisions in society, not chance grouping in the data” (Singleton & Spielman, 2014, p. 563).
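The clustering step that such classifications rest on can be sketched in a few lines. The example below standardizes two attributes and runs a plain k-means (Lloyd's algorithm) directly on them, with no factor analysis beforehand. The tract values and starting centers are hypothetical, and k-means is used here only as a familiar stand-in; the text does not commit to a particular clustering algorithm.

```python
from statistics import mean, pstdev

# Hypothetical tracts: (median income in $1000s, share of renters).
tracts = [
    (95, 0.20), (88, 0.25), (102, 0.15),   # higher income, mostly owners
    (41, 0.70), (38, 0.75), (45, 0.65),    # lower income, mostly renters
]


def standardize(rows):
    """Convert each column to z-scores so no variable dominates by scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, init_centers, n_iter=20):
    """Lloyd's algorithm with caller-supplied starting centers."""
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(r, centers[k]))
            for r in rows
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels, centers


z = standardize(tracts)
labels, centers = kmeans(z, init_centers=[z[0], z[3]])
print(labels)
```

Nothing in this procedure uses location: any spatial clustering of the resulting types comes from the data, not from the algorithm. In applied work a library implementation (such as the one in scikit-learn) would normally replace this hand-rolled loop.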

Geodemographic segmentation systems have been applied with success to a wide variety of practical settings including public health (Abbas et al., 2009; Farr & Evans, 2005; Petersen et al., 2011), education (Harris et al., 2007; Singleton et al., 2012; Singleton2009a?), criminal justice (Ashby & Longley, 2005), marketing (Dalton & Thatcher, 2015), road safety (Anderson, 2010; Brown et al., 1999), urban microsimulation (Birkin & Clarke, 2012), and several others (Singleton & Spielman, 2014). In the realm of public policy, Webber & Burrows (2018) show how the city of Liverpool has been using geodemographics for decades to develop better urban plans, and (Batey2007?) develop a geodemographic method for assessing whether government initiatives are adequately serving their intended spatial targets. In the private sector, meanwhile, geodemographic systems like MOSAIC and ACORN have flourished over the last several decades, enabling marketing and financial service providers to better target customers using geodemographics to model customer behavior (Farr2001?). Towards this end, commercial products have proven enormously powerful and consume vast amounts of data through partnerships with aggregators like Experian and other financial vendors (Webber & Burrows, 2018).

26.1 Urban Regional Science: Embodying Urban Contexts

All expressive methods seek to analyze how urban space expresses social difference, which is done either by identifying the distinct effect neighborhoods have on their inhabitants or by estimating unique classifications/demographic profiles that apply to demographic areas. In contrast to this, the embodied approaches of urban regional science take urban space as constitutive of social difference. Instead of identifying how neighborhoods are divided by sociodemographic structures, spatially-coherent neighborhoods are constructed that embody these divides. Alternatively, instead of estimating the effect of context on its inhabitants, the shape of context itself is distilled from its inhabitants. Thus, whereas expressive methods use geography as a medium to express social structure, embodied methods identify coherent geographies latent within sociodemographic structure.

What a “coherent” geography means, though, requires the core analytical concept of regional science, the “region.” In urban regional science, a “region” is a spatially-bounded territory that stands in for a conceptually- or mathematically-relevant target of analysis. Since regions are “spatially-bounded,” they are usually exclusive (meaning that observations can only be in one region) and exhaustive (all observations are in at least one region). Thus, regions completely partition the urban space under study. Their use (and thus, relevance) depends on the context being studied (Openshaw1977?).

With respect to identifying neighborhoods, regionalization methods operationalize the definition given by (Galster2001?), finding “bundles of spatially based attributes associated with clusters of residences.” But urban regional science approaches focus on more than neighborhoods alone, allowing for meaningful “bundles of spatially based attributes” that pertain to a wide variety of distinct urban locations (residences, but also workplaces, commute paths, leisure spaces, etc.). Focusing exclusively on urban regional science about neighborhoods, then, bounding these coherent bundles of spatially based attributes identifies how relevant social processes are embodied within urban space. The methods, techniques, & common operational theories used to estimate these boundaries are called “regionalization.”

Distinct from “clustering,” regionalization requires the partitioning of a map into a finite number of exclusive labellings. Map clustering seeks only to identify unusual regions, even those that are geographically irregular (Kolatch2001?) or do not provide an exhaustive partition.[^2] Thus, clustering is a fully “unsupervised” analytical technique, whereas regionalization is often described as semi-supervised. Generally speaking, the analyst has a notion of how many regions are desired, geographical conditions the regions should satisfy (such as compactness, convexity, and/or contiguity), and which implicit geographies the detected regions might echo (Duque2007?). However, in nearly all cases, the recovery of an existing neighborhood geography is not the goal of regionalization, so it is not a strictly supervised technique.
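To make the role of a geographical condition concrete, here is a minimal sketch of a contiguity-constrained regionalization: it starts with every area as its own region and repeatedly merges the two adjacent regions whose mean attribute values are closest, until the analyst's desired number of regions remains. This greedy agglomeration is an illustration only, not one of the published algorithms cited in this chapter; the attribute values and the adjacency structure are hypothetical, and the adjacency graph is assumed to be connected.

```python
from statistics import mean

# Hypothetical attribute (e.g. median income in $1000s) for six areas
# arranged along a line: 0 - 1 - 2 - 3 - 4 - 5.
values = {0: 30, 1: 32, 2: 75, 3: 78, 4: 31, 5: 29}
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}


def regionalize(values, neighbors, n_regions):
    """Greedily merge the most similar pair of adjacent regions."""
    regions = {a: {a} for a in values}  # start with one region per area
    while len(regions) > n_regions:
        best = None
        for r, members in regions.items():
            for s, others in regions.items():
                if r >= s:
                    continue
                # Contiguity constraint: only touching regions may merge.
                if not any(neighbors[a] & others for a in members):
                    continue
                gap = abs(
                    mean(values[a] for a in members)
                    - mean(values[a] for a in others)
                )
                if best is None or gap < best[0]:
                    best = (gap, r, s)
        _, r, s = best  # assumes a connected adjacency graph
        regions[r] |= regions.pop(s)
    return sorted(sorted(m) for m in regions.values())


result = regionalize(values, neighbors, 3)
print(result)
```

Areas 0, 1, 4, and 5 have very similar values, so an aspatial clustering would tend to pool them into one type. Because they are not all adjacent, the constrained procedure instead returns two separate low-value regions on either side of the high-value one.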

Beyond the core unifying concept of the embodied “region,” regionalization methods have a much wider and diffuse set of applications & techniques. Because regional delineations are strongly dependent on how the process plays out in space, regionalization methods themselves usually do not relate to specific hypotheses about social systems. Instead, regionalization involves a large set of broadly useful methods for partitioning geographies. This can make the literature on regionalization appear more diffuse than the expressive methods discussed previously.

However, this diffusiveness is a necessary companion of maturing geographic perspectives (Johnston2018?); there are few grand “geographic theories” in the same sense as those considered by the Chicago School, only specific theories about the geography of each social process. In light of this, we present regionalization in the following sections by identifying commonality in both methods & applications. For each case study we discuss, we identify common regionalization strategies, examine shared conceptual entities that regions are used to represent, and describe how these embodied geographies might relate to other studies’ geographies.

26.1.1 Regionalization methods

In their review of regionalization algorithms, (Duque2007?) identify five criteria used for drawing regions. They describe the various conditions governing how “areas,” the fundamental units of observation being grouped, are usually grouped together into the “regions” defining a regionalization. Below, we name and paraphrase the conditions suggested by (Duque2007?) that regionalizations tend to satisfy:

  1. exclusiveness - observations are in at most one region
  2. exhaustiveness - observations are in at least one region
  3. fullness - each region has more than one observation
  4. disjunction - each region has a distinctive geographic location and does not overlap or blend into another
  5. optimality - the regionalization is designed to score well according to a formally-specified objective

Thus, a regionalization algorithm usually provides a full, exclusive-exhaustive partition of a source graph (designed to represent the urban geography under analysis) into many distinctive parts. Taken together, these subgraphs satisfy some target goal or objective; this objective might be explicitly spatial, purely sociodemographic, or may reflect a mixture of any number of component objectives.
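Several of the conditions listed above can be checked mechanically for any candidate solution. The sketch below tests exclusiveness, exhaustiveness, and fullness as paraphrased in the list, and uses graph connectedness of each region as a rough, assumed stand-in for the spatial coherence implied by disjunction. Optimality depends on the chosen objective and is not checked. The grid layout, region names, and function names are hypothetical.

```python
from collections import deque

# Hypothetical 2 x 3 grid of areas with rook adjacency:
#   0 1 2
#   3 4 5
neighbors = {
    0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5},
    3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4},
}


def is_connected(members, neighbors):
    """Breadth-first search restricted to the areas in `members`."""
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbors[a] & members:
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen == members


def check_regionalization(regions, neighbors):
    """Report which conditions a {region: set of areas} mapping meets."""
    # Number of regions that claim each area.
    counts = {a: sum(a in m for m in regions.values()) for a in neighbors}
    return {
        "exclusive": all(c <= 1 for c in counts.values()),
        "exhaustive": all(c >= 1 for c in counts.values()),
        "full": all(len(m) > 1 for m in regions.values()),
        # Assumption: connectedness approximates a distinct location.
        "contiguous": all(
            is_connected(m, neighbors) for m in regions.values()
        ),
    }


good = {"A": {0, 1, 3}, "B": {2, 4, 5}}
bad = {"A": {0, 1, 2}, "B": {2, 5}, "C": {3}}
print(check_regionalization(good, neighbors))
print(check_regionalization(bad, neighbors))
```

In the second example, area 2 is claimed twice, area 4 is unassigned, and region C holds a single area, so only the contiguity check passes.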

26.1.2 Fully-Exclusive Regionalizations: bounding the neighbourhood

Work on the fundamental theory of how best to conduct regionalization analyses, in general, is active and ongoing (Folch2014?; Laura2015?; Kim2016?; She2017?). Although regions are sometimes required purely for statistical purposes (Openshaw1977?; Spielman2015b?), regions are often used to model urban residential markets (Royuela2013?), social communities (Hipp, 2010; Hipp2012a?; Hipp2013?), political communities (Morrill1976?; Guo2008?; Pang2010?; Tam2016?; Magleby2018?), disease clusters (Assuncao2006b?), and transit zones (Guo & Bhat, 2007; Li2014?; Chen2015?). [^3]

Depending on the social process under study and the frame of analysis, these may be larger or smaller than other common notions of how “large” a neighborhood is from the perspective of the expressive literature (Spielman2015?). Often, these analyses compare the identified data-driven regions to an existing regionalization, identifying how and where the solutions agree or examining which observations tend to be ill-fitting. This means that many analyses consider the number of observed regions as if it reflects a “known” or true number of admissible regions. This is not a necessary constraint (Duque2012?), however, and the number of admissible (or intelligible) regions has itself been used to analyze volatility in neighborhood dynamics (Rey2011?) or to provide more useful statistical summaries of small-area estimates (Bacao2004?; Henriques2010?; Assuncao2006?; Spielman2015b?).
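One simple way to quantify how far a data-driven solution agrees with an existing regionalization is the Rand index: the share of area pairs that both partitions treat the same way (together in both, or apart in both). It is offered here as an illustrative choice rather than the measure used in the studies cited above, and both label vectors are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of area pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical existing neighborhoods and hypothetical detected regions
# for the same six areas.
official = ["north", "north", "north", "south", "south", "south"]
detected = [0, 0, 1, 1, 1, 1]

score = rand_index(official, detected)
print(round(score, 3))
```

Because the index only compares pairs, the region names need not match and the two solutions may contain different numbers of regions.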

26.1.3 The Fuzzy Urban Region

In geoscience & nature geography, many regionalizations have allowed for classifications which are not strictly disjoint (Bourgault1992?; Leyk2007?; Long2010a?; Yuan2015?). In these cases, it is reasonable to consider the regions being embodied as only partially-identifiable. Ecological or geological zones may reasonably blend smoothly into one another, creating spaces where samples might plausibly fall into more than one cluster/region. Only some of their bounds, edges, or extents are discernible, mainly where the difference in empirical characteristics between regions is largest.

However, by dint of constraint, classical regionalization methods may force these partially-identified regions into being complete exclusive-exhaustive assignments. This is akin to some of the concerns discussed by (Spielman2015?); uncertainty both about which region a site ought to fall into and uncertainty about the site itself may affect classifications across the board. What (Kwan2012?) refers to as the “Uncertain Geographic Context Problem”–this fundamental epistemological uncertainty about the scale and precise hierarchy at which theorized regions affect observations–is an intrinsically difficult representational problem.

In general, one cannot know the “true” regions that individuals find most relevant or most impactful for a given social process (or combination of processes), so misspecification of the relevant regions may result in statistical or empirical artifacts: observations may be assigned the incorrect contextual effect; multiple contexts may act jointly, leaving their effects unidentifiable; or observations may be mis-assigned and thus bias a contextual effect estimate away from the “true” value it would take were the set of all regions known. Indeed, this is a fundamental concern: (Isard1956?) identifies this problem right from the outset of urban regional science. He notes, “[regional scientists] shall probably never be in the position to identify a ‘true’ set of regions,” so they are forced to use a new purpose-driven regionalization for each distinct interrogation.

Thus, the “true” context, in Isard’s view, was likely to remain uncertain. However, through repeated study, commonalities in the structure of relevant regions would emerge, possibly leading to regions which minimized the extent to which they obscured the social processes they co-constituted. While these are just now coming within reach for advanced statistical studies (Bradley2017?), the extent to which these regions represent intelligible socially-experienced geographies is currently unknown. Thus, while some analyses do aim to critically consider uncertainties and measurement (Harris et al., 2007; Gale2013?; Singleton2016?; Knaap2017?), practical consideration of the uncertain structure of urban regions in this literature is surprisingly rare given the issue’s longstanding theoretical attention.

Beyond uncertainty in classification, there is the inherent rigidity of assuming regions are disjoint, which means that the identified zones may be more separated or distinguished geographically than they are in theory. As some of the hierarchical and spectral methods note, classifications need not be strictly disjoint; indeed, it is often reasonable to think that regions or neighborhoods may have “fuzzy” boundaries. This uncertainty of boundary is distinct from uncertainty in classification or measurement; if regions are useful insofar as they identify a distinct territory, then blending or interlacing assignments at the boundary may denote areas where a single region assignment may not be useful or accurate.

There are a few attempts to generalize these concerns in classic regionalization methods, either by considering membership itself as a fuzzy decision (Ambroise1986?; Ambroise1998?; Cowpertwait2011?; Hu2009?; Reich2011?) or by allowing a component assumption of disjointedness to be relaxed (Spielman2013b?; Yuan2015?; Wolf2018?). For instance, in (Spielman2013b?), street segments are classified into ethnic categories using historical census data. However, classifications are not exhaustive, in that some streets are not identified as being of any discernible ethnic category.

Further, “neighborhoods” are loosely-bounded, allowing for the intermingling of classifications into the same bounded space. However, in these studies, neighborhoods qua regions still represent a single zone, albeit less crisply-bounded; in the case of social applications, these zones reflect shared contexts that are experienced by many individuals in a shared socio-spatial urban geography. It is only the assumptions about regional structure—that they must be exclusive, exhaustive, and disjoint—that have been relaxed.
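To illustrate what treating membership as a fuzzy decision can mean numerically, the sketch below computes graded memberships using the standard fuzzy c-means membership formula, in which a point's membership in cluster k is 1 / Σ_j (d_k / d_j)^(2/(m−1)), with d_k the distance to center k and m > 1 a fuzziness parameter. This formula is swapped in as a familiar example and is not claimed to be the method of any study cited above; the coordinates and centers are hypothetical.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (m > 1)."""
    dists = [
        sum((x - c) ** 2 for x, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** power for d_j in dists)
        for d_k in dists
    ]


# Two hypothetical region centers in a one-attribute-plus-dummy space.
centers = [(0.0, 0.0), (4.0, 0.0)]

near_first = fuzzy_memberships((1.0, 0.0), centers)
midway = fuzzy_memberships((2.0, 0.0), centers)
print([round(u, 3) for u in near_first])
print([round(u, 3) for u in midway])
```

Memberships always sum to one, so a site halfway between two centers is split evenly, which is one way to represent a transitional area rather than forcing it into a single region.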

26.1.4 Rejecting the Neighborhood-as-Region

There are also outright rejections of exclusiveness, exhaustiveness, or disjunction. In the main, these rejections of neighborhoods as regions are theoretically motivated. They may suggest that only a subset of boundary/transitional areas is well-defined. These bounding approaches focus exclusively on identifying zones of rapid change rather than providing membership into discrete categories (in a process referred to as “wombling”) (Womble1951?; Bocquet1994?; Lu2007?; Dean2018?). Another rejection of classical region assumptions in urban regional science involves the rejection of shared context. Here, “egohoods,” or spaces of individual/personal experience (Hipp2013b?), are used instead.

These spaces of individual experience tend to overlap significantly and are usually unique for each individual (Spielman & Thill, 2008; Spielman2009?; Logan2011?; Spielman2013?), although the characteristics of these spaces can change dramatically depending on how large they are (Fowler2016?). Because the egohood is purely theoretical and obtained usually from straightforward computations applied to individuals’ locations, “bounding” an egohood (or shared spaces between egohoods) is not conceptually useful in the same manner as for the region. Thus, it is unusual to consider the co-incidence of egohood boundaries, even though this is a common method of analysis when interviewing individuals about what they perceive their neighborhoods to be (Coulton2001?; Campbell2009?; Coulton2013?; Hwang2016a?). This makes egohoods conceptually and practically distinct from neighborhoods since they (in general) do not pertain to collections of residences (or transit destinations or sales locations, etc).
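A minimal sketch of the egohood idea, assuming the simplest possible construction: each individual's context is everyone within a fixed straight-line distance of their location, including themselves. The coordinates, the attribute, and the radii are hypothetical, and real applications may use network distance or other definitions.

```python
from math import hypot

# Hypothetical household locations (km) and a binary attribute.
households = {
    "h1": {"xy": (0.0, 0.0), "renter": 1},
    "h2": {"xy": (0.3, 0.0), "renter": 0},
    "h3": {"xy": (0.6, 0.0), "renter": 1},
    "h4": {"xy": (2.0, 0.0), "renter": 0},
}


def egohood(ego, households, radius):
    """Everyone within `radius` of the ego, the ego included."""
    ex, ey = households[ego]["xy"]
    return {
        h for h, rec in households.items()
        if hypot(rec["xy"][0] - ex, rec["xy"][1] - ey) <= radius
    }


def egohood_share(ego, households, radius, attr):
    """Mean of a binary attribute over the ego's surrounding context."""
    members = egohood(ego, households, radius)
    return sum(households[h][attr] for h in members) / len(members)


for h in households:
    print(h, sorted(egohood(h, households, 0.5)),
          round(egohood_share(h, households, 0.5, "renter"), 2))
```

With a 0.5 km radius, the contexts of h1 and h2 overlap but are not identical, and widening the radius to 1 km changes h1's renter share, which mirrors the sensitivity to size noted above.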

  • market areas (housing submarkets)
  • neighborhood types
  • unique neighborhoods

26.2 Social Area Analysis

Abu-Lughod (1969), Anderson & Egeland (1961), Bell & Greer (1962), Brindley & Raine (1979), Brown & Horton (1970), Cullingford & Openshaw (1982), (Fone:IntJEpidemiol:2007?), Han & Lee (2013-11-31), Hawley & Duncan (1957), Herbert (1967), Li & Shanmuganathan (2007), Nethery et al. (2019), Shevky & Williams (1949), Shevky & Bell (1955), Spielman & Thill (2008), Tryon (1955), Anderson & Bean (1961), Arsdol et al. (1958)

26.3 Geodemographics

Batey et al. (1995)

Abbas et al. (2009), Adnan et al. (2010), Anderson (2010), Ashby & Longley (2005), Batey & Brown (2007), Birkin & Clarke (1998), Birkin & Clarke (2012), Brown & Batey (1994), Brown et al. (1999), Brunsdon et al. (2010), Burns et al. (2018-09-31), Burrows & Gane (2006), Cockings et al. (2020), Dalton & Thatcher (2015), De Sabbata & Liu (2019), Farr & Evans (2005), Flowerdew & Goldstein (1989), Grekousis (2020), Harris et al. (2007), Longley (2005), Longley (2012), Major et al. (2018), Petersen et al. (2011), Singleton & Longley (2009a), Singleton et al. (2016), Singleton (2004), Singleton & Longley (2009b), Singleton et al. (2012), Singleton & Spielman (2014), Singleton & Longley (2019), Somashekhar (2020), Voas & Williamson (2001), Webber & Burrows (2018), Xiang et al. (2018)

27 Software

  • geosnap
  • spopt
  • scikit-learn

:::