flowchart LR
  choice{Should I use GWR?}
  answer(No)
  choice --> answer
28 Applied Spatial Econometrics
Spatial econometrics has its origins in the early 1970s, when Jean Paelinck used the term to refer to methodological aspects associated with incorporating dependence in cross-sectional multiregional econometric models. Initially, the development and application of spatial econometrics was mostly driven by the interests of regional scientists and applied economists in Europe, and several of the early classics appeared in RSUE. In part stimulated by advances in theory (social and spatial interaction) and technology (geographic information systems), the interest in spatial analysis in economics and other social sciences has seen tremendous growth in recent years (Goodchild et al., 2000). This culminated in the formal establishment in May 2006 of an international “Spatial Econometrics Association” at the Fifth Workshop on Spatial Econometrics and Statistics in Rome, Italy.
– Anselin (2007)
Spatial econometrics is the branch of spatial statistics concerned primarily with inference rather than prediction. The focus of the models is to obtain consistent and efficient estimates of model coefficients in the presence of two spatial effects: spatial dependence and spatial heterogeneity (Anselin, 1988a; Anselin, 1989). Spatial dependence arises when observations in nearby locations are related to one another, which violates the independence assumption of standard regression analysis. This could be because (1) they are both affected by the same (possibly unobserved) process, (2) they exert direct interactive pressure on one another, (3) they affect one another indirectly or endogenously, or (4) some combination thereof. Spatial heterogeneity arises when the relationship between two variables is not constant over space. It can appear as coefficients that vary across locations (structural instability) or as heteroskedasticity with a spatial pattern.
In the causal inference literature, these issues are treated as “interference” that can confound causal estimates. Unbiased estimation of the model parameters is one longstanding goal, but the spatial interaction mechanisms are also substantively interesting and important in spatial science. That is, the interaction between units is often viewed as a meaningful process like spatial spillover, rather than an interfering pathway between cause and effect. This means that in many cases we do not want to rid the model of spatial interaction. Instead, we want to specify its form and study its implications. We often have some idea of what this interaction structure could look like. For example, we expect that nearby units may interact with one another, which leads to a distance- or connectivity-based notion of spatial interaction.
As such, what sets spatial econometric models apart from other models is the inclusion of neighborhood characteristics linked to each observation through a spatial connectivity graph. This graph is canonically called the spatial weights matrix \(W\), and it describes how nearby observations might be expected to interact with one another. These potential interactions can take several forms, leading to a variety of model specifications. The canonical citations for econometric model specification are Anselin & Griffith (1988), Anselin (1988a), and Anselin (1988c). The modern reference texts are LeSage & Pace (2009) and Elhorst (2014). Concise summaries for specifying, estimating, and interpreting spatial econometric models are given in Elhorst (2010), Elhorst et al. (2012), LeSage & Pace (2014), and LeSage (2014a).
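The sketch below shows how a spatial weights matrix can be built in Python with NumPy. It assumes a k-nearest-neighbor definition of the neighborhood plus row-standardization. Both are modeling choices, not the only options: contiguity or distance-band weights are equally common. The helper `knn_weights` and the toy coordinates are hypothetical. Each row of the result sums to one, so `W @ x` is the spatial lag \(WX\), the average of \(x\) among each unit's neighbors.

```python
import numpy as np


def knn_weights(coords, k):
    """Hypothetical helper: row-standardized k-nearest-neighbor weights matrix W."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between all units
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a unit is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        # The k closest units; a stable sort breaks distance ties by index
        nbrs = np.argsort(d[i], kind="stable")[:k]
        W[i, nbrs] = 1.0
    # Row-standardize so each row sums to 1 and (W @ x)[i] is a neighborhood average
    return W / W.sum(axis=1, keepdims=True)


coords = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1)]
W = knn_weights(coords, k=2)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
Wx = W @ x  # spatial lag of x
print(Wx)
```

Note that k-nearest-neighbor weights are generally asymmetric. Unit \(j\) can be among unit \(i\)'s nearest neighbors without the reverse being true.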
28.1 Motivation
The raison d’être for spatial econometrics is simple. As we saw in Chapter 8 (and as Tobler’s refrain reminds us), empirical data often show high levels of spatial autocorrelation, and ignoring that spatial structure leads to problems in the analysis. So the first test in the toolbox was straightforward: fit a standard OLS regression, then run a Moran’s I test on the residuals. A significant result is evidence of residual spatial dependence, whether it comes from interaction among observations or from spatially patterned omitted variables, and it means the model has an obvious problem.
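The residual test described above can be sketched in a few lines of Python with NumPy. Everything here is hypothetical:

- a 15 × 15 grid with rook contiguity;
- a simulated DGP whose errors are spatially autocorrelated;
- a pseudo p-value from randomly permuting the residuals.

Permuting regression residuals is only an approximation. Dedicated regression-diagnostic routines typically use analytical moments of Moran's I for residuals instead.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


def morans_i(z, W):
    """Moran's I = (n / S0) * z'Wz / z'z for the mean-centered vector z."""
    z = z - z.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)


rng = np.random.default_rng(42)
W = rook_weights(15, 15)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Hypothetical DGP with spatially autocorrelated errors: u = (I - 0.7 W)^{-1} eps
u = np.linalg.solve(np.eye(n) - 0.7 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + u

# Step 1: aspatial OLS
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Step 2: Moran's I on the residuals, with a permutation reference distribution
I_obs = morans_i(resid, W)
perms = np.array([morans_i(rng.permutation(resid), W) for _ in range(999)])
p_value = (1 + np.sum(perms >= I_obs)) / (1 + len(perms))
print(f"Moran's I = {I_obs:.3f}, pseudo p-value = {p_value:.3f}")
```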
Spatial autocorrelation is not mentioned at all of in 12 currently used econometrics textbooks. …spatial econometrics has not yet appeared in any serious way in the econometrics literature. The recent textbooks in econometrics mentioned earlier contain virtually nothing on ways to handle spatial data. This implies that all economists being educated today must learn about dealing with spatial models and data outside of their curricula. At the same time, however, economists are more concerned than ever about environmental and urban problems. No wonder the literature on such subjects is only lightly sprinkled with analyses of spatial effects and the use of the spatial autocorrelation concept.
– Getis (2007)
The origins of spatial econometrics are arguably based on an experimentalist framework. Within it, geographers and regional scientists have long recognized the ways that spatial dependence may bias traditional regression results (Anselin & Griffith, 1988; Cliff & Ord, 1969; Cliff & Ord, 1970; Cliff & Ord, 1973). This work long predates more recent research on intervening variables and the “reflection problem” in the neighborhood effects and epidemiology literature (Halloran et al., 1991; Halloran & Struchiner, 1995; Manski, 1993; Sobel, 2006).
But theoretical spatial econometrics is also explicitly structural. It attempts to codify what Anselin (1988a) calls a “model-based” approach, as distinguished from the “data-based” approach. The primary motivation is statistical: to recover unbiased estimates. Even so, the models are introduced and discussed via a formal data-generating process (DGP) that motivates each model’s use and from which its functional form is derived (LeSage & Pace, 2009). Spatial econometric models can be viewed as a method for dealing with spatially correlated unobserved variables. However, a key motivator behind many early developments was estimation of the structural spillover terms, which are often the key variables of interest in spatial science.
The word “structural” is used in the economic sense of a model with structural parameters that correspond to a microeconomic process that determines macroscale outcomes, e.g., the parameters of a land developer’s cost function or of a household’s demand function that influence the resulting market prices. Structural models are akin to what ecologists call “process-based” models and are distinguished from “pattern-based” models.
– Irwin (2010), p.69
In practice, however, most applied work adopts a data-driven approach. It selects a preferred model based on fit criteria and uses that selection to defend against the criticisms that would otherwise plague an aspatial model: bias from omitted variables (omitted-variable bias, OVB) or inefficiency from autocorrelation. As McMillen (2012) and Gibbons & Overman (2012) describe, this approach trades one identification error for another and often ascribes significance to spatial effects that may be spurious. Luckily, there are cases where both fully structural and experimentalist approaches can lead to better, more appropriately identified hedonic (and other) models. Constructing them, however, requires considerable knowledge about housing markets, urban dynamics, and spatial relationships (Elhorst, 2010).
Thus, for applied spatial econometrics it is the best of times. And it is the worst of times. The vast majority of mainstream social science literature focused on causal inference still ignores any potential for spatial misspecification, despite heavy reliance on georeferenced data. For example, leading econometricians have produced a recent stream of research on the closure of abortion clinics that focuses explicitly on spatial distances. Yet it never considers potential spatial effects (Lindo et al., 2017; Myers, 2021; Quast et al., 2017). Similarly, work on spillover effects in housing research fails to operationalize spatial spillover (Voith et al., 2022). Despite professional organizations and scholarly journals focused on spatial econometrics, much of the mainstream literature in policy analysis and applied econometrics still fails to heed Isard (1949)’s warning, even today:
Theoreticians of today are chiefly preoccupied with introducing the time element in full into their analyses, and the literature abounds with models of a dynamic nature. Yet who can deny the spatial aspect of economic development: that all economic processes exist in space, as well as over time? Realistically, both time and space must be vital considerations in any theory of economy. Unfortunately, however, aside from those of the monopolistic competition school of thought, particularly Chamberlin, the architects of our finest theoretical structures have intensified the prejudice exhibited by Marshall. They continue to abstract from the element of space, and in doing so they are approaching a position of great imbalance.
At the same time, however, applied spatial econometricians have been aptly criticized for treating every problem with a spatial component as a nail to be pounded with their proverbial spatial autoregressive hammer. This mechanical use of spatial econometric models, without serious consideration of identification or structural estimation issues, has produced a large volume of papers. Many ascribe causal spillover effects in research designs where those same effects are questionably identified (Gibbons & Overman, 2012). What’s more, misspecification and misinterpretation of spatial models and spillover effects are also rampant (LeSage, 2014a; LeSage & Pace, 2014).
Paradoxically, we need more research that considers formal spatial relationships and we need less blind application of spatial econometric methods. In this chapter, we use the hedonic housing price model (Blomquist & Worley, 1981; Rosen, 1974; Witte et al., 1979) as a vehicle to explore these issues in depth.
28.2 Regression Models with Spatial Effects
In a seminal contribution, LeSage & Pace (2009) differentiate between two scales of spatial dependence (local vs. global) and two substantive varieties (spillover vs. diffusion). Spillover occurs when “changes to explanatory variables in region \(i\) impact the dependent variable values in region \(j\)” (LeSage & Pace, 2014, p. 1537). Diffusion occurs when a shock to region \(i\) affects the disturbances of regions that neighbor observation \(i\). Local effects occur when spillover or diffusion falls only on the immediate neighbors of unit \(i\). Global effects occur when spillover or diffusion also falls onto neighbors-of-neighbors, including back onto unit \(i\) itself in a feedback loop. Local and global spillovers and diffusions are incorporated into regression models using a variety of specifications discussed below. In all cases, \(W\) refers to the spatial connectivity graph (spatial weights matrix) that specifies the “neighborhood” of each unit.
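Here is a small numerical sketch of the local/global distinction. It assumes a hypothetical line of five units, row-standardized contiguity weights, and \(\rho = 0.5\). \(W\) links only immediate neighbors. The global multiplier \((I - \rho W)^{-1} = I + \rho W + \rho^2 W^2 + \dots\), however, also reaches neighbors-of-neighbors and feeds back onto each unit itself. That feedback shows up as diagonal entries greater than one.

```python
import numpy as np

# Hypothetical example: 5 units on a line, each linked to its immediate neighbors
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)  # row-standardized

rho = 0.5
# Global multiplier (I - rho W)^{-1}
multiplier = np.linalg.inv(np.eye(n) - rho * W)
# The same matrix as the power series I + rho W + rho^2 W^2 + ... (truncated here)
series = sum(np.linalg.matrix_power(rho * W, p) for p in range(60))

print("W[0, 2] =", W[0, 2])  # unit 2 is not an immediate neighbor of unit 0
print("multiplier[0, 2] =", round(multiplier[0, 2], 3))  # but it is reached globally
print("diag(multiplier) =", np.round(np.diag(multiplier), 3))  # > 1: feedback to self
```

With a row-standardized \(W\), every row of the multiplier sums to \(1/(1-\rho)\). This is why global spillovers can be large even when \(\rho\) looks modest.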
28.2.1 SLX Model
The SLX (“Spatial Lag of X”) model is the simplest spatial model. It allows for local spillovers and assumes the price of a unit is a function of three things: the unit’s exogenous characteristics, the exogenous characteristics of other units in its neighborhood, and a random error term. \[ y = \beta X + \theta WX + \epsilon \] Because \(WX\) is just another set of exogenous regressors, the SLX model requires nothing beyond OLS, and its coefficients are interpretable as usual. This makes it the general starting place for incorporating spatial effects (Halleck Vega & Elhorst, 2015). In this model, \(\beta\) measures how changes in the property itself affect its selling price (the “direct effect”). Meanwhile, “the coefficient \(\theta\) measures how changes in neighboring properties’ characteristics impact the value of a typical property (on average over the sample)” (LeSage & Pace, 2014, p. 1541); this is the “indirect effect.”
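Because the SLX model only adds \(WX\) as extra regressors, estimation is plain OLS. The sketch below (Python with NumPy) assumes a hypothetical 10 × 10 rook-contiguity grid, a single regressor, and made-up values \(\beta = 2\) and \(\theta = -1\). It simulates data and recovers both coefficients, which read directly as the direct effect and the local spillover.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
W = rook_weights(10, 10)
n = W.shape[0]
x = rng.normal(size=n)
beta, theta = 2.0, -1.0  # hypothetical direct effect and local spillover
y = 1.0 + beta * x + theta * (W @ x) + rng.normal(scale=0.1, size=n)

# SLX: Wx is just one more exogenous column, so ordinary least squares applies
Z = np.column_stack([np.ones(n), x, W @ x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
intercept_hat, beta_hat, theta_hat = coef
print(f"beta_hat={beta_hat:.3f}, theta_hat={theta_hat:.3f}")
```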
28.2.2 Spatial Lag Model
The Spatial Lag Model (sometimes called the spatial/simultaneous autoregressive or SAR model) allows for global spillovers in the dependent variable. Here we assume the price of a unit is a function of neighboring prices, exogenous characteristics, and random error. We use the spatial lag model and the spatial Durbin model (explained below) when endogenous spillovers are of interest, or when there is reason to believe there are feedback mechanisms in the dependent variable. In the case of home prices, there is no solid consensus among urban economists about whether prices demonstrate global spillover. Still, there is a sound rationale for intuiting such a process via price speculation and game-theoretic behavior (LeSage, 2014a; LeSage & Pace, 2014). The simple fact that a nearby home sells for a high price signals that my location may have increased in value. That is, the price itself, apart from any other characteristics of nearby properties, has a spillover effect in its own right. The spillover term in this case is purely locational, and we can consider it part of the land value.
\[ y = \rho Wy +\beta X + \epsilon \]
Because \(Wy\) appears on the right-hand side, it is endogenous (correlated with the error term). OLS is therefore inconsistent, and the model is instead estimated by maximum likelihood, instrumental variables/GMM, or Bayesian methods. We can still estimate the \(\beta\) coefficients, but they become harder to interpret. The reduced form, \(y = (I - \rho W)^{-1}(\beta X + \epsilon)\), is nonlinear in \(\rho\). A change in \(X\) for one unit propagates to every other unit through the spatial multiplier \((I - \rho W)^{-1}\), so the coefficients are no longer interpretable as ordinary regression coefficients. Proper interpretation of the results requires computation of the marginal effects and dispersion estimates thereof (Elhorst, 2010; LeSage & Pace, 2009). For the same reason, it can also be difficult to generate predicted values. Spatial econometric packages in Python, R, MATLAB, and Stata can compute marginal effects, but discussion of them is still not widespread in the literature (LeSage & Pace, 2014).
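The scalar summary measures of LeSage & Pace (2009) can be sketched directly from the reduced form. For a single regressor, the matrix of partial derivatives is \(S = \beta (I - \rho W)^{-1}\). From it:

- the average direct effect is the mean of the diagonal;
- the average total effect is the mean row sum;
- the indirect effect is the difference between the two.

The grid and the values \(\rho = 0.4\), \(\beta = 1.5\) below are hypothetical. This dense-inverse version is only practical for small \(n\). For large \(n\), software typically relies on approximations rather than an explicit inverse, and dispersion is usually obtained by simulation. Neither step is shown here.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


def sar_effects(W, rho, beta):
    """Average direct/indirect/total effects of one regressor in y = rho W y + beta X + eps."""
    n = W.shape[0]
    # n x n matrix of partial derivatives dE[y_i] / dx_j
    S = beta * np.linalg.inv(np.eye(n) - rho * W)
    direct = np.trace(S) / n  # average own-unit effect
    total = S.sum() / n  # average row sum: effect of changing x everywhere
    return direct, total - direct, total  # indirect = total - direct


W = rook_weights(10, 10)
direct, indirect, total = sar_effects(W, rho=0.4, beta=1.5)  # hypothetical estimates
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

With row-standardized \(W\), the total effect equals \(\beta/(1-\rho)\), which is 2.5 here. The direct effect exceeds \(\beta\) because of feedback through neighbors.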
28.2.3 Spatial Error Model
The spatial error model (SEM) does not assume that prices are endogenous. Instead, it assumes that an unobserved spatial process causes spatial correlation among units. This is a model of global diffusion, and it effectively treats spatial structure as a nuisance to be filtered away. The SEM is sometimes applied as a way of controlling for spatially correlated omitted variables, but it is not effective at removing omitted-variable bias. It only imposes a specific covariance structure on the errors. If an omitted variable is correlated with \(X\), the SEM estimates of \(\beta\) are biased just as OLS estimates would be, so the SEM is discouraged from that perspective.
\[ \begin{gathered} y = \beta X + u,\\ u = \lambda Wu + \epsilon \end{gathered} \]
The spatial error model is appropriate when you do not expect a “real” endogenous spillover process and you intuit that any residual spatial autocorrelation is the result of some higher-order geographic process. In those cases, it is an attractive model because (a) the coefficients can be interpreted as usual, and (b) it is straightforward to extract predicted values. The downside is what happens if there is an endogenous spillover process. In that case, the cumulative effects may be improperly estimated, because the model does not account for the feedback effects that can only accrue through the \(\rho Wy\) term.
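To make the SEM concrete, the sketch below simulates the DGP above and estimates it by concentrated maximum likelihood. For each candidate \(\lambda\) on a grid, it:

1. filters the data with \(I - \lambda W\);
2. fits \(\beta\) by least squares on the filtered data;
3. scores the fit, including the Jacobian term \(\ln|I - \lambda W|\).

The grid search, the 20 × 20 rook grid, and the parameter values are illustrative assumptions. Real software uses numerical optimizers and faster log-determinant methods.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


def sem_profile_loglik(lam, y, X, W):
    """Concentrated SEM log-likelihood (up to a constant) at a given lambda."""
    n = len(y)
    B = np.eye(n) - lam * W  # spatial filter (I - lambda W)
    yf, Xf = B @ y, B @ X  # filtered model: B y = B X beta + eps
    b, *_ = np.linalg.lstsq(Xf, yf, rcond=None)  # GLS estimate of beta given lambda
    e = yf - Xf @ b
    _, logdet = np.linalg.slogdet(B)  # Jacobian term ln|I - lambda W|
    return logdet - 0.5 * n * np.log(e @ e / n), b


rng = np.random.default_rng(1)
W = rook_weights(20, 20)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Hypothetical SEM DGP: y = X beta + u, u = lambda W u + eps, with lambda = 0.5
u = np.linalg.solve(np.eye(n) - 0.5 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + u

grid = np.linspace(-0.9, 0.9, 181)
fits = [sem_profile_loglik(lam, y, X, W) for lam in grid]
best = int(np.argmax([ll for ll, _ in fits]))
lam_hat, beta_hat = grid[best], fits[best][1]
print(f"lambda_hat={lam_hat:.2f}, beta_hat={np.round(beta_hat, 3)}")

# Mean predictions use X beta directly; no spatial multiplier acts on X
y_pred = X @ beta_hat
```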
28.3 Including Multiple Spatial Effects
The spatial “Durbin” models combine the SLX specification with either the spatial lag or the spatial error specification. This lets us separate two kinds of effect. The first is the local spillovers that flow directly from the exogenous characteristics of nearby observations (\(\theta WX\)). The second is the global effects induced by the autoregressive process, in either the dependent variable or the error. The two Durbin specifications are the recommended models for applied work because they provide the greatest generality while keeping all parameters of interest identifiable.
28.3.1 Spatial Durbin Model
The spatial Durbin model (SDM) extends the spatial lag model by also allowing for local spillovers in the exogenous variables. Here, price is a function of four things: (endogenous) neighboring prices, exogenous characteristics of the unit itself, exogenous characteristics of nearby units, and random error. The SDM includes both local and global spillovers, and in many cases it is the “ideal” specification from an applied hedonic perspective.
\[ y = \rho Wy +\beta X + \theta WX + \epsilon \]
As with the Spatial Lag Model, the SDM is nonlinear. Properly interpreting its coefficients (and their significance) requires estimating marginal effects and their dispersion. A key benefit of this model over the spatial lag model is that the spillover effects can have a different sign than the direct effects (Elhorst, 2010; LeSage & Pace, 2014). For example, having a tall building on a parcel may increase its value, but being surrounded by tall buildings could decrease its value. This is the major improvement from including the \(\theta WX\) terms. However, the coefficients still cannot be interpreted directly as spillovers, as they can in the SLX and SDEM models. Instead, the model requires computation of marginal effects to understand “direct” and “indirect” (or spillover) effects.
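The sign-reversal point can be sketched numerically with the SDM impact matrix \(S = (I - \rho W)^{-1}(\beta I + \theta W)\). The "tall building" parameters below (\(\beta = 1\), \(\theta = -0.8\), \(\rho = 0.3\)) and the grid are hypothetical. They are chosen only to show a positive direct effect alongside a negative indirect effect.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


def sdm_effects(W, rho, beta, theta):
    """Average effects of one regressor in y = rho W y + beta X + theta W X + eps."""
    n = W.shape[0]
    # n x n matrix of partial derivatives dE[y_i] / dx_j
    S = np.linalg.inv(np.eye(n) - rho * W) @ (beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total


# Hypothetical: own height raises value (beta > 0), tall neighbors lower it (theta < 0)
W = rook_weights(10, 10)
direct, indirect, total = sdm_effects(W, rho=0.3, beta=1.0, theta=-0.8)
print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

Here the total effect is \((\beta + \theta)/(1 - \rho)\) because \(W\) is row-standardized.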
28.3.2 Spatial Durbin Error Model
The spatial Durbin error model (SDEM) allows for local spillovers and a global error diffusion process. Thus, it includes the exogenous characteristics of nearby observations and a spatially correlated error term, but no endogenous spillover in the dependent variable. The SDEM is the recommended approach when two conditions hold: (a) there is no substantive interest in feedback effects or endogenous spillover, and (b) there is no reason to suspect that the dependent variable (prices, in this case) is autoregressive. If it is, the SDEM is misspecified and the SDM should be fit instead (LeSage, 2014a).
\[ \begin{gathered} y = \beta X + \theta WX + u, \\ u = \lambda Wu + \epsilon \end{gathered} \]
Thus, the SDEM is an attractive model from an interpretive perspective because it does not require computation of marginal effects. There is no \(y\) variable on the right-hand side, so the effect of \(X\) on \(y\) is still linear and the coefficients have the usual interpretation. The SDEM may be unfamiliar to people outside spatial econometrics, but its parameters are straightforward compared to those of SAR or SDM models. As LeSage & Pace (2014, p. 1541) describe, “For the SDEM and SLX models, the coefficients in the vector \(\theta\) represent local spillovers, since there is an impact only on immediately neighboring observations. We note that estimates from these two models should be similar, but in the face of spatial dependence in the disturbances, SDEM model estimates should be more efficient.” Standard spatial econometric software readily provides dispersion estimates for the SDEM.
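A quick numerical check shows why the SDEM needs no marginal-effects machinery, using a hypothetical grid and parameter values. Its impact matrix is just \(\beta I + \theta W\), because \(\lambda\) enters only the error process and does not affect the expected value of \(y\). Since \(W\) has a zero diagonal and rows summing to one, the average direct effect is exactly \(\beta\) and the average indirect effect is exactly \(\theta\). The same matrix applies to the SLX model.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


W = rook_weights(10, 10)
n = W.shape[0]
beta, theta = 1.0, -0.8  # hypothetical SDEM estimates; lambda plays no role in the effects

S = beta * np.eye(n) + theta * W  # dE[y]/dx' in the SDEM (and SLX)
direct = np.trace(S) / n
indirect = (S.sum() - np.trace(S)) / n
print(f"direct={direct:.3f}, indirect={indirect:.3f}")
```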
28.4 Others
Apart from these five model specifications, there are other common spatial econometric model specifications, as well as other ways to incorporate spatial relationships into a regression framework. In general, these approaches are less appropriate for land value modeling due to reasons discussed briefly below.
28.4.1 General Nesting Specification and Manski Model
Technically, you can combine some of the models described above. One option is to allow for both endogenous spillover and spatially correlated errors: a model with both \(\rho\) and \(\lambda\), often called the SAC or SARAR model. Another is to include all spatial terms under the sun (\(\rho Wy\), \(\theta WX\), and \(\lambda Wu\) together). That is the “general nesting spatial” (GNS) model, which Elhorst (2010) calls the “Manski model.” Such combined specifications were once recommended practice, but they are now discouraged. The models become inefficient, and the separate effects become difficult or impossible to disentangle (Elhorst, 2010; LeSage, 2014a; LeSage & Pace, 2018; LeSage & Pace, 2009). In the fully nested case, the parameters are at best weakly identified. Instead, the recommended advice for applied work is to adopt either the spatial Durbin model (SDM) or the spatial Durbin error model (SDEM), depending on whether endogenous spillovers are expected (LeSage & Pace, 2014).
28.4.2 Geographically-Weighted Regression (GWR)
What follows will draw lines in the sand among camps with competing views on spatial analysis, including my buddies (Fotheringham & Oshan, 2016; Oshan et al., 2020). But unless you are Dan McMillen and absolutely know what you are doing (McMillen, 2012; McMillen & McDonald, 1997; McMillen & Redfearn, 2010), my view is that you probably should not be using Geographically-Weighted Regression (GWR). The blind application of spatial econometric methods has clear negative consequences (Gibbons & Overman, 2012; McMillen, 2003, 2010). From an empirical perspective, though, the misapplication of GWR is far more widespread. GWR is a very useful spatial analytical tool, but its inclusion in software packages like ArcGIS has given people the impression that it, too, is a panacea for all issues spatial. And while researchers may be misinterpreting the results from spatial econometric models, they are almost certainly misinterpreting the results from GWR.
GWR is an approach to spatial analysis that differs significantly from spatial econometric modeling. Spatial econometrics uses a specific hypothesized structure of \(W\) to test a formal DGP. The GWR approach, by contrast, emphasizes flexibility over statistical inference or generalizability.¹ It is a special case of nonparametric locally weighted regression and thus has no underlying conceptual statistical model and no appropriate way to interpret its coefficients. It is just an overfit model of a specific dataset, not a DGP. The method works by fitting many local regressions instead of a single global regression. For each observation in the dataset, it selects some nearby observations: say, the 10 nearest, or all observations down-weighted by a distance-decay kernel. It then fits a unique weighted regression for that point and proceeds to the next point until all points are exhausted.
This allows the coefficients to vary over space (i.e., by observation). It is therefore a nice way of exploring spatial heterogeneity, and it can also improve predictive power over OLS because the coefficients adapt to local conditions. But the data points are reused repeatedly, and there is no underlying conceptual model. As a result, the coefficients from GWR are not interpretable like those from OLS and do not represent the marginal change of X on Y (Wheeler & Calder, 2007; Wolf et al., 2017). GWR models are also unable to capture processes of spatial spillover when these effects are the substantive interest of study.
This has earned GWR many critiques in the spatial analysis literature, because it is commonly misinterpreted (Griffith, 2008; Páez et al., 2011; Wheeler & Calder, 2007). Indeed, Comber et al. (2022), a group of GWR’s most prominent scholars, write that “as a rule, spatial effects via a [spatial econometric model] should be preferred [to GWR] due to its stronger inferential properties (e.g., see LeSage and Pace 2009). This is because inference in any GWR model is somewhat compromised by there being no-one single model, but a collection of models re-using sample data at multiple locations. This entails that a valid probability model is unavailable with GWR, making inference biased and problematic.” In other words, “GWR is more appropriately viewed as an exploratory approach and not a formal model to infer parameter nonstationarity. This view conflicts with the broad application of GWR as an inferential method” (Wheeler, 2014, p. 1443).
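The mechanics of GWR described above can be shown with a minimal sketch in Python with NumPy; this illustrates how the method works and is not an endorsement. At each observation, the sketch fits a weighted least-squares regression, with weights from a Gaussian distance-decay kernel. The kernel, the fixed bandwidth of 1.5, and the simulated west-to-east drift in the slope are all hypothetical. Real GWR software also selects the bandwidth (e.g., by cross-validation or AICc), which is omitted here. Note that every observation enters every local fit; this is the data reuse that undermines inference.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local weighted least squares at each observation (fixed Gaussian kernel)."""
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian distance-decay weights
        sw = np.sqrt(w)
        # WLS as OLS on sqrt(w)-scaled rows; the same data are reused at every i
        betas[i], *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return betas


rng = np.random.default_rng(3)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + 0.3 * coords[:, 0]  # hypothetical slope drifting west to east
y = 0.5 + true_slope * x + rng.normal(scale=0.2, size=n)
X = np.column_stack([np.ones(n), x])

local = gwr_coefficients(coords, X, y, bandwidth=1.5)
print(np.corrcoef(local[:, 1], true_slope)[0, 1])
```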
In the context of land-value modeling, GWR can be particularly problematic because it also suffers from issues of multicollinearity. Indeed, an early motivation for exploring the inferential properties of GWR was hedonic modeling for real-estate valuation (Bárcena et al., 2014). Here, prior work has shown that GWR results should be interpreted with extreme caution, because multicollinearity is a natural expectation given that “houses close in space to house are usually similar in their typology, square footage, age, etc. Local design matrices are usually very poorly conditioned, which makes the effects of the regressors difficult or impossible to disentangle” (Bárcena et al., 2014, p. 443). Together, these issues make GWR a very unattractive method for trying to understand land value (Wheeler, 2007; Wheeler & Calder, 2007; Wheeler & Tiefelsdorf, 2005).
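The local conditioning problem described by Bárcena et al. (2014) can be illustrated with a toy design matrix. It assumes two hypothetical attributes that both trend smoothly with location along a transect (here, \(s\) and \(s^2\)). Globally, the design is only mildly ill-conditioned. Within a small neighborhood, however, the two attributes are nearly collinear, so the local design matrix that a GWR fit would use is far worse conditioned.

```python
import numpy as np

# Hypothetical transect of 101 locations; two attributes both trend smoothly with location
s = np.linspace(0.0, 1.0, 101)
X = np.column_stack([np.ones_like(s), s, s**2])  # intercept, attribute A, attribute B

cond_global = np.linalg.cond(X)
local = slice(45, 56)  # the 11 observations with s in [0.45, 0.55]
cond_local = np.linalg.cond(X[local])

corr_global = np.corrcoef(s, s**2)[0, 1]
corr_local = np.corrcoef(s[local], s[local] ** 2)[0, 1]
print(f"condition number: global {cond_global:.0f}, local {cond_local:.0f}")
print(f"correlation of A and B: global {corr_global:.3f}, local {corr_local:.5f}")
```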
28.5 Spatial Models as Graphs
In the directed acyclic graph (DAG) and causal inference literature, errors are typically left out (Pearl, 2014). In Bayesian statistics and especially in structural equation modeling, errors are considered explicitly and formulated as part of the problem. In my view, causal inference people, Bayesians, and structural equation modeling (SEM) folks all use path diagrams and graphical models similarly, but each with their own distinct flavor (Clark, 2018). Spatial models are not well suited to DAGs for two reasons: we are often interested in cyclical models with feedback, and it is useful to represent errors explicitly. In general, spatial econometric models are a kind of equilibrium model with simultaneity. For such models, Huntington-Klein (2022) argues, DAGs and do-calculus are probably less useful than traditional econometric techniques.
Standard causal diagrams must be acyclic, whereas equilibrium models contain feedback loops and so are cyclic. The difficulty that standard causal diagrams have with equilibrium models was a particular concern in the Imbens (2020) review of SCM.
– Huntington-Klein (2022), p.332
Without a doubt, Pearl (2023) would disagree :P. However, the spatial models we need to consider and contrast are either semi-Markovian (the spatial error family) or non-Markovian (the spatial lag family) (Pearl, 1998), because they contain correlated errors, cycles, and feedback loops.
One fundamental property of Markovian models is parent screening: given the state of its parents \(pa_i\), each variable \(X_i\) is conditionally independent of all its nondescendants in the graph. This follows immediately from the independence of the errors \(\epsilon_i\) and supports the intuition that once the direct causes of \(X_i\) are known, the probability of \(X_i\) is completely determined; no other event preceding \(X_i\) could modify this probability. As a result, the statistical parameters of Markovian models can be estimated by ordinary regression analysis.
– Pearl (1998), p.237
In many spatial models this is not the case. We then need extra techniques like instrumental variables regression to estimate the model, and the pure graphical representation gets complicated at best (Huntington-Klein, 2022). Thus, the graphical spatial models shown here are closer to the structural equation model flavor than to the SCM flavor, though I don’t bother including errors for every measure (Bollen, 2014; Coman et al., 2015; Spirtes et al., 1998).
The SEM (and Bayesian) perspective also helps reconcile this style with the causal style, because those groups tend to think of a dependent variable \(y_i\) as a measurement rather than an actual outcome. The resulting measure \(y_i\) is thus caused by a DGP that includes error by definition, since nothing is measured perfectly. In that case, the model needs to show how error causes the outcome.² Representing error explicitly and at the start helps us consider whether it is entirely random in its contribution to \(y_i\), or whether some parts are structured (and therefore also need to be modeled explicitly). I think this also codifies error using the ‘conceptual interpretation’ from Pearl (1998). That interpretation is useful precisely in our current case, even if errors are designed to work in the SCM framework via the ‘operational definition’.³ Pearl notes that the conceptual interpretation matters, for example,
“when it comes to deciding whether pairs of error terms can be assumed to be uncorrelated. Because such decisions are needed at a stage when the model’s parameters are still “free,” they cannot be made on the basis of numerical assessments of correlations but must rest instead on qualitative structural knowledge about how mechanisms are tied together and how variables affect each other.”
– Pearl (1998), p.274
Spatial models also depend critically on interactions, which can be difficult to express in traditional DAG language. I therefore include spatially interactive variables (like spillover and propagation) as distinct nodes, following Clark (2018) and Attia et al. (2022), though technically this is breaking the rules (Huntington-Klein, 2022). As an additional difficulty, the spatial relationship graph \(W\) is itself a specific kind of interaction variable. It is technically unobserved (albeit strongly inferred) and is subsequently interacted with other variables to produce \(WX\), \(Wy\), and \(WU\). Because \(W\) is essentially posited, it is worth considering as a path: it affects how elements propagate through the system, and we might (or should) posit alternatives.
Adopting the definition of structural parameters from Pearl (1998), these are very different models:
OLS \[y_i = f(X_i, \epsilon)\]
Spatial Lag of \(X\) \[y_i = f(X_i, WX_j, \epsilon)\]
Spatial Error \[y_i = f(X_i, Wu_j, \epsilon)\]
Spatial Lag \[y_i = f(Wy_j, X_i, \epsilon)\]
SLX-Error \[y_i = f(X_i, WX_j, Wu_j, \epsilon)\]
Spatial Durbin \[y_i = f(Wy_j, X_i, WX_j, \epsilon)\]
Notably, LeSage & Pace (2009), LeSage & Pace (2014), and LeSage (2014a) reiterate this point repeatedly.
Assuming spatial units \(i\) and \(j\), I think about the processes for these respective units as:
- independent if no node from process \(j\) affects \(y_i\)
- interdependent if an ancestor of \(y_j\) also affects \(y_i\)
- co-dependent if \(y_i\) and \(y_j\) share a node simultaneously
This assumes spatial homogeneity, meaning the \(i\) and \(j\) relationships are symmetric (i.e., the errors for \(i\) and \(j\) are identically distributed). In that case, the micro-level processes for each unit \(i\) and \(j\) might look as follows:
28.5.1 Aspatial (OLS)
flowchart TD
  subgraph j
    Xj --> yj
    ej((ej)) -.-> yj
  end
  subgraph i
    ei((ei)) -.-> yi
    Xi --> yi
  end
The outcome \(y_i\) is determined by exogenous variables \(X_i\) and random error \(e_i\). \(y_i\) is independent of anything happening to \(j\) (and vice versa). The edges are relationships that we estimate, so
\[X_i \rightarrow y_i = \beta,\]
but we actually estimate the error associated with \(y_i\), not the relationship between \(e_i\) and \(y_i\), so
\[e_i \rightarrow y_i = 1 \]
In an aspatial model, the process is the same for every unit. There is an identical model for \(i\) and (every other) \(j\), in which edges never cross between \(i\) and \(j\) (so \(X_i \rightarrow y_i = X_j \rightarrow y_j = \beta\)). They function as independent co-processes. Our concern is only what influences \(y_i\), so there is no need to care about any \(j\). Generally, this would be represented as a single graph rather than two graphs with subscripts (why duplicate if the models are identical for each unit?). But it is useful to represent the independent co-processes here, because with spatial models we need to consider \(j\).
Given Tobler’s law (and many other theoretical considerations), there is often good reason to believe that \(i\) and \(j\) are not independent. So the baseline test is to see whether the link \(e_i \leftarrow W_{ij} \rightarrow e_j\) exists, e.g., using a Lagrange Multiplier test (Anselin, 1988b; Anselin et al., 1996). Detecting that link provides evidence of spatial interaction and invalidates an aspatial model. The tests can be performed separately for different links (lag vs. error) or jointly for multiple links (lag and error), and they have both classic and robust forms.
flowchart TD
  subgraph j
    Xj --> yj
    ej((ej)) -.-> yj
  end
  subgraph i
    ei((ei)) -.-> yi
    Xi --> yi
  end
  ej<-->Wij{Wij}
  ei<-->Wij{Wij}
If this link exists, then a different model is necessary to estimate relationships accurately. The trouble is that many reasonable conceptual models could induce that link, and applied researchers face the task of choosing ‘the best’ among several plausible alternatives. There are different strategies for this. Some suggest starting with simple models, testing for the existence of the link, and incrementally building up to more complex models (specific to general). Others prefer to start with fully specified models and then systematically remove complexity when relationships are deemed unnecessary (general to specific) (Anselin et al., 2024; LeSage, 2014b; LeSage & Pace, 2014). In a recent simulation exercise, Anselin et al. (2024) suggest that there is no single ‘best’ strategy, but that in general, forward-based searches (specific to general) are preferred.
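The classic and robust LM tests mentioned above can be sketched from the standard formulas (Anselin, 1988b; Anselin et al., 1996). This is a hedged illustration; check it against a tested implementation, such as PySAL's spreg or R's spdep, before relying on the numbers. The formulas use:

- OLS residuals \(e\), with \(s^2 = e'e/n\);
- \(T = \operatorname{tr}(W'W + W^2)\);
- \(nJ = (WX\hat\beta)'M(WX\hat\beta)/s^2 + T\), where \(M\) is the OLS residual-maker matrix.

The error test is then \((e'We/s^2)^2/T\), and the lag test is \((e'Wy/s^2)^2/(nJ)\). The simulated spatial lag DGP and its parameters are hypothetical.

```python
import math

import numpy as np


def rook_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for an nrows x ncols grid."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                A[i, i + 1] = A[i + 1, i] = 1.0  # east-west neighbors
            if r + 1 < nrows:
                A[i, i + ncols] = A[i + ncols, i] = 1.0  # north-south neighbors
    return A / A.sum(axis=1, keepdims=True)


def lm_tests(y, X, W):
    """Classic and robust LM statistics (chi-square, 1 df) computed from an OLS fit."""
    n = len(y)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / n
    M = np.eye(n) - X @ XtX_inv @ X.T  # residual-maker matrix
    T = np.trace(W.T @ W + W @ W)
    WXb = W @ X @ b
    nJ = (WXb @ M @ WXb) / s2 + T
    d_err = (e @ W @ e) / s2  # score for the spatial error term
    d_lag = (e @ W @ y) / s2  # score for the spatial lag term
    stats = {
        "LM_error": d_err**2 / T,
        "LM_lag": d_lag**2 / nJ,
        "RLM_error": (d_err - (T / nJ) * d_lag) ** 2 / (T * (1 - T / nJ)),
        "RLM_lag": (d_lag - d_err) ** 2 / (nJ - T),
    }
    # Upper-tail chi-square(1) probability: P(chi2 > v) = erfc(sqrt(v / 2))
    return {name: (float(v), math.erfc(math.sqrt(v / 2))) for name, v in stats.items()}


rng = np.random.default_rng(7)
W = rook_weights(20, 20)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Hypothetical spatial lag DGP: y = (I - 0.6 W)^{-1} (X beta + eps)
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

results = lm_tests(y, X, W)
for name, (stat, p) in results.items():
    print(f"{name:10s} stat={stat:8.2f} p={p:.4f}")
```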
28.5.2 SLX Model
```mermaid
flowchart TD
  W{"`$W_{ij}$`"} --> WXi
  W{"`$W_{ij}$`"} --> WXj
  Xj --> WXj
  Xi --> WXi
  subgraph j
    WXi --> yj
    ej(("`$e_j$`")) -.-> yj
    Xj --> yj
  end
  subgraph i
    Xi --> yi
    ei(("`$e_i$`")) -.-> yi
    WXj --> yi
  end
```
The SLX model includes exogenous effects of \(X_j\) through the spatial graph \(W_{ij}\). The process for \(i\) is no longer fully independent of the process for \(j\) as in the aspatial model above. Instead, \(y_i\) depends on the exogenous characteristics of \(j\), \(WX_j\). I might venture to call these interdependent processes. Even though \(i\) and \(j\) interact, they do so only through the exogenous variables. They have no nodes in common (apart from \(W_{ij}\), which connects them by definition, but only in concept), so you can still draw a box around each process: you could pull \(WX_j\) into the \(i\) box and treat it as part of the \(i\) process. Here \(WX_j \rightarrow y_i = \theta\) is local spatial spillover. Again, since the focus is really on the inputs to \(y_i\), displaying \(e_j\), \(WX_i\), and \(y_j\) is not necessary. The full diagram is still useful for thinking about the system and comparing the model to others.
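Here is a minimal simulation of the SLX process. It assumes NumPy, a hypothetical `rook_weights` helper, and arbitrary "true" values for \(\beta\) and \(\theta\). Because \(WX\) is built only from exogenous variables, ordinary least squares on \([1, X, WX]\) recovers both the direct coefficient \(\beta\) and the local spillover \(\theta\). No special estimator is needed.

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: row-standardized rook contiguity on a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
W = rook_weights(20, 20)
n = W.shape[0]
beta, theta = 1.5, 0.8            # assumed true values for X_i -> y_i and WX_j -> y_i

x = rng.normal(size=n)
Wx = W @ x                        # each unit's average neighboring X (the WX_j node)
y = 0.5 + beta * x + theta * Wx + rng.normal(scale=0.5, size=n)

# WX is exogenous, so plain OLS on [1, x, Wx] is consistent for the SLX model
Z = np.column_stack([np.ones(n), x, Wx])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(f"beta_hat = {coef[1]:.3f}, theta_hat = {coef[2]:.3f}")
```

With row-standardized weights, \(\theta\) is also the total effect on \(y_i\) of a one-unit increase in \(X\) at all of \(i\)'s neighbors, since the weights in each row sum to one.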
28.5.3 Spatial Error Model
```mermaid
flowchart TD
  subgraph j
    Xj --> yj
    ej
  end
  WU((WU))
  W{"`$W_{ij}$`"} --> WU
  subgraph i
    Xi --> yi
    ei
  end
  ei((ei)) <-.-> WU
  ej((ej)) <-.-> WU
  WU -.-> yj
  WU -.-> yi
```
In the spatial error model (SEM; not to be confused with structural equation modeling in this particular context), the error at location \(i\) is affected by the error at location \(j\). The infinite loop in the path \(e_i \rightarrow WU \rightarrow e_j \rightarrow WU \dots\) is global error propagation. In this case, the process for \(i\) shares an endogenous element with the process for \(j\). I would call this co-dependent, because \(i\) and \(j\) interact through a shared variable simultaneously⁴ and the nodes cannot be separated between the \(i\) and \(j\) processes. This is distinct from a multilevel model where \(i\) and \(j\) belong to different groups (as sometimes shown in the SCM framework), because \(e_i\) and \(e_j\) contain a simultaneous feedback (Wolf et al., 2021).
The path coefficients for the errors are fixed to one and we estimate the error \(U\). If this is the true model, then estimating it via OLS effectively ignores the path \(e_j \rightarrow WU \rightarrow e_i\), which throws off the estimate for \(U\). That is, using OLS misjudges the error going into \(y_i\). This makes the estimate of \(\beta\) inefficient, but not biased, and it also makes the conventional OLS standard errors unreliable. There are no missing paths, but the size of the error is mis-estimated. Assuming spatial homogeneity (and that \(W_{ij}\) is known), we can separate out
\[e_i \rightarrow WU = \epsilon_i\]
and
\[e_j \rightarrow WU = \lambda\]
and
\[WU \rightarrow y_i = 1\]
As before, the nodes \(X_j\) and \(y_j\) (and their associated edges) are not technically necessary.
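A small Monte Carlo sketch can check the claim that OLS is unbiased but inefficient under the SEM. It assumes NumPy, a hypothetical `rook_weights` helper, arbitrary values \(\lambda = 0.8\) and \(\beta = 2\), and a spatially smooth regressor; that last assumption makes the efficiency loss easier to see. The comparison estimator is GLS with \(\lambda\) treated as known, which filters \(y\) and \(X\) by \((I - \lambda W)\) before running OLS. Real applications must estimate \(\lambda\) (e.g., by ML or GMM).

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: row-standardized rook contiguity on a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
W = rook_weights(10, 10)
n = W.shape[0]
lam, beta = 0.8, 2.0                      # assumed true lambda and slope
A = np.eye(n) - lam * W                   # (I - lambda W)
A_inv = np.linalg.inv(A)                  # u = A^{-1} eps: the WU loop in reduced form

ols_slopes, gls_slopes = [], []
for _ in range(500):
    x = A_inv @ rng.normal(size=n)        # spatially smooth regressor (assumption)
    X = np.column_stack([np.ones(n), x])
    y = X @ np.array([1.0, beta]) + A_inv @ rng.normal(size=n)
    # OLS ignores the e_j -> WU -> e_i path entirely
    ols_slopes.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # GLS with lambda known: filter y and X by (I - lambda W), then run OLS
    gls_slopes.append(np.linalg.lstsq(A @ X, A @ y, rcond=None)[0][1])

ols_slopes, gls_slopes = np.array(ols_slopes), np.array(gls_slopes)
print(f"OLS: mean {ols_slopes.mean():.3f}, sd {ols_slopes.std():.3f}")
print(f"GLS: mean {gls_slopes.mean():.3f}, sd {gls_slopes.std():.3f}")
```

Both means should sit near \(\beta = 2\), while the spread of the OLS estimates is noticeably larger. The exact numbers depend on the seed and the assumed parameters.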
28.5.4 Spatial Lag Model
```mermaid
flowchart TD
  Wy((Wy)) <--> yi
  subgraph j
    ej((ej)) -.-> yj
    Xj --> yj
    yj
  end
  subgraph i
    Xi --> yi
    ei((ei)) -.-> yi
    yi
  end
  W{"`$W_{ij}$`"} --> Wy
  Wy <--> yj
```
The spatial lag model includes endogenous interaction between \(y_i\) and \(y_j\) through \(W_{ij}\). This is another model with a node shared simultaneously between \(i\) and \(j\), so it is also co-dependent in the sense above, this time through the outcome rather than the error. The infinite loop \(y_i \rightarrow Wy \rightarrow y_j \rightarrow Wy \dots\) is global spatial spillover (Anselin, 2002). If this is the true model but it is estimated with OLS, you are effectively ignoring \(y_j \rightarrow Wy \rightarrow y_i\), which is an omitted path into \(y_i\). This induces the usual omitted variable bias (OVB). It also exacerbates any other OVB misspecification because of the feedback loop through \(Wy\) (LeSage & Pace, 2009). A critical implication of the SAR model is that every X variable has an indirect effect because of the feedback path \(X_i \rightarrow y_i \rightarrow Wy \rightarrow y_i\), sometimes called the spatial multiplier (Anselin, 2002, 2003; Steimetz, 2010). This is also why OVB is amplified.
- Direct Effect: \(X_i \rightarrow y_i\)
- Indirect Effect: \(X_i \rightarrow y_i \rightarrow Wy \rightarrow y_j \rightarrow Wy \rightarrow y_i\)
In the indirect effect, \(X_i\) is transmitted back to \(y_i\) through \(y_j\). In this case, \(X_j\) and \(e_j\) can be omitted because they are not necessary in the model for \(y_i\) (assuming you can instrument for \(Wy\)). Under spatial heterogeneity, though, differences in either or both terms would produce a different value of \(y_j\), which would propagate to \(y_i\) (!). Assuming \(i\) and \(j\) are symmetric, \(X_i\) has the same influence on \(y_i\) that \(X_j\) has on \(y_j\), and so on. Thus \(X_j\) and \(e_j\) can be removed from this graph.
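The direct/indirect split can be computed from the reduced form \(y = (I - \rho W)^{-1}(X\beta + \epsilon)\). The matrix of partial derivatives of \(y\) with respect to one \(X\) variable is \(S = (I - \rho W)^{-1}\beta\). A common summary (LeSage & Pace, 2009) averages its diagonal (direct effect) and its row sums (total effect), and takes the indirect effect as the difference. The sketch below assumes NumPy, a hypothetical `rook_weights` helper, and arbitrary \(\rho\) and \(\beta\). It also checks that \(S\) matches the truncated "infinite loop" \(I + \rho W + \rho^2 W^2 + \dots\).

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: row-standardized rook contiguity on a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


W = rook_weights(10, 10)
n = W.shape[0]
rho, beta = 0.6, 1.0                               # assumed values

S = beta * np.linalg.inv(np.eye(n) - rho * W)      # dy/dx' for one X variable

direct = np.trace(S) / n      # average own-unit effect, including feedback through Wy
total = S.sum() / n           # average row sum: effect of a change in x everywhere
indirect = total - direct     # average spillover transmitted through neighbors
print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")

# The spatial multiplier as the truncated infinite loop I + rho W + rho^2 W^2 + ...
S_series = beta * sum(rho**k * np.linalg.matrix_power(W, k) for k in range(80))
print(np.abs(S - S_series).max())
```

Because \(W\) is row-standardized here, the total effect equals \(\beta / (1 - \rho)\), which is 2.5 for these values. The direct effect exceeds \(\beta\) because it includes the loop back to \(y_i\).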
28.5.5 SLX-Error Model
```mermaid
flowchart TD
  W{"`$W_{ij}$`"} --> WXi
  W{"`$W_{ij}$`"} --> WXj
  Xj --> WXj
  WU((WU))
  subgraph j
    ej
    WXi --> yj
    Xj --> yj
    yj
  end
  subgraph i
    Xi --> yi
    WXj --> yi
    ei
  end
  Xi --> WXi
  ei((ei)) <-.-> WU
  ej((ej)) <-.-> WU
  WU -.-> yj
  WU -.-> yi
  W --> WU
```
The SLX-Error model has the same endogenous error propagation as the SEM, plus the exogenous effects from the SLX model. The \(i\) and \(j\) processes share one node, but only through the error. The \(WX_i\) and \(y_j\) nodes can be removed from this graph.
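Here is a Monte Carlo sketch of the SLX-Error process. It assumes NumPy, a hypothetical `rook_weights` helper, and arbitrary parameter values. Because the only shared node is the error, OLS on \([1, X, WX]\) still centers on the true \(\beta\) and \(\theta\). What it misses is the error structure, which is why a model that also estimates \(\lambda\) is preferred for efficient estimates and reliable inference.

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: row-standardized rook contiguity on a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


rng = np.random.default_rng(7)
W = rook_weights(12, 12)
n = W.shape[0]
beta, theta, lam = 1.0, 0.5, 0.7                  # assumed true values
A_inv = np.linalg.inv(np.eye(n) - lam * W)        # u = (I - lambda W)^{-1} eps

estimates = []
for _ in range(300):
    x = rng.normal(size=n)
    Z = np.column_stack([np.ones(n), x, W @ x])   # [1, X, WX]
    y = Z @ np.array([0.0, beta, theta]) + A_inv @ rng.normal(size=n)
    estimates.append(np.linalg.lstsq(Z, y, rcond=None)[0])

mean_est = np.mean(estimates, axis=0)
print(f"mean beta_hat = {mean_est[1]:.3f}, mean theta_hat = {mean_est[2]:.3f}")
```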
28.5.6 Spatial Durbin Model
```mermaid
flowchart TD
  Xj --> WXj
  Wy((Wy)) <--> yi
  subgraph j
    Xj --> yj
    WXi --> yj
    ej((ej)) -.-> yj
    yj
  end
  W{"`$W_{ij}$`"} --> Wy
  subgraph i
    Xi --> yi
    WXj --> yi
    ei((ei)) -.-> yi
    yi
  end
  Wy <--> yj
  Xi --> WXi
```
The SDM has the exogenous spillover from the SLX model and the endogenous spillover from the SAR model. As with the lag model, the \(i\) and \(j\) processes share the \(Wy\) node. The \(WX_i\) and \(e_j\) nodes can be removed from this graph.
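For the SDM, the partial-derivative matrix for one \(X\) variable becomes \(S = (I - \rho W)^{-1}(\beta I + \theta W)\). This combines the SLX spillover with the SAR multiplier (LeSage & Pace, 2009). The short sketch below assumes NumPy, a hypothetical `rook_weights` helper, and arbitrary \(\rho\), \(\beta\), and \(\theta\).

```python
import numpy as np


def rook_weights(rows, cols):
    """Hypothetical helper: row-standardized rook contiguity on a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


W = rook_weights(10, 10)
n = W.shape[0]
rho, beta, theta = 0.4, 1.0, 0.5                   # assumed values

# Exogenous spillover (beta I + theta W) pushed through the Wy multiplier
S = np.linalg.inv(np.eye(n) - rho * W) @ (beta * np.eye(n) + theta * W)

direct = np.trace(S) / n      # average own-unit effect
total = S.sum() / n           # average row sum
indirect = total - direct     # average spillover
print(f"direct = {direct:.3f}, indirect = {indirect:.3f}, total = {total:.3f}")
```

With row-standardized \(W\), the total effect is \((\beta + \theta)/(1 - \rho)\). The indirect effect is therefore not just \(\theta\): the exogenous spillover is itself amplified by the \(Wy\) loop.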
28.6 Spatial (Process) Heterogeneity
The adage with regression is that the resulting errors must be i.i.d. (independent and identically distributed) to meet the model’s assumptions (i.e., to produce the intended result). In the graphs above, we explore different ways to satisfy the independence criterion by breaking the link between \(e_i\) and \(e_j\). The second clause, however, requires errors to be identically distributed, which means the spatial processes (and \(\beta\) relationships) are the same for units \(i\) and \(j\). But as with treatment heterogeneity (Gelman et al., 2023), sometimes the \(i\) and \(j\) processes are actually different (because they are different places, after all).
In this case, the boxes for \(i\) and \(j\) are different because they represent different processes. Geographically weighted regression (GWR) takes this idea to the extreme. It assumes there is no co-dependence between \(i\) and \(j\); instead, there is an entirely different model for each \(i\), estimated via a subsample of the data defined by \(W_{ij}\). That means in GWR each unit has a different DGP, and there is no concept of spatial interaction. Every \(i\) box in the diagrams above is unique in such a case. In my opinion, both of those properties are major drawbacks. We are often interested in the average process (not unique processes for each unit) and in spatial interaction, and GWR explains neither.
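Here is a bare-bones GWR sketch that makes "an entirely different model for each \(i\)" concrete. For every location, it runs weighted least squares with Gaussian kernel weights centered on that location. It assumes NumPy, a fixed bandwidth picked by hand, and a simulated slope that drifts across the study area. Real implementations, such as PySAL's `mgwr`, typically select the bandwidth by cross-validation or AICc. The output is one coefficient vector per unit, with no spatial interaction term anywhere.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Minimal GWR sketch: one weighted least-squares fit per location (fixed Gaussian kernel)."""
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances from unit i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)           # kernel weights define i's "subsample"
        Xw = X * w[:, None]
        betas[i] = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # (X'WX)^{-1} X'Wy
    return betas


rng = np.random.default_rng(3)
g = np.linspace(0.0, 1.0, 20)
coords = np.array([(a, b) for a in g for b in g])          # 400 locations on a unit square
true_slope = 1.0 + 2.0 * coords[:, 0]                      # assumed slope drifting west to east
x = rng.normal(size=len(coords))
X = np.column_stack([np.ones(len(x)), x])
y = 0.5 + true_slope * x + rng.normal(scale=0.1, size=len(x))

betas = gwr_coefficients(coords, X, y, bandwidth=0.1)
print(betas.shape, np.corrcoef(betas[:, 1], true_slope)[0, 1])
```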
The spatial econometric view of spatial heterogeneity is to test for error heteroskedasticity. If the errors are heterogeneous, then there are two options:
- use a more robust significance test on the estimated coefficients
- fit a model using spatial regimes
The former option does not require fitting a different model. In this approach, the errors are still viewed as fundamentally random, just improperly estimated, so the significance of the coefficients is evaluated more conservatively (Anselin, 1990; Arraiz et al., 2009; Kelejian & Prucha, 2010).
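As a simple stand-in for the first option, the sketch below compares classic OLS standard errors with White's heteroskedasticity-consistent (HC0) "sandwich" standard errors. This is a swapped-in technique. The cited spatial estimators (e.g., Kelejian & Prucha, 2010) build heteroskedasticity robustness into spatial GMM estimation, which is more involved. NumPy, the two-region split, and all parameter values are assumptions for illustration.

```python
import numpy as np


def ols_classic_and_hc0(X, y):
    """OLS coefficients with classic and White (HC0) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    classic = np.sqrt(np.diag(XtX_inv) * (e @ e) / (n - k))
    meat = X.T @ (X * (e**2)[:, None])            # sum_i e_i^2 x_i x_i'
    hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, classic, hc0


rng = np.random.default_rng(11)
n = 1000
east = np.arange(n) >= n // 2                     # hypothetical two-region split
sd = np.where(east, 3.0, 0.3)                     # assumed: X and the errors vary more in the east
x = rng.normal(scale=sd)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 1.0 * x + rng.normal(scale=sd)

b, se_classic, se_hc0 = ols_classic_and_hc0(X, y)
print(f"slope = {b[1]:.3f}, classic SE = {se_classic[1]:.4f}, HC0 SE = {se_hc0[1]:.4f}")
```

The point estimate is identical under both approaches. Only the standard error changes, and here the robust version is larger because the noisiest observations are also the most influential ones.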
The latter option views the error heterogeneity as potentially systematic, resulting from different spatial processes. Thus, the second approach breaks the study region into smaller sections and fits unique regression models in each region. This is a middle ground between a global model like SAR and a hyperlocal model like GWR. Regimes are typically specified exogenously. For example, in a study using U.S. counties, we might expect slightly different spatial processes to occur in each state or region of the country, so regimes would be specified by state/region (Duque & Hierro, 2016; Elhorst & Fréret, 2009; Flores & Rodriguez-Oreggia, 2014; Myers et al., 2015). Alternatively, regimes can be viewed as endogenous, which is an emerging frontier in spatial econometric research (Anselin & Amaral, 2023).
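Here is a minimal sketch of the exogenous-regimes approach. It fits OLS separately within each regime and uses a Chow F test to ask whether the coefficients differ across regimes. It assumes NumPy, a hypothetical regime label for each unit, homoskedastic errors (which the classic Chow test requires), and no spatial dependence within regimes. This is the plain aspatial version of the test, not a spatially adjusted one.

```python
import numpy as np


def fit(X, y):
    """OLS coefficients and residual sum of squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return b, float(r @ r)


def regimes_and_chow(X, y, regime):
    """Separate OLS fit per regime plus a Chow F statistic for equal coefficients."""
    n, k = X.shape
    _, rss_pooled = fit(X, y)                     # one global model
    labels = np.unique(regime)
    coefs, rss_split = {}, 0.0
    for g in labels:                              # a separate model per regime
        mask = regime == g
        coefs[int(g)], rss_g = fit(X[mask], y[mask])
        rss_split += rss_g
    q = k * (len(labels) - 1)                     # number of equality restrictions
    df = n - k * len(labels)
    F = ((rss_pooled - rss_split) / q) / (rss_split / df)
    return coefs, F


rng = np.random.default_rng(5)
n = 400
regime = np.repeat([0, 1], n // 2)               # hypothetical regime labels (e.g., two states)
x = rng.normal(size=n)
slopes = np.where(regime == 0, 1.0, 2.5)          # assumed regime-specific slopes
y = 0.5 + slopes * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

coefs, F = regimes_and_chow(X, y, regime)
print({g: np.round(b, 2) for g, b in coefs.items()}, f"Chow F = {F:.1f}")
```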
There’s still an ongoing debate about this. Many folks are critical of spatial econometrics. McMillen says, essentially, that because you can never know the true DGP in an applied context, you should just abandon structural assumptions altogether and use GWR, because it is flexible and explicit about lacking a priori knowledge (McMillen, 2012). In many other cases, though, spatial econometric models are precisely the tool for the job, because there are good theoretical grounds to expect that spillovers exist. On this side of the argument, LeSage says, essentially, that the best course is to adopt a Bayesian perspective and treat models as though they offer competing sources of evidence (LeSage, 2014b). In that case, Bayesian fit metrics offer methods for choosing which model is most likely.
I think this is, like, the heart of Bayesianism, but I am not actually smart enough to make that assessment…
I am trained as a quantitative social scientist, but am neither a mathematician, a statistician, nor a causality expert, so take this with a grain of salt. But in my judgment, the purpose of DAGs and do-calculus in the Pearl style is to express a stochastic process as a deterministic process. This is intentional because it divorces the statistical estimation from the philosophical DGP. The whole purpose is to determine whether you can identify the effect of \(X\) (a binary decision), regardless of how well you can quantify the precision of that estimate (a continuous decision). In econometric terms, this implies DAGs are concerned exclusively with consistency. In this context, efficiency is a statistical concern, and therefore irrelevant at this point in formulating the methods (so the argument goes, anyway). Once your path diagram is set, you worry about the distribution of \(\epsilon\) (and how to estimate the graph’s edges).
Hence the proper name for this model, the simultaneous autoregressive error model.