30  Hedonic Models in Space

Code
import requests

import contextily as ctx
import geopandas as gpd
import pandas as pd
import numpy as np
import esda
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

from formulaic import Formula
from geosnap import DataStore
from geosnap.io import get_acs, get_lodes, get_network_from_gdf
from libpysal.graph import Graph
from scipy.stats import zscore
from splot import esda as esdplt
from spreg import GMM_Error, GM_Lag, OLS, spsearch

%load_ext watermark
%watermark -a 'eli knaap' -iv
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Author: eli knaap

seaborn    : 0.13.2
scipy      : 1.16.3
formulaic  : 1.2.1
numpy      : 2.3.5
contextily : 1.6.2
geosnap    : 0.15.3
splot      : 1.1.7
requests   : 2.32.5
esda       : 2.8.0
statsmodels: 0.14.5
pandas     : 2.3.3
libpysal   : 4.13.0
matplotlib : 3.10.8
spreg      : 1.9.dev47+g6e5395399
geopandas  : 1.1.1

In the last chapter, several commonly-used model specifications were defined, but the choice among them depends on the research question. Naturally, if spillover effects are of substantive interest, then it is necessary to choose a model that includes (at least) \(\rho WY\) or \(\theta WX\), but beyond that, the choice is more complex. The literature basically offers two different strategies; Anselin & Rey (2014) argue that a “bottom-up” strategy is most effective, where we begin with simple models (e.g., OLS), then conduct spatial diagnostic tests to decide whether more complex specifications are necessary. LeSage (2014) and LeSage & Pace (2014) argue for a “top-down” strategy where we begin with a ‘fully specified’ conceptual model.

LeSage (2014, p. 19) goes so far as to say that “there are only two model specifications worth considering for applied work.” When there are theoretical reasons to consider global spillover processes (such as home prices in a hedonic model), then the SDM model is appropriate. In other cases that consider only local spillover effects, the SDEM specification will provide unbiased estimates of all parameters. Their logic (to always prefer the Durbin specifications) is simple: the inclusion of the \(\theta WX\) term ensures that if there is an exogenous spillover process, then it will be measured accurately with only a small efficiency penalty, but omitting the term would lead to bias. The distinction between the SDM versus the SDEM, however, is largely a theoretical choice for a given research question rather than an empirical matter. As LeSage & Pace (2014, p. 1551) describe:

“Global spillovers arise when changes in a characteristic of one region impact outcomes in more than just immediately neighboring regions. Global impacts fall on neighbors to the neighbors, neighbors to the neighbors to the neighbors, and so on… Local spillovers represent a situation where the impacts fall only on nearby or immediate neighbors, dying out before they impact regions that are neighbors to the neighbors… feedback effects arise in the case of global spillovers, but not in the case of local spillovers.”

It follows “that practitioners cannot simply estimate an SDM and SDEM model and draw meaningful conclusions about the correct model specification from point estimates of \(\rho\) or \(\lambda\)… the SDM specification implies global spatial spillovers, while the SDEM is consistent with local spillovers. These have very different policy implications in applied work” (LeSage, 2014, p. 22), because indirect (and total) effects will be underestimated in the case when SDM model is appropriate but the SDEM model is fit because SDEM does not include feedback effects.1

Different specifications imply different DGPs and reflect what you want to study.

The SAR and SDM models which contain the spatially-lagged dependent variable \(\rho Wy\) specify global spillovers; the SLX, SDM, and SDEM models which contain the \(\theta WX\) term specify local spillovers (SDM specifies both). The difference between global and local spillovers pertains to the ways that effects propagate through the system and the ways that model coefficients may be interpreted (Juhl, 2021); SAR assumes explicit spillover processes, whereas SEM assumes spatially correlated omitted variables. You should have theory guiding which specification you want to use, even though most applied work uses a fit-based model selection approach. As described in preceding sections, the difference between the different types of spillovers is a critical distinction for applied work and policy analysis because we generally need to consider marginal effects, which means computing direct and indirect effects for models including a spatial lag. In the examples below, we examine how to fit and compare diagnostics for these different models using the spreg package (Anselin & Rey, 2014; Rey et al., 2021; Rey & Anselin, 2010).

30.1 Data Preparation

Since we are collecting raw parcel data, we will need to do a bit of cleanup and preparation before beginning model construction. The state of Maryland makes available a great set of parcel-level data called PropertyView, which include point and polygon datasets including attributes from the tax assessor’s’ office (e.g. size and sales price of the property, etc.). They also provide sales listings broken down by quarter or aggregated by year, and this is the dataset we use for the model. We will also attach these parcel points to the attributes of the blockgroup they fall inside.

Code
np.set_printoptions(suppress=True)  # prevent scientific format
datasets = DataStore()
balt_acs = get_acs(datasets, county_fips="24510", years=2019)
balt_lodes = get_lodes(datasets, county_fips="24510", years=2019)

# compute access variables
balt_net = get_network_from_gdf(balt_acs)
balt_lodes["node_ids"] = balt_net.get_node_ids(
    balt_lodes.centroid.x, balt_lodes.centroid.y
)
balt_net.set(balt_lodes["node_ids"], balt_lodes["total_employees"].values, name="jobs")
access = balt_net.aggregate(2000, name="jobs", decay="exp")

# download PropertyView data and read as a geodataframe
headers = {"user-agent": "Wget/1.16 (linux-gnu)"}
zfilename = "BACIparcels0225.zip"
# download the zip file onto the local disk
z = requests.get(
    "https://www.dropbox.com/scl/fi/xub9kdq5eug6pcsiib0bf/BACIparcels0225.zip?rlkey=f5zwjp7uyl7xpzkmep9o92aec&st=9vc03ft5&dl=0",
    stream=True,
    headers=headers,
)
with open(f"./{zfilename}", "wb") as f:
    f.write(z.content)
gdf = gpd.read_file("./BACIparcels0225.zip", layer="BACIMDPV")
gdf = gdf.set_index("ACCTID")


zfilename = "Statewide-Res-Sales-CY2018.zip"
# get the 2018 Sales data to join to parcel points
z = requests.get(
    "https://www.dropbox.com/s/sstsedw9bhtlhyw/Statewide-Res-Sales-CY2018.zip",
    stream=True,
    headers=headers,
)
with open(f"./{zfilename}", "wb") as f:
    f.write(z.content)

sales = gpd.read_file("./Statewide-Res-Sales-CY2018.zip")
sales = sales.set_index("acctid")
# keep only sales from 2018, drop duplicate geom column
# CRS in this file is epsg:26985 (meters)
balt_sales = sales.join(gdf.drop(columns=["geometry"]), how="inner")

# attach job access data
balt_sales["node_ids"] = balt_net.get_node_ids(
    balt_sales.to_crs(4326).geometry.x, balt_sales.to_crs(4326).geometry.y
)
balt_sales = balt_sales.merge(
    access.rename("job_access"), left_on="node_ids", right_index=True
)

# keep only residential parcels
balt_sales = balt_sales[balt_sales.RESIDENT == 1]
balt_sales = balt_sales[balt_sales.DESCLU == "Residential"]
# remove apartment buildings
balt_sales = balt_sales[balt_sales.RESITYP != "AP"]
# drop observations with missing variables
balt_sales = balt_sales.dropna(subset=["STRUGRAD", "YEARBLT"])

# get a month column
balt_sales["date"] = balt_sales.tradate.astype("datetime64[ns]")
balt_sales["month"] = balt_sales.tradate.str[4:6]
# convert type
balt_sales["YEARBLT"] = balt_sales["YEARBLT"].astype(int)
balt_sales["struc_age"] = 2018 - balt_sales["YEARBLT"]
balt_sales["age_square"] = balt_sales["struc_age"] ** 2
balt_sales.DESCBLDG = balt_sales.DESCBLDG.astype("category")
balt_sales["STRUGRAD"] = balt_sales["STRUGRAD"].astype(float)

# keep only improved parcels less than $1.5m
balt_sales = balt_sales[balt_sales.considr1 > 35000]
balt_sales = balt_sales[balt_sales.considr1 < 1500000]
balt_sales = balt_sales[balt_sales.struc_age > 0]
balt_sales = balt_sales[(balt_sales["STRUGRAD"] < 9) & (balt_sales["STRUGRAD"] > 2)]
balt_sales = balt_sales[balt_sales["SQFTSTRC"] > 300]

balt_sales = balt_sales.merge(
    balt_acs.drop(columns=["geometry"]), left_on="bg2010", right_on="geoid"
)
balt_sales = balt_sales.reset_index(drop=True)
Generating contraction hierarchies with 16 threads.
Setting CH node vector of size 83534
Setting CH edge vector of size 254770
Range graph removed 258484 edges of 509540
. 10% . 20% . 30% . 40% . 50% . 60% . 70% . 80% . 90% . 100%
Removed 6336 rows because they contain missing values
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_88514/4245329051.py:9: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  balt_lodes.centroid.x, balt_lodes.centroid.y
Code
balt_sales.DESCBLDG.value_counts()
balt_sales.SQFTSTRC.describe()
count    5782.000000
mean     1488.406607
std       640.387661
min       320.000000
25%      1118.500000
50%      1300.000000
75%      1640.000000
max      6418.000000
Name: SQFTSTRC, dtype: float64
Code
balt_sales = balt_sales[~balt_sales.STRUBLDG.isin(['0005', '0004'])]
Code
balt_sales.RESITYP.value_counts().plot(kind='bar')

Mostly townhomes that sold in 2018. Not surprising, given Baltimore’s housing stock

Code
balt_sales.STRUGRAD.plot(kind='hist')

Code
f, ax = plt.subplots(1,2, figsize=(8,4))

balt_sales.considr1.plot(kind='density', ax=ax[0])
balt_sales.considr1.plot(kind='box', ax=ax[1])

Code
balt_sales.considr1.apply(np.log).plot(kind='density')

Log transform gets a much better shape

Code
balt_sales.considr1.apply(np.log).plot(kind='box')

Code
balt_sales['log_price'] = balt_sales.considr1.apply(np.log)
balt_sales.log_price.hist()

Code
pd.Series(zscore(balt_sales.considr1)).plot(kind='density')

Code
balt_sales["log_inc"] = balt_sales["median_household_income"].apply(np.log)
balt_sales["log_jobaccess"] = balt_sales["job_access"].apply(np.log1p)
balt_sales["STRUGRAD"] = balt_sales["STRUGRAD"].astype(float).apply(np.log)

COLS = [
    "DESCBLDG",
    "RESITYP",
    "DESCCNST",
    "SQFTSTRC",
    "LANDAREA",
    "struc_age",
    "age_square",
    "p_nonhisp_black_persons",
    "p_hispanic_persons",
    "p_edu_college_greater",
    "median_household_income",
    "p_housing_units_multiunit_structures",
    "p_owner_occupied_units",
    "job_access",
    "log_inc",
    "log_jobaccess",
]

balt_sales = balt_sales.dropna(subset=COLS)
balt_sales = balt_sales.to_crs(balt_sales.estimate_utm_crs())

30.2 ESDA

We generally start with ESDA (following the bottom-up strategy) to ensure we even care about spatial econometrics. We begin by mapping the dependent variable (housing price, here) and examining any global and local autocorrelation

Code
balt_sales[["considr1", "log_price", "geometry"]].explore(
    "log_price", scheme="fisher_jenks", k=10, cmap="plasma", tiles="CartoDB Positron"
)
Make this Notebook Trusted to load map: File -> Trust Notebook
Code
ax = balt_sales[["considr1", "log_price", "geometry"]].plot(
    "log_price", scheme="fisher_jenks", k=10, cmap="plasma", figsize=(6, 6)
)
ax.axis("off")
ctx.add_basemap(ax=ax, source=ctx.providers.CartoDB.Positron, crs=balt_sales.crs)

Log of Home Sales Prices in Baltimore City

Log of Home Sales Prices in Baltimore City

There are definitely spatial patterns, but they’re also not uniform. There’s probably autocorrelation in the sales prices, but it seems to be particularly strong in some pockets. We should be sure to check both global and local statistics. As such, we need a spatial Graph object to define the relationship between housing sales, and we have seen in prior chapters the importance of considering travel network structure. Here we will fit two different graphs and examine global and local autocorrelation using both.

Code
w_dist = Graph.build_distance_band(balt_sales, threshold=1000)
w_dist = w_dist.transform("r")

w_net = Graph.build_travel_cost(balt_sales.to_crs(4326), balt_net, threshold=1000)
w_net = w_net.transform("r")

30.2.1 Global Stats

To check global autocorrelation we follow the same basic procedure, creating Moran objects for both types of Graph (to which we refer as \(W\), following the spatial econometric terminology).

Code
moran = esda.Moran(balt_sales["log_price"], w_dist)
moran_net = esda.Moran(balt_sales["log_price"], w_net)

f, ax = plt.subplots(2, 2, figsize=(10, 10))
ax = ax.flatten()
moran.plot_scatter(ax=ax[0])
moran_net.plot_scatter(ax=ax[1])

ax[0].set_title(f"Euclidean Neighbors \nMoran's I: {round(moran.I, 3)}")
ax[1].set_title(f"Network Neighbors\nMoran's I: {round(moran_net.I, 3)}")

w_dist.plot(balt_sales, ax=ax[2])
ax[2].axis("off")
w_net.plot(balt_sales, ax=ax[3])
ax[3].axis("off")

plt.tight_layout

For reference, we also plot the spatial structure of each graph below its Moran scatterplot. The comparison shows (as expected) that the network-based neighborhood results in more localized neighbor-sets and higher levels of autocorrelation (.55 versus .52).

30.2.2 Local Stats

And following, we will do the same with LISA plots of the local Moran statistics.

Code
## | fig-cap: LISA of Home Sales Prices in Baltimore
lisa = esda.Moran_Local(
    balt_sales.log_price,
    w_net,
    n_jobs=-1,
)
ax = lisa.plot(balt_sales, alpha=0.7, figsize=(7, 7))
ax.axis("off")
ctx.add_basemap(ax=ax, source=ctx.providers.CartoDB.Positron, crs=balt_sales.crs)
plt.show()

The local Moran shows that our significant Moran statistic is driven by hotspots in “the Wedge” in the north-center of the city and the Inner Harbor, as well as coldspots in East and West Baltimore.

30.3 Models

Taking different modeling strategies, we can vary the way we operationalize \(p(h)\) to best identify the coefficients of interest, including the potential for spatial spillover (e.g. the price of one unit affecting the prices of those nearby). We specify the main parts of this model using Wilkinson formula, which is familiar if you have ever used R (Wilkinson & Rogers, 1973).

Here we will specify the log of sales price as a function of structural characteristics (size, age, dwelling type, construction type, and ‘structure grade’, as well as a squared age term to capture historic effects), demographic and socioeconomic attributes of the home’s census blockgroup, and its accessibility to jobs. A nice thing about Wilkinson formulas in formulaic is the ability to transform variables dynamically, thus here we set several variables as categorical (using the C() convention), and log-transform the square footage of the structure.

Code
formula = "log_price ~ 1 + STRUGRAD + C(RESITYP) + C(DESCCNST) + np.log(SQFTSTRC) + struc_age + age_square + p_nonhisp_white_persons + p_edu_college_greater + log_inc + p_housing_units_multiunit_structures + p_owner_occupied_units +  log_jobaccess"

30.3.1 Aspatial (OLS)

We will start by fitting the classic regression model which assumes observations are independent; in this case, that means the sales price of a home is dependent entirely on attributes of that home and is unaffected by homes nearby. We can start by fitting the model using the statsmodels package, which is the standard regression library in Python and provides a nice interface a builtin (classic) diagnostics.

\[y = \beta X + \epsilon\]

30.3.1.1 Using statsmodels

To proceed with statsmodels, we use the ols function from its ‘formula api’ (which is designed to accept Wilkinson formulas). The function returns an OLS class, upon which we call the fit method. Once the model is fit, the summary method returns a table of coefficients and model diagnostics.

Code
model_ols = smf.ols(formula, balt_sales).fit()
model_ols.summary()
OLS Regression Results
Dep. Variable: log_price R-squared: 0.709
Model: OLS Adj. R-squared: 0.708
Method: Least Squares F-statistic: 611.8
Date: Sun, 23 Nov 2025 Prob (F-statistic): 0.00
Time: 20:43:29 Log-Likelihood: -1767.8
No. Observations: 5543 AIC: 3582.
Df Residuals: 5520 BIC: 3734.
Df Model: 22
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 4.0414 0.237 17.059 0.000 3.577 4.506
C(RESITYP)[T.TH] -0.1508 0.014 -10.562 0.000 -0.179 -0.123
C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.0464 0.079 -0.585 0.559 -0.202 0.109
C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.4849 0.117 -4.132 0.000 -0.715 -0.255
C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.4743 0.242 1.956 0.050 -0.001 0.950
C(DESCCNST)[T.CNST Block] -0.0025 0.072 -0.034 0.973 -0.144 0.139
C(DESCCNST)[T.CNST Brick] -0.0111 0.051 -0.217 0.828 -0.112 0.089
C(DESCCNST)[T.CNST Frame] -0.0146 0.052 -0.282 0.778 -0.116 0.087
C(DESCCNST)[T.CNST Shingle Asbestos] 0.1092 0.057 1.902 0.057 -0.003 0.222
C(DESCCNST)[T.CNST Shingle Wood] 0.0145 0.061 0.237 0.813 -0.105 0.134
C(DESCCNST)[T.CNST Siding] 0.0641 0.052 1.241 0.215 -0.037 0.165
C(DESCCNST)[T.CNST Stone] 0.1649 0.075 2.186 0.029 0.017 0.313
C(DESCCNST)[T.CNST Stucco] -0.0095 0.057 -0.167 0.867 -0.122 0.102
STRUGRAD 0.6925 0.025 27.729 0.000 0.643 0.741
np.log(SQFTSTRC) 0.5247 0.016 33.731 0.000 0.494 0.555
struc_age -0.0041 0.001 -7.960 0.000 -0.005 -0.003
age_square 1.618e-05 3.06e-06 5.290 0.000 1.02e-05 2.22e-05
p_nonhisp_white_persons -0.0003 0.000 -1.145 0.252 -0.001 0.000
p_edu_college_greater 0.0055 0.000 13.222 0.000 0.005 0.006
log_inc 0.2278 0.018 12.863 0.000 0.193 0.263
p_housing_units_multiunit_structures 0.0022 0.000 8.759 0.000 0.002 0.003
p_owner_occupied_units 0.0070 0.001 13.751 0.000 0.006 0.008
log_jobaccess 0.0368 0.006 6.666 0.000 0.026 0.048
Omnibus: 585.976 Durbin-Watson: 1.362
Prob(Omnibus): 0.000 Jarque-Bera (JB): 802.382
Skew: -0.850 Prob(JB): 5.82e-175
Kurtosis: 3.762 Cond. No. 6.00e+05


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6e+05. This might indicate that there are
strong multicollinearity or other numerical problems.


For a first shot, this is a decent model, with an \(R^2\) value greater than 0.7. Most of the variables are significant, except for the construction materials, in which only the siding has a significant effect. Otherwise, the coefficients are as expected. Housing prices increase with larger and newer homes (though historic homes also get a bump). Townhomes are less expensive than single family, and there are some small effects from the neighborhood’s demographic composition, the one exception being median income, which is relatively large.

With linear regression models, among our primary concerns are that model residuals are independent and identically-distributed (I.I.D.). They do not necessarily need to be normal (though it’s ideal if they are), but they should be independent from one another (and ideally follow a known probability distribution). If the residuals are not identically-distributed, then robust errors can help insulate inferences, however the independence criterion is more difficult to handle. To check these criteria, it is common to plot the (standardized) predicted values against (standardized) residuals and visually assess randomness.

Code
ax = pd.DataFrame(
    {"predy": zscore(model_ols.fittedvalues), "$\epsilon$": zscore(model_ols.resid)}
).plot(x="predy", y="$\epsilon$", kind="scatter")
ax.hlines(0, -5, 5, "red")
ax.vlines(0, -5, 5, "red")
<>:2: SyntaxWarning: invalid escape sequence '\e'
<>:3: SyntaxWarning: invalid escape sequence '\e'
<>:2: SyntaxWarning: invalid escape sequence '\e'
<>:3: SyntaxWarning: invalid escape sequence '\e'
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_88514/2259918956.py:2: SyntaxWarning: invalid escape sequence '\e'
  {"predy": zscore(model_ols.fittedvalues), "$\epsilon$": zscore(model_ols.resid)}
/var/folders/j8/5bgcw6hs7cqcbbz48d6bsftw0000gp/T/ipykernel_88514/2259918956.py:3: SyntaxWarning: invalid escape sequence '\e'
  ).plot(x="predy", y="$\epsilon$", kind="scatter")

Code
ax = sns.kdeplot(y=zscore(model_ols.fittedvalues), x=zscore(model_ols.resid))
ax.hlines(0, -5, 5, 'red')
ax.vlines(0,-5, 5, 'red')

With this model the residuals are imperfect though not wholly unreasonable.

30.3.1.2 Using spreg

To generate spatial diagnostics for our model, we need to re-fit it using the spreg package from PySAL. However, since spreg does not understand Wilkinson formulas natively, we use the formulaic package, whose Formula class consumes a dataframe and the formula string, then returns the response vector and design matrix as spreg requires them. All other spatial specifications can be performed via options in the spreg estimation classes2.

Code
y, X = Formula(formula).get_model_matrix(pd.DataFrame(balt_sales))

Fitting this model using spreg we pass the y and X variables to the OLS class, but since we also want spatial diagnostics, we also include the Graph object (to the w argument), and include spat_diag=True which computes Lagrange Multiplier tests for spatial dependence in the residual3. Here we also compute the Moran test on the residuals, which requires an extra option given its additional computational overhead.

Code
model_ols_spreg = OLS(
    y=y,
    x=X,
    w=w_net,
    moran=True,
    spat_diag=True,
)

Once model is estimated using spreg, the coefficients and tests can be viewed

Code
print(model_ols_spreg.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   log_price                Number of Observations:        5543
Mean dependent var  :     12.1398                Number of Variables   :          23
S.D. dependent var  :      0.6173                Degrees of Freedom    :        5520
R-squared           :      0.7092
Adjusted R-squared  :      0.7080
Sum squared residual:     614.174                F-statistic           :    611.8131
Sigma-square        :       0.111                Prob(F-statistic)     :           0
S.E. of regression  :       0.334                Log likelihood        :   -1767.840
Sigma-square ML     :       0.111                Akaike info criterion :    3581.681
S.E of regression ML:      0.3329                Schwarz criterion     :    3733.948

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     t-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         4.04140         0.23690        17.05941         0.00000
            STRUGRAD         0.69245         0.02497        27.72919         0.00000
    C(RESITYP)[T.TH]        -0.15085         0.01428       -10.56169         0.00000
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.04637         0.07929        -0.58489         0.55865
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.48490         0.11735        -4.13202         0.00004
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.47431         0.24243         1.95646         0.05046
C(DESCCNST)[T.CNST Block]        -0.00245         0.07200        -0.03407         0.97282
C(DESCCNST)[T.CNST Brick]        -0.01114         0.05129        -0.21714         0.82810
C(DESCCNST)[T.CNST Frame]        -0.01463         0.05196        -0.28151         0.77833
C(DESCCNST)[T.CNST Shingle Asbestos]         0.10916         0.05738         1.90237         0.05717
C(DESCCNST)[T.CNST Shingle Wood]         0.01447         0.06109         0.23689         0.81275
C(DESCCNST)[T.CNST Siding]         0.06409         0.05164         1.24092         0.21469
C(DESCCNST)[T.CNST Stone]         0.16488         0.07543         2.18594         0.02886
C(DESCCNST)[T.CNST Stucco]        -0.00955         0.05711        -0.16721         0.86721
    np.log(SQFTSTRC)         0.52471         0.01556        33.73058         0.00000
           struc_age        -0.00414         0.00052        -7.96012         0.00000
          age_square         0.00002         0.00000         5.28951         0.00000
p_nonhisp_white_persons        -0.00028         0.00024        -1.14453         0.25245
p_edu_college_greater         0.00547         0.00041        13.22243         0.00000
             log_inc         0.22782         0.01771        12.86298         0.00000
p_housing_units_multiunit_structures         0.00225         0.00026         8.75894         0.00000
p_owner_occupied_units         0.00701         0.00051        13.75138         0.00000
       log_jobaccess         0.03681         0.00552         6.66597         0.00000
------------------------------------------------------------------------------------
Warning: Variable(s) ['Intercept'] removed for being constant.

REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER         225.365

TEST ON NORMALITY OF ERRORS
TEST                             DF        VALUE           PROB
Jarque-Bera                       2        802.382           0.0000

DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST                             DF        VALUE           PROB
Breusch-Pagan test               22        656.931           0.0000
Koenker-Bassett test             22        475.640           0.0000

DIAGNOSTICS FOR SPATIAL DEPENDENCE
- SARERR -
TEST                           MI/DF       VALUE           PROB
Moran's I (error)              0.1570       27.668           0.0000
Lagrange Multiplier (lag)         1        261.053           0.0000
Robust LM (lag)                   1         49.614           0.0000
Lagrange Multiplier (error)       1        745.488           0.0000
Robust LM (error)                 1        534.049           0.0000
Lagrange Multiplier (SARMA)       2        795.102           0.0000

- Spatial Durbin -
TEST                              DF       VALUE           PROB
LM test for WX                   22        519.285           0.0000
Robust LM WX test                22        886.997           0.0000
Lagrange Multiplier (lag)         1        261.053           0.0000
Robust LM Lag - SDM               1        628.765           0.0000
Joint test for SDM               23       1148.050           0.0000
================================ END OF REPORT =====================================

The estimated model coefficients are the same between packages, but using spreg, we get a full suite of spatial diagnostics in the resulting summary, all of which are highly significant. These results show overwhelming evidence our model should account for some kind of spatial dependence, but since all the tests are significant, we need more exploration to understand which model is most appropriate. So we will step-up our spatial models one at a time, following Anselin (2007). Further, as with the statsmodels result, the Jarque-Bera test of normality on the residuals is highly significant, which could indicate we need to use a robust estimator.

30.3.2 Spatial Lag-of-X (SLX)

To fit an SLX model using the same spatial weights matrix for the dependent variable and every independent variable, you can simply use spreg’s OLS class and pass a Graph, and the argument slx_lags=1, which indicates that the model should include a \(WX\) variable for each of the explanatory variables in the model formula (design matrix). This fits the model:

\[y = \beta X + \theta WX + \epsilon\]

Code
model_slx = OLS(
    y=y,
    x=X,
    w=w_net,
    moran=True,
    spat_diag=True,
    slx_lags=1,
)
model_slx.output
var_names coefficients std_err zt_stat prob
0 CONSTANT 4.642039 0.275026 16.878545 0.0
1 STRUGRAD 0.659429 0.02577 25.589335 0.0
2 C(RESITYP)[T.TH] -0.172735 0.016968 -10.179757 0.0
3 C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.038146 0.075784 -0.503358 0.614733
4 C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.480313 0.112715 -4.261296 0.000021
5 C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.344024 0.231716 1.484679 0.137686
6 C(DESCCNST)[T.CNST Block] -0.054803 0.068892 -0.795487 0.426365
7 C(DESCCNST)[T.CNST Brick] -0.025992 0.049092 -0.529445 0.596518
8 C(DESCCNST)[T.CNST Frame] -0.030933 0.049719 -0.622168 0.533857
9 C(DESCCNST)[T.CNST Shingle Asbestos] 0.044798 0.055098 0.813059 0.416219
10 C(DESCCNST)[T.CNST Shingle Wood] -0.024704 0.058962 -0.418976 0.67525
11 C(DESCCNST)[T.CNST Siding] 0.020576 0.049593 0.414903 0.678229
12 C(DESCCNST)[T.CNST Stone] 0.156798 0.072992 2.148148 0.031746
13 C(DESCCNST)[T.CNST Stucco] -0.041288 0.054676 -0.755137 0.450199
14 np.log(SQFTSTRC) 0.542232 0.015892 34.120354 0.0
15 struc_age -0.004198 0.000539 -7.790037 0.0
16 age_square 0.000015 0.000003 4.68384 0.000003
17 p_nonhisp_white_persons 0.001516 0.00039 3.88954 0.000102
18 p_edu_college_greater 0.002195 0.000474 4.635165 0.000004
19 log_inc 0.14734 0.018002 8.1848 0.0
20 p_housing_units_multiunit_structures 0.001573 0.00029 5.419061 0.0
21 p_owner_occupied_units 0.003835 0.00057 6.728947 0.0
22 log_jobaccess 0.056401 0.017197 3.279735 0.001046
23 W_STRUGRAD 0.026409 0.051774 0.510088 0.610011
24 W_C(RESITYP)[T.TH] 0.075844 0.029023 2.613235 0.008993
25 W_C(DESCCNST)[T.CNST 1/2 Brick Siding] -1.057799 0.231905 -4.561356 0.000005
26 W_C(DESCCNST)[T.CNST 1/2 Stone Frame] -1.453311 0.289336 -5.022921 0.000001
27 W_C(DESCCNST)[T.CNST 1/2 Stone Siding] 1.295219 1.027648 1.260373 0.207588
28 W_C(DESCCNST)[T.CNST Block] -0.975344 0.206757 -4.717346 0.000002
29 W_C(DESCCNST)[T.CNST Brick] -0.972882 0.155039 -6.275086 0.0
30 W_C(DESCCNST)[T.CNST Frame] -1.078228 0.156205 -6.902645 0.0
31 W_C(DESCCNST)[T.CNST Shingle Asbestos] -0.768816 0.171147 -4.492126 0.000007
32 W_C(DESCCNST)[T.CNST Shingle Wood] -1.173693 0.174873 -6.711685 0.0
33 W_C(DESCCNST)[T.CNST Siding] -0.775578 0.152935 -5.071279 0.0
34 W_C(DESCCNST)[T.CNST Stone] -1.066295 0.235908 -4.519956 0.000006
35 W_C(DESCCNST)[T.CNST Stucco] -0.900155 0.171709 -5.242327 0.0
36 W_np.log(SQFTSTRC) -0.137501 0.031089 -4.422854 0.00001
37 W_struc_age 0.003047 0.001101 2.767636 0.005665
38 W_age_square -0.000013 0.000007 -1.966283 0.049316
39 W_p_nonhisp_white_persons -0.004045 0.000492 -8.224894 0.0
40 W_p_edu_college_greater 0.006105 0.000697 8.764161 0.0
41 W_log_inc 0.131123 0.025597 5.122574 0.0
42 W_p_housing_units_multiunit_structures 0.00181 0.000523 3.461996 0.00054
43 W_p_owner_occupied_units 0.008986 0.001001 8.97579 0.0
44 W_log_jobaccess -0.025126 0.019055 -1.3186 0.187358

In addition to the summary attribute, which stores a large textual report, the output attribute of each spreg class stores the estimated coefficients and significance measures as a pandas DataFrame. In this output, we can see each of the \(X\) variables in the original specification now also has a \(WX\) version (prefixed by W_), many of which are significant. These results show clear and substantive evidence of spatial spillover (especially for neighborhood income W_loginc, which has a very large coefficient), where attributes of observations nearby influence the outcome at a different location.

Code
print(f"Robust LM Error p-value: {model_slx.rlm_error[1]}")
print(f"Robust LM Lag p-value: {model_slx.rlm_lag[1]}")
Robust LM Error p-value: 0.0025532603356221394
Robust LM Lag p-value: 0.00013605719519529572

Despite significant \(WX\) results, the robust Lagrange Multiplier tests for both the SEM and SAR models are highly significant, showing significant autocorrelation remaining in the residual and suggesting we still need to explore more models.

30.3.3 Spatial Error (SEM)

To fit the spatial error model it is best to use the GMM_Error class, which fits the model:

\[y = \beta X + u\] \[ u = \lambda Wu+ \epsilon \]

Note

Classic estimation for spatial econometric models uses Maximum-Likelihood (ML), but modern applications typically prefer Generalized Method of Moments (GMM) for spatial error variants and Two-Stage Least Squares (TSLS) estimation for lag-based models, both of which are faster and applicable to larger datasets. Below, we will only use the spreg classes based on these estimators, however this is important to know, because TSLS estimation uses higher-order spatial lags of the X variables as instruments for estimating \(\rho WY\) by default. This will cause problems when estimating models that also include lagged \(X\)-variables (i.e. SLX-Error and Spatial Durbin). To avoid these errors, you can provide alternative instruments yourself, or you can pass w_lags >=2 which uses even higher-order neighbors as instruments.

Following the same conventions as above, we pass y, X, and W, then examine the output. Since the OLS model also showed some evidence of heteroskedasticity in the errors, we will also use a robust estimator by passing estimator="het".

Code
model_sem = GMM_Error(
    y=y,
    x=X,
    w=w_net,
    estimator="het",
)
model_sem.output.round(4)
var_names coefficients std_err zt_stat prob
0 CONSTANT 5.184821 0.263781 19.655813 0.0
1 STRUGRAD 0.690482 0.027561 25.052584 0.0
2 C(RESITYP)[T.TH] -0.177419 0.016616 -10.677869 0.0
3 C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.002248 0.065781 -0.034172 0.97274
4 C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.431828 0.143771 -3.003589 0.002668
5 C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.365504 0.118919 3.073542 0.002115
6 C(DESCCNST)[T.CNST Block] 0.015147 0.06651 0.227736 0.819851
7 C(DESCCNST)[T.CNST Brick] 0.026149 0.045781 0.571184 0.567875
8 C(DESCCNST)[T.CNST Frame] 0.029853 0.046679 0.639547 0.522467
9 C(DESCCNST)[T.CNST Shingle Asbestos] 0.095888 0.049408 1.940736 0.05229
10 C(DESCCNST)[T.CNST Shingle Wood] 0.057046 0.052154 1.093809 0.274039
11 C(DESCCNST)[T.CNST Siding] 0.067587 0.046474 1.454309 0.145861
12 C(DESCCNST)[T.CNST Stone] 0.213639 0.056558 3.777374 0.000158
13 C(DESCCNST)[T.CNST Stucco] 0.013774 0.048523 0.28386 0.776518
14 np.log(SQFTSTRC) 0.533307 0.016729 31.879593 0.0
15 struc_age -0.004462 0.000428 -10.421149 0.0
16 age_square 0.000017 0.000003 6.71005 0.0
17 p_nonhisp_white_persons 0.001332 0.000348 3.827689 0.000129
18 p_edu_college_greater 0.004356 0.000491 8.868598 0.0
19 log_inc 0.13974 0.019919 7.015441 0.0
20 p_housing_units_multiunit_structures 0.001637 0.000275 5.952695 0.0
21 p_owner_occupied_units 0.004151 0.000612 6.778271 0.0
22 log_jobaccess 0.039208 0.009771 4.012613 0.00006
23 lambda 0.671989 0.017309 38.823503 0.0
Code
print(model_sem.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES (HET)
------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   log_price                Number of Observations:        5543
Mean dependent var  :     12.1398                Number of Variables   :          23
S.D. dependent var  :      0.6173                Degrees of Freedom    :        5520
Pseudo R-squared    :      0.7020
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         5.18482         0.26378        19.65581         0.00000
            STRUGRAD         0.69048         0.02756        25.05258         0.00000
    C(RESITYP)[T.TH]        -0.17742         0.01662       -10.67787         0.00000
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.00225         0.06578        -0.03417         0.97274
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.43183         0.14377        -3.00359         0.00267
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.36550         0.11892         3.07354         0.00212
C(DESCCNST)[T.CNST Block]         0.01515         0.06651         0.22774         0.81985
C(DESCCNST)[T.CNST Brick]         0.02615         0.04578         0.57118         0.56788
C(DESCCNST)[T.CNST Frame]         0.02985         0.04668         0.63955         0.52247
C(DESCCNST)[T.CNST Shingle Asbestos]         0.09589         0.04941         1.94074         0.05229
C(DESCCNST)[T.CNST Shingle Wood]         0.05705         0.05215         1.09381         0.27404
C(DESCCNST)[T.CNST Siding]         0.06759         0.04647         1.45431         0.14586
C(DESCCNST)[T.CNST Stone]         0.21364         0.05656         3.77737         0.00016
C(DESCCNST)[T.CNST Stucco]         0.01377         0.04852         0.28386         0.77652
    np.log(SQFTSTRC)         0.53331         0.01673        31.87959         0.00000
           struc_age        -0.00446         0.00043       -10.42115         0.00000
          age_square         0.00002         0.00000         6.71005         0.00000
p_nonhisp_white_persons         0.00133         0.00035         3.82769         0.00013
p_edu_college_greater         0.00436         0.00049         8.86860         0.00000
             log_inc         0.13974         0.01992         7.01544         0.00000
p_housing_units_multiunit_structures         0.00164         0.00028         5.95270         0.00000
p_owner_occupied_units         0.00415         0.00061         6.77827         0.00000
       log_jobaccess         0.03921         0.00977         4.01261         0.00006
              lambda         0.67199         0.01731        38.82350         0.00000
------------------------------------------------------------------------------------
Warning: Variable(s) ['Intercept'] removed for being constant.
================================ END OF REPORT =====================================

In these results, we now have a lambda coefficient for our spatial error term, which is highly significant (and pretty large at 0.66). Most \(X\) variables related to the building structure are still significant, and now the neighborhood racial composition becomes significant. The rest of the significant coefficients have changed values a bit, for example the log_inc coefficient is reduced to 0.142 (down from 0.228 in the OLS model).

30.3.4 Spatial Lag (SAR)

To fit a spatial lag model the preferred approach is to use the GM_Lag class, which uses TSLS under the hood. If you have additional instruments to use, you can pass them to yend, but here we will adopt the default (which computes then uses \(WX\) as instruments). For robustness, we will also include White standard errors.

\[y = \rho W y + \beta X + \epsilon\]

Code
model_sar = GM_Lag(
    y=y,
    x=X,
    w=w_net,
    robust="white",
)
model_sar.output
var_names coefficients std_err zt_stat prob
0 CONSTANT 3.212091 0.368721 8.711449 0.0
1 STRUGRAD 0.663002 0.028349 23.386941 0.0
2 C(RESITYP)[T.TH] -0.145262 0.015339 -9.470194 0.0
3 C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.034747 0.071632 -0.485083 0.627617
4 C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.482342 0.147341 -3.273634 0.001062
5 C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.439415 0.117683 3.733873 0.000189
6 C(DESCCNST)[T.CNST Block] -0.003193 0.067718 -0.04715 0.962394
7 C(DESCCNST)[T.CNST Brick] -0.004066 0.04775 -0.085143 0.932148
8 C(DESCCNST)[T.CNST Frame] -0.006989 0.048752 -0.143356 0.886009
9 C(DESCCNST)[T.CNST Shingle Asbestos] 0.107672 0.051356 2.096603 0.036029
10 C(DESCCNST)[T.CNST Shingle Wood] 0.007915 0.054709 0.144679 0.884965
11 C(DESCCNST)[T.CNST Siding] 0.063223 0.047952 1.318455 0.187351
12 C(DESCCNST)[T.CNST Stone] 0.154078 0.056779 2.713627 0.006655
13 C(DESCCNST)[T.CNST Stucco] -0.009916 0.051623 -0.192093 0.847669
14 np.log(SQFTSTRC) 0.509048 0.016956 30.021413 0.0
15 struc_age -0.004213 0.000403 -10.465664 0.0
16 age_square 0.000016 0.000002 6.81864 0.0
17 p_nonhisp_white_persons -0.000182 0.000254 -0.714607 0.474852
18 p_edu_college_greater 0.004653 0.000487 9.558622 0.0
19 log_inc 0.201296 0.021811 9.229147 0.0
20 p_housing_units_multiunit_structures 0.001887 0.000288 6.559435 0.0
21 p_owner_occupied_units 0.006503 0.00057 11.416549 0.0
22 log_jobaccess 0.031762 0.005751 5.522875 0.0
23 W_log_price 0.115276 0.038207 3.017164 0.002552

Compared to the SEM, this model has a slightly better model fit, and some of the coefficients have changed a bit. The spatial autoregressive parameter called W_dep_var in the report is relatively small but highly significant. Racial composition in the neighborhood is no longer significant, and the neighborhood income coefficient has increased again up to 0.20.

Code
print(model_sar.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   log_price                Number of Observations:        5543
Mean dependent var  :     12.1398                Number of Variables   :          24
S.D. dependent var  :      0.6173                Degrees of Freedom    :        5519
Pseudo R-squared    :      0.7209
Spatial Pseudo R-squared:  0.7113

White Standard Errors
------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         3.21209         0.36872         8.71145         0.00000
            STRUGRAD         0.66300         0.02835        23.38694         0.00000
    C(RESITYP)[T.TH]        -0.14526         0.01534        -9.47019         0.00000
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.03475         0.07163        -0.48508         0.62762
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.48234         0.14734        -3.27363         0.00106
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.43941         0.11768         3.73387         0.00019
C(DESCCNST)[T.CNST Block]        -0.00319         0.06772        -0.04715         0.96239
C(DESCCNST)[T.CNST Brick]        -0.00407         0.04775        -0.08514         0.93215
C(DESCCNST)[T.CNST Frame]        -0.00699         0.04875        -0.14336         0.88601
C(DESCCNST)[T.CNST Shingle Asbestos]         0.10767         0.05136         2.09660         0.03603
C(DESCCNST)[T.CNST Shingle Wood]         0.00792         0.05471         0.14468         0.88496
C(DESCCNST)[T.CNST Siding]         0.06322         0.04795         1.31846         0.18735
C(DESCCNST)[T.CNST Stone]         0.15408         0.05678         2.71363         0.00666
C(DESCCNST)[T.CNST Stucco]        -0.00992         0.05162        -0.19209         0.84767
    np.log(SQFTSTRC)         0.50905         0.01696        30.02141         0.00000
           struc_age        -0.00421         0.00040       -10.46566         0.00000
          age_square         0.00002         0.00000         6.81864         0.00000
p_nonhisp_white_persons        -0.00018         0.00025        -0.71461         0.47485
p_edu_college_greater         0.00465         0.00049         9.55862         0.00000
             log_inc         0.20130         0.02181         9.22915         0.00000
p_housing_units_multiunit_structures         0.00189         0.00029         6.55943         0.00000
p_owner_occupied_units         0.00650         0.00057        11.41655         0.00000
       log_jobaccess         0.03176         0.00575         5.52287         0.00000
         W_log_price         0.11528         0.03821         3.01716         0.00255
------------------------------------------------------------------------------------
Instrumented: W_log_price
Instruments: W_C(DESCCNST)[T.CNST 1/2 Brick Siding], W_C(DESCCNST)[T.CNST
             1/2 Stone Frame], W_C(DESCCNST)[T.CNST 1/2 Stone Siding],
             W_C(DESCCNST)[T.CNST Block], W_C(DESCCNST)[T.CNST Brick],
             W_C(DESCCNST)[T.CNST Frame], W_C(DESCCNST)[T.CNST Shingle
             Asbestos], W_C(DESCCNST)[T.CNST Shingle Wood],
             W_C(DESCCNST)[T.CNST Siding], W_C(DESCCNST)[T.CNST Stone],
             W_C(DESCCNST)[T.CNST Stucco], W_C(RESITYP)[T.TH], W_STRUGRAD,
             W_age_square, W_log_inc, W_log_jobaccess, W_np.log(SQFTSTRC),
             W_p_edu_college_greater,
             W_p_housing_units_multiunit_structures,
             W_p_nonhisp_white_persons, W_p_owner_occupied_units,
             W_struc_age
Warning: Variable(s) ['Intercept'] removed for being constant.

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                              DF         VALUE           PROB
Anselin-Kelejian Test             1        203.607           0.0000

SPATIAL LAG MODEL IMPACTS
Impacts computed using the 'simple' method.
            Variable         Direct        Indirect          Total
            STRUGRAD         0.6630          0.0864          0.7494
    C(RESITYP)[T.TH]        -0.1453         -0.0189         -0.1642
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.0347         -0.0045         -0.0393
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.4823         -0.0628         -0.5452
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.4394          0.0573          0.4967
C(DESCCNST)[T.CNST Block]        -0.0032         -0.0004         -0.0036
C(DESCCNST)[T.CNST Brick]        -0.0041         -0.0005         -0.0046
C(DESCCNST)[T.CNST Frame]        -0.0070         -0.0009         -0.0079
C(DESCCNST)[T.CNST Shingle Asbestos]         0.1077          0.0140          0.1217
C(DESCCNST)[T.CNST Shingle Wood]         0.0079          0.0010          0.0089
C(DESCCNST)[T.CNST Siding]         0.0632          0.0082          0.0715
C(DESCCNST)[T.CNST Stone]         0.1541          0.0201          0.1742
C(DESCCNST)[T.CNST Stucco]        -0.0099         -0.0013         -0.0112
    np.log(SQFTSTRC)         0.5090          0.0663          0.5754
           struc_age        -0.0042         -0.0005         -0.0048
          age_square         0.0000          0.0000          0.0000
p_nonhisp_white_persons        -0.0002         -0.0000         -0.0002
p_edu_college_greater         0.0047          0.0006          0.0053
             log_inc         0.2013          0.0262          0.2275
p_housing_units_multiunit_structures         0.0019          0.0002          0.0021
p_owner_occupied_units         0.0065          0.0008          0.0074
       log_jobaccess         0.0318          0.0041          0.0359
================================ END OF REPORT =====================================

The summary for models that include lagged dependent variables is also slightly different, as the end of the summary table now includes a set of impacts for each explanatory variable that show the direct, indirect, and total effects associated with each input. Thus we can see the total impact of neighborhood income is actually even higher at 0.23 after accounting for spillover effects. The Anselin-Kelejian test for spatial dependence is still significant in this model (Anselin & Kelejian, 1997), which suggests we may need to use an even more general model like the Spatial Durbin.

30.3.5 SLX-Error (SLXE / Durbin-Error)

To fit an SLXE model, we use the GMM_Error class once again, but this time pass slx_lags=1, which automatically handles the creation and inclusion of our \(WX\) variables.

Code
model_slxe = GMM_Error(
    y=y,
    x=X,
    w=w_net,
    slx_lags=1,
    estimator="het",
)
model_slxe.output
var_names coefficients std_err zt_stat prob
0 CONSTANT 5.237416 0.306487 17.088525 0.0
1 STRUGRAD 0.660464 0.027692 23.85068 0.0
2 C(RESITYP)[T.TH] -0.184457 0.016705 -11.041936 0.0
3 C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.036857 0.073304 -0.502798 0.615106
4 C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.481501 0.157975 -3.047961 0.002304
5 C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.320318 0.162945 1.965798 0.049322
6 C(DESCCNST)[T.CNST Block] -0.040008 0.069544 -0.575283 0.5651
7 C(DESCCNST)[T.CNST Brick] -0.017152 0.049675 -0.345276 0.729887
8 C(DESCCNST)[T.CNST Frame] -0.021026 0.050634 -0.415265 0.677948
9 C(DESCCNST)[T.CNST Shingle Asbestos] 0.049323 0.053223 0.926727 0.354068
10 C(DESCCNST)[T.CNST Shingle Wood] -0.007045 0.055501 -0.126931 0.898995
11 C(DESCCNST)[T.CNST Siding] 0.029617 0.050008 0.592244 0.553687
12 C(DESCCNST)[T.CNST Stone] 0.162651 0.058635 2.773951 0.005538
13 C(DESCCNST)[T.CNST Stucco] -0.029299 0.053153 -0.551232 0.581474
14 np.log(SQFTSTRC) 0.530204 0.016885 31.401142 0.0
15 struc_age -0.004403 0.000422 -10.444259 0.0
16 age_square 0.000016 0.000002 6.416309 0.0
17 p_nonhisp_white_persons 0.001249 0.000377 3.315494 0.000915
18 p_edu_college_greater 0.002927 0.000489 5.99057 0.0
19 log_inc 0.122706 0.019353 6.340508 0.0
20 p_housing_units_multiunit_structures 0.001479 0.00027 5.469039 0.0
21 p_owner_occupied_units 0.003561 0.000594 5.991676 0.0
22 log_jobaccess 0.045178 0.017078 2.64537 0.00816
23 W_STRUGRAD 0.021399 0.062003 0.34512 0.730004
24 W_C(RESITYP)[T.TH] 0.037688 0.037366 1.008612 0.313161
25 W_C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.696259 0.292497 -2.380397 0.017294
26 W_C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.756676 0.551073 -1.373095 0.169723
27 W_C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.542445 0.805488 0.673436 0.50067
28 W_C(DESCCNST)[T.CNST Block] -0.562274 0.309494 -1.816753 0.069255
29 W_C(DESCCNST)[T.CNST Brick] -0.574081 0.24838 -2.311303 0.020816
30 W_C(DESCCNST)[T.CNST Frame] -0.656507 0.25014 -2.624556 0.008676
31 W_C(DESCCNST)[T.CNST Shingle Asbestos] -0.545031 0.261415 -2.084925 0.037076
32 W_C(DESCCNST)[T.CNST Shingle Wood] -0.732053 0.274849 -2.663471 0.007734
33 W_C(DESCCNST)[T.CNST Siding] -0.460589 0.247025 -1.864544 0.062245
34 W_C(DESCCNST)[T.CNST Stone] -0.655921 0.327101 -2.005256 0.044936
35 W_C(DESCCNST)[T.CNST Stucco] -0.514824 0.260043 -1.979769 0.04773
36 W_np.log(SQFTSTRC) -0.133987 0.04091 -3.27513 0.001056
37 W_struc_age 0.001005 0.001218 0.825242 0.409234
38 W_age_square -0.000003 0.000007 -0.416008 0.677404
39 W_p_nonhisp_white_persons -0.00334 0.000608 -5.492505 0.0
40 W_p_edu_college_greater 0.006143 0.000906 6.780808 0.0
41 W_log_inc 0.09634 0.033928 2.839545 0.004518
42 W_p_housing_units_multiunit_structures 0.001204 0.000661 1.820163 0.068734
43 W_p_owner_occupied_units 0.008228 0.001516 5.425666 0.0
44 W_log_jobaccess -0.022268 0.021013 -1.059726 0.289269
45 lambda 0.508535 0.022248 22.857148 0.0
Code
print(model_slxe.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES (HET) WITH SLX (SLX-Error)
---------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   log_price                Number of Observations:        5543
Mean dependent var  :     12.1398                Number of Variables   :          45
S.D. dependent var  :      0.6173                Degrees of Freedom    :        5498
Pseudo R-squared    :      0.7345
N. of iterations    :           1                Step1c computed       :          No

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         5.23742         0.30649        17.08852         0.00000
            STRUGRAD         0.66046         0.02769        23.85068         0.00000
    C(RESITYP)[T.TH]        -0.18446         0.01671       -11.04194         0.00000
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.03686         0.07330        -0.50280         0.61511
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.48150         0.15797        -3.04796         0.00230
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.32032         0.16295         1.96580         0.04932
C(DESCCNST)[T.CNST Block]        -0.04001         0.06954        -0.57528         0.56510
C(DESCCNST)[T.CNST Brick]        -0.01715         0.04967        -0.34528         0.72989
C(DESCCNST)[T.CNST Frame]        -0.02103         0.05063        -0.41526         0.67795
C(DESCCNST)[T.CNST Shingle Asbestos]         0.04932         0.05322         0.92673         0.35407
C(DESCCNST)[T.CNST Shingle Wood]        -0.00704         0.05550        -0.12693         0.89900
C(DESCCNST)[T.CNST Siding]         0.02962         0.05001         0.59224         0.55369
C(DESCCNST)[T.CNST Stone]         0.16265         0.05863         2.77395         0.00554
C(DESCCNST)[T.CNST Stucco]        -0.02930         0.05315        -0.55123         0.58147
    np.log(SQFTSTRC)         0.53020         0.01688        31.40114         0.00000
           struc_age        -0.00440         0.00042       -10.44426         0.00000
          age_square         0.00002         0.00000         6.41631         0.00000
p_nonhisp_white_persons         0.00125         0.00038         3.31549         0.00091
p_edu_college_greater         0.00293         0.00049         5.99057         0.00000
             log_inc         0.12271         0.01935         6.34051         0.00000
p_housing_units_multiunit_structures         0.00148         0.00027         5.46904         0.00000
p_owner_occupied_units         0.00356         0.00059         5.99168         0.00000
       log_jobaccess         0.04518         0.01708         2.64537         0.00816
          W_STRUGRAD         0.02140         0.06200         0.34512         0.73000
  W_C(RESITYP)[T.TH]         0.03769         0.03737         1.00861         0.31316
W_C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.69626         0.29250        -2.38040         0.01729
W_C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.75668         0.55107        -1.37309         0.16972
W_C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.54244         0.80549         0.67344         0.50067
W_C(DESCCNST)[T.CNST Block]        -0.56227         0.30949        -1.81675         0.06925
W_C(DESCCNST)[T.CNST Brick]        -0.57408         0.24838        -2.31130         0.02082
W_C(DESCCNST)[T.CNST Frame]        -0.65651         0.25014        -2.62456         0.00868
W_C(DESCCNST)[T.CNST Shingle Asbestos]        -0.54503         0.26142        -2.08492         0.03708
W_C(DESCCNST)[T.CNST Shingle Wood]        -0.73205         0.27485        -2.66347         0.00773
W_C(DESCCNST)[T.CNST Siding]        -0.46059         0.24703        -1.86454         0.06225
W_C(DESCCNST)[T.CNST Stone]        -0.65592         0.32710        -2.00526         0.04494
W_C(DESCCNST)[T.CNST Stucco]        -0.51482         0.26004        -1.97977         0.04773
  W_np.log(SQFTSTRC)        -0.13399         0.04091        -3.27513         0.00106
         W_struc_age         0.00100         0.00122         0.82524         0.40923
        W_age_square        -0.00000         0.00001        -0.41601         0.67740
W_p_nonhisp_white_persons        -0.00334         0.00061        -5.49250         0.00000
W_p_edu_college_greater         0.00614         0.00091         6.78081         0.00000
           W_log_inc         0.09634         0.03393         2.83954         0.00452
W_p_housing_units_multiunit_structures         0.00120         0.00066         1.82016         0.06873
W_p_owner_occupied_units         0.00823         0.00152         5.42567         0.00000
     W_log_jobaccess        -0.02227         0.02101        -1.05973         0.28927
              lambda         0.50853         0.02225        22.85715         0.00000
------------------------------------------------------------------------------------
Warning: Variable(s) ['Intercept'] removed for being constant.
================================ END OF REPORT =====================================

With this result, we have both a lambda coefficient (which remains significant and large, though a bit smaller) and a series of W_ coefficients as with the SLX specification.

30.3.6 Spatial Durbin (SDM)

Code
model_durbin = GM_Lag(
    y=y,
    x=X,
    w=w_net,
    slx_lags=1,
)
model_durbin.output
var_names coefficients std_err zt_stat prob
0 CONSTANT 4.401911 0.265898 16.554891 0.0
1 STRUGRAD 0.671598 0.024828 27.04974 0.0
2 C(RESITYP)[T.TH] -0.182995 0.016364 -11.183079 0.0
3 C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.006487 0.072993 -0.088874 0.929182
4 C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.473984 0.108444 -4.370786 0.000012
5 C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.277947 0.223047 1.246136 0.212715
6 C(DESCCNST)[T.CNST Block] -0.018457 0.066399 -0.277969 0.781036
7 C(DESCCNST)[T.CNST Brick] 0.007139 0.04737 0.150715 0.8802
8 C(DESCCNST)[T.CNST Frame] 0.007718 0.04802 0.160732 0.872305
9 C(DESCCNST)[T.CNST Shingle Asbestos] 0.057899 0.053029 1.091849 0.274899
10 C(DESCCNST)[T.CNST Shingle Wood] 0.014076 0.056884 0.247455 0.804556
11 C(DESCCNST)[T.CNST Siding] 0.042812 0.047774 0.896124 0.370187
12 C(DESCCNST)[T.CNST Stone] 0.186314 0.070299 2.650319 0.008042
13 C(DESCCNST)[T.CNST Stucco] -0.016265 0.052674 -0.308793 0.757479
14 np.log(SQFTSTRC) 0.551217 0.015321 35.978524 0.0
15 struc_age -0.004194 0.000519 -8.089123 0.0
16 age_square 0.000015 0.000003 4.812297 0.000001
17 p_nonhisp_white_persons 0.001845 0.000377 4.897564 0.000001
18 p_edu_college_greater 0.002214 0.000456 4.858435 0.000001
19 log_inc 0.142399 0.017327 8.218082 0.0
20 p_housing_units_multiunit_structures 0.001531 0.000279 5.483841 0.0
21 p_owner_occupied_units 0.003139 0.000554 5.669901 0.0
22 log_jobaccess 0.076651 0.016692 4.592009 0.000004
23 W_STRUGRAD -0.420216 0.069763 -6.023456 0.0
24 W_C(RESITYP)[T.TH] 0.113656 0.028227 4.026475 0.000057
25 W_C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.859755 0.22416 -3.835448 0.000125
26 W_C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.545583 0.295537 -1.846073 0.064882
27 W_C(DESCCNST)[T.CNST 1/2 Stone Siding] -1.152136 1.024271 -1.124834 0.260659
28 W_C(DESCCNST)[T.CNST Block] -0.803252 0.199806 -4.020164 0.000058
29 W_C(DESCCNST)[T.CNST Brick] -0.803217 0.15031 -5.343742 0.0
30 W_C(DESCCNST)[T.CNST Frame] -0.872277 0.151961 -5.740152 0.0
31 W_C(DESCCNST)[T.CNST Shingle Asbestos] -0.87157 0.165041 -5.280929 0.0
32 W_C(DESCCNST)[T.CNST Shingle Wood] -0.882868 0.171222 -5.15626 0.0
33 W_C(DESCCNST)[T.CNST Siding] -0.76622 0.14714 -5.207417 0.0
34 W_C(DESCCNST)[T.CNST Stone] -0.878239 0.227893 -3.853728 0.000116
35 W_C(DESCCNST)[T.CNST Stucco] -0.701842 0.166616 -4.212335 0.000025
36 W_np.log(SQFTSTRC) -0.547542 0.053903 -10.157867 0.0
37 W_struc_age 0.00317 0.001059 2.992901 0.002763
38 W_age_square -0.00001 0.000006 -1.59236 0.111304
39 W_p_nonhisp_white_persons -0.002865 0.00049 -5.840696 0.0
40 W_p_edu_college_greater 0.003508 0.000728 4.820224 0.000001
41 W_log_inc -0.207512 0.044475 -4.665836 0.000003
42 W_p_housing_units_multiunit_structures -0.00097 0.000588 -1.649983 0.098946
43 W_p_owner_occupied_units 0.000011 0.001375 0.008305 0.993373
44 W_log_jobaccess -0.086823 0.019535 -4.444552 0.000009
45 W_log_price 0.707867 0.077415 9.143786 0.0
Code
print(model_durbin.summary)
REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES WITH SLX (SPATIAL DURBIN MODEL)
----------------------------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :   log_price                Number of Observations:        5543
Mean dependent var  :     12.1398                Number of Variables   :          46
S.D. dependent var  :      0.6173                Degrees of Freedom    :        5497
Pseudo R-squared    :      0.7540
Spatial Pseudo R-squared:  0.7219

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT         4.40191         0.26590        16.55489         0.00000
            STRUGRAD         0.67160         0.02483        27.04974         0.00000
    C(RESITYP)[T.TH]        -0.18300         0.01636       -11.18308         0.00000
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.00649         0.07299        -0.08887         0.92918
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.47398         0.10844        -4.37079         0.00001
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.27795         0.22305         1.24614         0.21271
C(DESCCNST)[T.CNST Block]        -0.01846         0.06640        -0.27797         0.78104
C(DESCCNST)[T.CNST Brick]         0.00714         0.04737         0.15072         0.88020
C(DESCCNST)[T.CNST Frame]         0.00772         0.04802         0.16073         0.87230
C(DESCCNST)[T.CNST Shingle Asbestos]         0.05790         0.05303         1.09185         0.27490
C(DESCCNST)[T.CNST Shingle Wood]         0.01408         0.05688         0.24746         0.80456
C(DESCCNST)[T.CNST Siding]         0.04281         0.04777         0.89612         0.37019
C(DESCCNST)[T.CNST Stone]         0.18631         0.07030         2.65032         0.00804
C(DESCCNST)[T.CNST Stucco]        -0.01627         0.05267        -0.30879         0.75748
    np.log(SQFTSTRC)         0.55122         0.01532        35.97852         0.00000
           struc_age        -0.00419         0.00052        -8.08912         0.00000
          age_square         0.00001         0.00000         4.81230         0.00000
p_nonhisp_white_persons         0.00184         0.00038         4.89756         0.00000
p_edu_college_greater         0.00221         0.00046         4.85844         0.00000
             log_inc         0.14240         0.01733         8.21808         0.00000
p_housing_units_multiunit_structures         0.00153         0.00028         5.48384         0.00000
p_owner_occupied_units         0.00314         0.00055         5.66990         0.00000
       log_jobaccess         0.07665         0.01669         4.59201         0.00000
          W_STRUGRAD        -0.42022         0.06976        -6.02346         0.00000
  W_C(RESITYP)[T.TH]         0.11366         0.02823         4.02648         0.00006
W_C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.85975         0.22416        -3.83545         0.00013
W_C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.54558         0.29554        -1.84607         0.06488
W_C(DESCCNST)[T.CNST 1/2 Stone Siding]        -1.15214         1.02427        -1.12483         0.26066
W_C(DESCCNST)[T.CNST Block]        -0.80325         0.19981        -4.02016         0.00006
W_C(DESCCNST)[T.CNST Brick]        -0.80322         0.15031        -5.34374         0.00000
W_C(DESCCNST)[T.CNST Frame]        -0.87228         0.15196        -5.74015         0.00000
W_C(DESCCNST)[T.CNST Shingle Asbestos]        -0.87157         0.16504        -5.28093         0.00000
W_C(DESCCNST)[T.CNST Shingle Wood]        -0.88287         0.17122        -5.15626         0.00000
W_C(DESCCNST)[T.CNST Siding]        -0.76622         0.14714        -5.20742         0.00000
W_C(DESCCNST)[T.CNST Stone]        -0.87824         0.22789        -3.85373         0.00012
W_C(DESCCNST)[T.CNST Stucco]        -0.70184         0.16662        -4.21234         0.00003
  W_np.log(SQFTSTRC)        -0.54754         0.05390       -10.15787         0.00000
         W_struc_age         0.00317         0.00106         2.99290         0.00276
        W_age_square        -0.00001         0.00001        -1.59236         0.11130
W_p_nonhisp_white_persons        -0.00286         0.00049        -5.84070         0.00000
W_p_edu_college_greater         0.00351         0.00073         4.82022         0.00000
           W_log_inc        -0.20751         0.04447        -4.66584         0.00000
W_p_housing_units_multiunit_structures        -0.00097         0.00059        -1.64998         0.09895
W_p_owner_occupied_units         0.00001         0.00138         0.00831         0.99337
     W_log_jobaccess        -0.08682         0.01953        -4.44455         0.00001
         W_log_price         0.70787         0.07742         9.14379         0.00000
------------------------------------------------------------------------------------
Instrumented: W_log_price
Instruments: W2_C(DESCCNST)[T.CNST 1/2 Brick Siding], W2_C(DESCCNST)[T.CNST
             1/2 Stone Frame], W2_C(DESCCNST)[T.CNST 1/2 Stone Siding],
             W2_C(DESCCNST)[T.CNST Block], W2_C(DESCCNST)[T.CNST Brick],
             W2_C(DESCCNST)[T.CNST Frame], W2_C(DESCCNST)[T.CNST Shingle
             Asbestos], W2_C(DESCCNST)[T.CNST Shingle Wood],
             W2_C(DESCCNST)[T.CNST Siding], W2_C(DESCCNST)[T.CNST Stone],
             W2_C(DESCCNST)[T.CNST Stucco], W2_C(RESITYP)[T.TH],
             W2_STRUGRAD, W2_age_square, W2_log_inc, W2_log_jobaccess,
             W2_np.log(SQFTSTRC), W2_p_edu_college_greater,
             W2_p_housing_units_multiunit_structures,
             W2_p_nonhisp_white_persons, W2_p_owner_occupied_units,
             W2_struc_age
Warning: Variable(s) ['Intercept'] removed for being constant.

DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST                              DF         VALUE           PROB
Anselin-Kelejian Test             1         25.061           0.0000
Common Factor Hypothesis Test    22        326.670           0.0000

SPATIAL DURBIN MODEL IMPACTS
Impacts computed using the 'simple' method.
            Variable         Direct        Indirect          Total
            STRUGRAD         0.6716          0.1889          0.8605
    C(RESITYP)[T.TH]        -0.1830         -0.0544         -0.2374
C(DESCCNST)[T.CNST 1/2 Brick Siding]        -0.0065         -2.9587         -2.9652
C(DESCCNST)[T.CNST 1/2 Stone Frame]        -0.4740         -3.0161         -3.4901
C(DESCCNST)[T.CNST 1/2 Stone Siding]         0.2779         -3.2704         -2.9924
C(DESCCNST)[T.CNST Block]        -0.0185         -2.7943         -2.8128
C(DESCCNST)[T.CNST Brick]         0.0071         -2.7322         -2.7251
C(DESCCNST)[T.CNST Frame]         0.0077         -2.9672         -2.9595
C(DESCCNST)[T.CNST Shingle Asbestos]         0.0579         -2.8432         -2.7853
C(DESCCNST)[T.CNST Shingle Wood]         0.0141         -2.9880         -2.9740
C(DESCCNST)[T.CNST Siding]         0.0428         -2.5191         -2.4763
C(DESCCNST)[T.CNST Stone]         0.1863         -2.5548         -2.3685
C(DESCCNST)[T.CNST Stucco]        -0.0163         -2.4419         -2.4582
    np.log(SQFTSTRC)         0.5512         -0.5386          0.0126
           struc_age        -0.0042          0.0007         -0.0035
          age_square         0.0000          0.0000          0.0000
p_nonhisp_white_persons         0.0018         -0.0053         -0.0035
p_edu_college_greater         0.0022          0.0174          0.0196
             log_inc         0.1424         -0.3653         -0.2229
p_housing_units_multiunit_structures         0.0015          0.0004          0.0019
p_owner_occupied_units         0.0031          0.0076          0.0108
       log_jobaccess         0.0767         -0.1115         -0.0348
================================ END OF REPORT =====================================

Notice in the report for the Durbin model the instrumented variable (W_dep_var) is estimated using instruments prefixed with W2_ indicating second-order spatial lags of the explanatory variables. That is, the GM_Lag class noticed we included slx_lags=1 and automatically incremented the order of \(WX\) to use as instruments. Obviously, you can modify this behavior or pass your own instruments if desired.

30.3.7 Comparison

Sometimes it can be useful to compare the estimated coefficients from each model side-by-side, but until now we’ve had them each in separate summaries. For simplicity, we can concatenate the output tables from each of the models, and we can make sure everything stays aligned by setting the variable as an index.

Code
pd.concat(
    [
        model_durbin.output.set_index("var_names")["coefficients"].rename("durbin"),
        model_slxe.output.set_index("var_names")["coefficients"].rename("slxe"),
        model_sar.output.set_index("var_names")["coefficients"].rename("lag"),
        model_sem.output.set_index("var_names")["coefficients"].rename("sem"),
        model_slx.output.set_index("var_names")["coefficients"].rename("slx"),
        model_ols_spreg.output.set_index("var_names")["coefficients"].rename("ols"),
    ],
    axis=1,
)
durbin slxe lag sem slx ols
var_names
CONSTANT 4.401911 5.237416 3.212091 5.184821 4.642039 4.041403
STRUGRAD 0.671598 0.660464 0.663002 0.690482 0.659429 0.692455
C(RESITYP)[T.TH] -0.182995 -0.184457 -0.145262 -0.177419 -0.172735 -0.150846
C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.006487 -0.036857 -0.034747 -0.002248 -0.038146 -0.046374
C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.473984 -0.481501 -0.482342 -0.431828 -0.480313 -0.484899
C(DESCCNST)[T.CNST 1/2 Stone Siding] 0.277947 0.320318 0.439415 0.365504 0.344024 0.474311
C(DESCCNST)[T.CNST Block] -0.018457 -0.040008 -0.003193 0.015147 -0.054803 -0.002453
C(DESCCNST)[T.CNST Brick] 0.007139 -0.017152 -0.004066 0.026149 -0.025992 -0.011138
C(DESCCNST)[T.CNST Frame] 0.007718 -0.021026 -0.006989 0.029853 -0.030933 -0.014627
C(DESCCNST)[T.CNST Shingle Asbestos] 0.057899 0.049323 0.107672 0.095888 0.044798 0.109158
C(DESCCNST)[T.CNST Shingle Wood] 0.014076 -0.007045 0.007915 0.057046 -0.024704 0.014471
C(DESCCNST)[T.CNST Siding] 0.042812 0.029617 0.063223 0.067587 0.020576 0.064087
C(DESCCNST)[T.CNST Stone] 0.186314 0.162651 0.154078 0.213639 0.156798 0.164883
C(DESCCNST)[T.CNST Stucco] -0.016265 -0.029299 -0.009916 0.013774 -0.041288 -0.00955
np.log(SQFTSTRC) 0.551217 0.530204 0.509048 0.533307 0.542232 0.524706
struc_age -0.004194 -0.004403 -0.004213 -0.004462 -0.004198 -0.004141
age_square 0.000015 0.000016 0.000016 0.000017 0.000015 0.000016
p_nonhisp_white_persons 0.001845 0.001249 -0.000182 0.001332 0.001516 -0.000276
p_edu_college_greater 0.002214 0.002927 0.004653 0.004356 0.002195 0.005471
log_inc 0.142399 0.122706 0.201296 0.13974 0.14734 0.227823
p_housing_units_multiunit_structures 0.001531 0.001479 0.001887 0.001637 0.001573 0.002249
p_owner_occupied_units 0.003139 0.003561 0.006503 0.004151 0.003835 0.007012
log_jobaccess 0.076651 0.045178 0.031762 0.039208 0.056401 0.036806
W_STRUGRAD -0.420216 0.021399 NaN NaN 0.026409 NaN
W_C(RESITYP)[T.TH] 0.113656 0.037688 NaN NaN 0.075844 NaN
W_C(DESCCNST)[T.CNST 1/2 Brick Siding] -0.859755 -0.696259 NaN NaN -1.057799 NaN
W_C(DESCCNST)[T.CNST 1/2 Stone Frame] -0.545583 -0.756676 NaN NaN -1.453311 NaN
W_C(DESCCNST)[T.CNST 1/2 Stone Siding] -1.152136 0.542445 NaN NaN 1.295219 NaN
W_C(DESCCNST)[T.CNST Block] -0.803252 -0.562274 NaN NaN -0.975344 NaN
W_C(DESCCNST)[T.CNST Brick] -0.803217 -0.574081 NaN NaN -0.972882 NaN
W_C(DESCCNST)[T.CNST Frame] -0.872277 -0.656507 NaN NaN -1.078228 NaN
W_C(DESCCNST)[T.CNST Shingle Asbestos] -0.87157 -0.545031 NaN NaN -0.768816 NaN
W_C(DESCCNST)[T.CNST Shingle Wood] -0.882868 -0.732053 NaN NaN -1.173693 NaN
W_C(DESCCNST)[T.CNST Siding] -0.76622 -0.460589 NaN NaN -0.775578 NaN
W_C(DESCCNST)[T.CNST Stone] -0.878239 -0.655921 NaN NaN -1.066295 NaN
W_C(DESCCNST)[T.CNST Stucco] -0.701842 -0.514824 NaN NaN -0.900155 NaN
W_np.log(SQFTSTRC) -0.547542 -0.133987 NaN NaN -0.137501 NaN
W_struc_age 0.00317 0.001005 NaN NaN 0.003047 NaN
W_age_square -0.00001 -0.000003 NaN NaN -0.000013 NaN
W_p_nonhisp_white_persons -0.002865 -0.00334 NaN NaN -0.004045 NaN
W_p_edu_college_greater 0.003508 0.006143 NaN NaN 0.006105 NaN
W_log_inc -0.207512 0.09634 NaN NaN 0.131123 NaN
W_p_housing_units_multiunit_structures -0.00097 0.001204 NaN NaN 0.00181 NaN
W_p_owner_occupied_units 0.000011 0.008228 NaN NaN 0.008986 NaN
W_log_jobaccess -0.086823 -0.022268 NaN NaN -0.025126 NaN
W_log_price 0.707867 NaN 0.115276 NaN NaN NaN
lambda NaN 0.508535 NaN 0.671989 NaN NaN

These models have different sets of terms (e.g. only SEM variants have lambda, only SLX variants have W_ variables, and only Lag variants have W_dep_var), so models missing the necessary coefficients will have NAs. Also remember that we’re joining together only the output from the lag models, not the impacts, so we are ignoring the indirect effects from these models.

Comparing the estimated coefficients shows that median income was doing a lot of heavy lifting in the OLS model. That is, when we ignore spatial effects, the median income variable absorbs a lot of the spatial variance, inflating its coefficient and biasing our interpretation of how much residents value higher-income neighbors. In the spatial models, the estimated coefficient for income is less than half as large. That’s basically true of all the other neighborhood characteristics as well.

Remember that our data are measured at two different spatial levels. Our housing sales are measured at the unit level, but the ‘neighborhood’ variables are measured at the census blockgroup level. Because we have several sales nested inside each census unit, we are building spatial autocorrelation into the model by design. That is, because we don’t know the income, education, race, etc of the home’s previous owner, the “neighborhood” variables coming from the Census are defined in geographic zones which are essentially \(WX\) instead of just \(X\), where \(W\) is defined by discrete membership inside a census unit.

30.5 Extensions

The spreg package also includes a wide variety of additional models we do not explore here, including spatial Seemingly Unrelated Regression (SUR), spatial Panel models (Elhorst, 2003, 2014), and diagnostics for Probit and non-linear models.

30.5.1 Regimes

In all of the examples above, we assume the relationship between \(X\) and \(y\) is constant over space. In some applications, however, there may be reason to believe that coefficients vary over the study region, i.e. there are different “treatment regimes” (Imai & van Dyk, 2004). Rather than estimate a separate regression for every data point a-la GWR, one way to handle spatially-varying relationships in a spatial econometric framework is through the use of spatial regimes which separate the study region into discrete portions (defined by a different \(W\) matrix), then fit separate models for each portion (Durlauf & Johnson, 1995). This is a bit like the geographic version of piecewise regression, where geographic zones are used to define the breakpoints rather than ranges of the outcome variable. Each of the models above optionally takes a regime argument that defines the grouping for each observation. Alternatively, spreg also offers a function for determining optimal regimes endogenously by combining SKATER constrained clustering with spatial regression, following Anselin & Amaral (2023).

30.5.2 Quasi Experiments and Causal Designs

A prominent critique of applied spatial econometric modeling is given by Gibbons & Overman (2012), who argue that the suite of models above

does not provide a toolbox that gives satisfactory solutions to these problems for applied researchers interested in causality and the economic processes at work. It is in this sense that we call into question the approach of the burgeoning body of applied economic research that proceeds with mechanical application of spatial econometric techniques to causal questions of “spillovers” and other forms of spatial interaction, or that estimates spatial lag models as a quick fix for endogeneity issues, or that blindly applies spatial econometric models by default without serious consideration of whether this is necessary given the research question in hand.

To overcome these issues, they argue that applied work should adopt what they term ‘experimentalist’ design4, where some form of exogenous variation is induced to provide insight into a causal mechanism at play. Spatial econometric models that incorporate quasi experimental designs are an active area of research, with early pioneering work by Dubé et al. (2014), Delgado & Florax (2015), Bardaka et al. (2018), and Diao et al. (2017), who blend different spatial specification above with a difference-in-differences framework. While this work represents an important step forward from traditional designs that ignore spatial dependence altogether, the complexity involved with DID designs that include spatial terms is still under-appreciated, and many important issues in urban policy analysis do not have a satisfactory specification using such an approach (Knaap, 2024).

Alternatively, structural models5 should proceed by considering the SLX model as the point of depatrture, or use strong theory to guide the choice of a particular specification as described above (Halleck Vega & Elhorst, 2015). Although there remains debate in urban economics, I believe there is a good justification for assuming a potential autoregressive process in the specific context of hedonic housing price models.


  1. Despite the preferences of LeSage & Pace (2009), Anselin et al. (2024) use the reduced form of the unrestricted SDM to show that “there is no solid theoretical basis for preferring one specification over the other, only an empirical one.”↩︎

  2. Using different arguments in the spreg classes allows one to fit all of the models described in previous sections, and the Wilkinson formula need not be changed for these specifications. It is also possible to fit increasingly complex models, e.g. where different explanatory variables are defined by different spatial scopes (\(W_1X_1, W_2 X_2\), etc.) (Galster, 2019), but these kinds of specifications need to be created manually e.g. by creating \(WX\) variables on the dataframe using the Graph.lag(x) convention. In this case, a user would want to specify these terms directly in Wilkinson formulas, and avoid using the slx_vars argument in spreg which applies the same Graph specification to each \(X\) variable.↩︎

  3. For a deep dive on computing specification tests, see Anselin et al. (1996) and the spreg documentation at https://pysal.org/spreg/notebooks/9_specification_tests.html↩︎

  4. Today this is more commonly called the “potential outcomes” framework (Rubin, 2005).↩︎

  5. Today I think econometricians prefer to call this “‘design-based’ inference” rather than “structural estimation” (Abadie et al., 2020; Borusyak et al., 2024; Manski & McFadden, 1981; Witte et al., 1979).↩︎