22  Ecometrics and CFA

Raudenbush and Sampson (1999) outline a theory of “ecometrics” designed to capture the social-ecological context as a way to help measure the latent construct of collective efficacy. A key to the original methodology is that it relies on actual observations of social interaction as measured by an intentionally designed survey. These data are combined with others gathered via systematic social observation.

This idea has been expanded to include new forms of data such as Google Street View imagery and volunteered geographic information (VGI), for example in the Boston work of O’Brien et al. (2015). We usually don’t have systematic social observation (SSO) data, but we do have plenty of other data, such as 311 reports, Google Street View, and satellite imagery, and we may be able to substitute these for SSO if we believe they accurately capture the social process under investigation (i.e., if we believe that some social process like “disorder” is the underlying driver of the observed data).

The key distinction between ecometrics and earlier methods like factor ecology is the reliance on a formal theoretical model underneath. We’re not allowing the data to speak for themselves; instead we are specifying a set of theoretical social processes which are unobservable directly, but might be inferred to exist if we treat them as latent variables. We then fit a model and test whether these latent constructs appear as specified.
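To make the latent-variable idea concrete, the sketch below writes out the covariance structure that a confirmatory model implies, \(\Sigma = \Lambda \Phi \Lambda^\top + \Theta\), for a small two-factor example. The indicator names, factor names, and all numeric values are hypothetical and chosen only for illustration; they do not come from any dataset used here.

```python
# Hypothetical two-factor measurement model (all names and numbers are made up)
indicators = ["graffiti", "litter", "vacancy", "transit_access", "job_access"]
factors = ["disorder", "geographic"]

# Loading matrix Lambda: rows are indicators, columns are factors.
# A zero encodes the theoretical claim that the indicator does NOT
# reflect that latent construct.
loadings = [
    [0.8, 0.0],
    [0.7, 0.0],
    [0.6, 0.0],
    [0.0, 0.9],
    [0.0, 0.5],
]

# Phi: latent variances fixed to 1, with a correlation of 0.3 between factors
factor_cov = [[1.0, 0.3], [0.3, 1.0]]

# Theta (diagonal): unique variances chosen so every indicator has unit
# variance; valid here because each indicator loads on a single factor
unique_var = [1.0 - sum(l * l for l in row) for row in loadings]


def implied_covariance(loadings, factor_cov, unique_var):
    """Return Sigma = Lambda Phi Lambda^T + Theta as a nested list."""
    n, m = len(loadings), len(factor_cov)
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sigma[i][j] = sum(
                loadings[i][a] * factor_cov[a][b] * loadings[j][b]
                for a in range(m)
                for b in range(m)
            )
        sigma[i][i] += unique_var[i]
    return sigma


sigma = implied_covariance(loadings, factor_cov, unique_var)
for name, row in zip(indicators, sigma):
    print(f"{name:>15}", [round(v, 3) for v in row])
```

Two indicators of the same factor covary by the product of their loadings (0.8 × 0.7 for graffiti and litter), while indicators of different factors covary only through the factor correlation (0.8 × 0.3 × 0.9 for graffiti and transit access). Fitting a confirmatory model amounts to asking whether a pattern like this reproduces the covariances actually observed in the data.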

Using the ecometric framework, we can try to capture the geography of opportunity following the theory outlined by Galster (2008), who argues that individual-level outcomes are a function of individual characteristics as well as spatial characteristics (at multiple scales).

\[O_{it} = \alpha + \beta[P_{it}] + \gamma[P_i] + \phi[UP_{it}] + \delta[UP_i] + \theta[N_{jt}] + \mu[M_{kt}] + \epsilon\]

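where \(O_{it}\) is the outcome of interest for individual \(i\) at time \(t\); \([P_{it}]\) and \([P_i]\) are observed personal characteristics that do and do not vary over time, respectively; \([UP_{it}]\) and \([UP_i]\) are the corresponding unobserved personal characteristics; \([N_{jt}]\) are characteristics of the neighborhood \(j\) in which the individual resides at time \(t\); \([M_{kt}]\) are characteristics of the metropolitan area \(k\) in which the individual resides at time \(t\); and \(\epsilon\) is a random error term.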

To capture the geography of opportunity, then, our focus is on the \(M\) and \(N\) components of the equation, and according to Galster (2013), the key vectors of these terms are composed of four categories: social-interactive, environmental, geographic, and institutional. These categories are well supported by the empirical literature, and are theoretically grounded in causal processes that generate socioeconomic outcomes.
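As a small worked illustration of how the terms in Galster's equation add up, the sketch below evaluates the observable part of the right-hand side for a single individual. Every coefficient and characteristic value is hypothetical, the neighborhood vector is assumed to hold one summary measure per category (social-interactive, environmental, geographic, institutional), and the unobserved terms and the error are set to zero because, by definition, we cannot plug values in for them.

```python
def linear_term(coefs, values):
    """Inner product of a coefficient vector and a vector of characteristics."""
    if len(coefs) != len(values):
        raise ValueError("coefficient and characteristic vectors differ in length")
    return sum(c * v for c, v in zip(coefs, values))


# Hypothetical values for individual i in neighborhood j and metro k at time t
alpha = 1.0
terms = {
    # name: (coefficients, characteristics)
    "P_it": ([0.5], [2.0]),  # time-varying personal characteristic
    "P_i": ([0.2], [1.0]),   # time-invariant personal characteristic
    # one entry per category: social-interactive, environmental,
    # geographic, institutional (an assumption made for this sketch)
    "N_jt": ([0.3, -0.1, 0.2, 0.4], [1.0, 2.0, 0.5, 1.5]),
    "M_kt": ([0.1], [3.0]),  # metropolitan-level characteristic
}

contributions = {name: linear_term(c, x) for name, (c, x) in terms.items()}

# The "geography of opportunity" is the part contributed by N and M
contextual = contributions["N_jt"] + contributions["M_kt"]
expected_outcome = alpha + sum(contributions.values())

print(f"contextual component: {contextual:.2f}")
print(f"expected outcome:     {expected_outcome:.2f}")
```

With these made-up numbers the neighborhood and metropolitan terms contribute 0.8 + 0.3 = 1.1 of an expected outcome of 3.3. The measurement problem addressed in this chapter is how to construct the entries of \(N\) and \(M\) in the first place.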

Following Knaap (2017), we can combine Galster’s theoretical framework with the ecometric technique to (a) test whether the proposed structure holds, and (b) develop composite indices that represent each of the dimensions.
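The second goal, building a composite index for a dimension, can be sketched in a simplified form: standardize each indicator and combine them with weights proportional to their loadings. This is only a rough stand-in for the factor scores a fitted confirmatory model would produce, and the indicator names, values, and weights below are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to mean 0 and (population) sd 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_index(indicators, weights):
    """Weighted average of standardized indicators, one value per neighborhood.

    indicators: dict mapping indicator name -> list of neighborhood values
    weights:    dict mapping indicator name -> signed weight (e.g. a loading)
    """
    z = {name: zscores(vals) for name, vals in indicators.items()}
    total = sum(abs(w) for w in weights.values())
    n = len(next(iter(z.values())))
    return [sum(weights[k] * z[k][i] for k in z) / total for i in range(n)]


# Hypothetical "environmental" dimension for four neighborhoods
environmental = {
    "tree_canopy": [10.0, 20.0, 30.0, 40.0],  # higher is better
    "air_toxics": [4.0, 3.0, 2.0, 1.0],       # higher is worse
}
# Hypothetical loadings; the negative sign flips the harmful indicator
weights = {"tree_canopy": 0.8, "air_toxics": -0.6}

index = composite_index(environmental, weights)
print([round(v, 3) for v in index])
```

The signed weights matter: an indicator that signals less opportunity needs a negative weight so that higher index values consistently mean more of the underlying dimension.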

One way to address this problem is to treat the quantification of opportunity as a measurement error problem. Through a liberal interpretation, this may be viewed as an extension of ecometrics, a methodology concerned with developing measures of neighborhood social ecology (Mujahid et al., 2007; O’Brien et al., 2015; Raudenbush & Sampson, 1999). In this framework, opportunity and its subdimensions are viewed as latent variables that cannot be measured directly, but can be estimated by modeling the covariation among the indicators through which they manifest… Specifically, I propose that neighborhood indicators should be categorized according to the four mechanisms of neighborhood effects outlined by Galster (2013): social-interactive, environmental, geographic, and institutional. These categories are well supported by the empirical literature, and are theoretically grounded in causal processes that generate socioeconomic outcomes.

Code
from factor_analyzer import (ConfirmatoryFactorAnalyzer, ModelSpecificationParser)
Code
# show the class docstring (IPython help syntax)
ConfirmatoryFactorAnalyzer?
Code
from geosnap import DataStore
from geosnap import analyze as gaz
from geosnap import visualize as gvz
from geosnap import io as gio
Code
datasets = DataStore()

# 2020 ACS block groups for the Baltimore MSA (CBSA code 12580)
balt = gio.get_acs(datasets, msa_fips='12580', years=[2020], level='bg')

# NCES school attendance boundaries, kept only where they intersect the study area
sabs = gio.get_nces(datasets)
sabs = sabs[sabs.intersects(balt.to_crs(sabs.crs).union_all())]

# SEDA school achievement data; this streams from the SEDA archive unless the
# data were stored locally beforehand with `geosnap.io.store_seda()`
seda = datasets.seda(accept_eula=True)
sabs = sabs.merge(seda, left_on='ncessch', right_on='sedasch')

sabs.columns
Index(['SrcName', 'ncessch', 'schnam', 'leaid', 'gslo', 'gshi', 'defacto',
       'stAbbrev', 'openEnroll', 'Shape_Leng', 'Shape_Area', 'level',
       'MultiBdy', 'geometry', 'year', 'sedasch', 'sedaschname', 'fips',
       'stateabb', 'subcat', 'subgroup', 'gradecenter', 'gap', 'tot_asmts',
       'cellcount', 'mn_asmts', 'last_bie', 'gcs_mn_avg_ol', 'gcs_mn_coh_ol',
       'gcs_mn_grd_ol', 'gcs_mn_mth_ol', 'gcs_mn_avg_ol_se',
       'gcs_mn_coh_ol_se', 'gcs_mn_grd_ol_se', 'gcs_mn_mth_ol_se',
       'gcs_mn_avg_eb', 'gcs_mn_coh_eb', 'gcs_mn_grd_eb', 'gcs_mn_mth_eb',
       'gcs_mn_avg_eb_se', 'gcs_mn_coh_eb_se', 'gcs_mn_grd_eb_se',
       'gcs_mn_mth_eb_se'],
      dtype='object')