Sampson and Raudenbush outline a theory of “ecometrics,” designed to capture the social-ecological context as a way to measure the latent construct of collective efficacy. A key feature of the original methodology is that it relies on actual observations of social interaction, as measured by an intentionally designed survey. These data are combined with others gathered via systematic social observation (SSO).

This idea has since been expanded to include new forms of data such as Google Street View imagery and volunteered geographic information (e.g., O’Brien’s work in Boston). We usually don’t have SSO data, but we do have many other sources, such as 311 reports, Google Street View, and satellite imagery, and we may be able to substitute these data for SSO, provided we believe they accurately capture the social process under investigation (i.e., if we believe that some social process like “disorder” is the underlying driver of the observed data).
The key distinction between ecometrics and earlier methods like factor ecology is the reliance on a formal theoretical model underneath. We’re not allowing the data to speak for themselves; instead we are specifying a set of theoretical social processes which are unobservable directly, but might be inferred to exist if we treat them as latent variables. We then fit a model and test whether these latent constructs appear as specified.
Using the ecometric framework, we can try to capture the geography of opportunity following the theory outlined by Galster, who argues that individual-level outcomes are a function of individual characteristics as well as spatial characteristics (at multiple scales).
\[O_{it} = \alpha + \beta[P_{it}] + \gamma[P_i] + \varphi[UP_{it}] + \delta[UP_i] + \theta[N_{jt}] + \mu[M_{kt}] + \epsilon\]
where \(O_{it}\) is the outcome for individual \(i\) at time \(t\); \(P_{it}\) and \(P_i\) are time-varying and time-invariant observed personal characteristics; \(UP_{it}\) and \(UP_i\) are time-varying and time-invariant unobserved personal characteristics; \(N_{jt}\) captures characteristics of neighborhood \(j\); and \(M_{kt}\) captures characteristics of metropolitan area \(k\).

To capture the geography of opportunity, then, our focus is on the \(M\) and \(N\) components of the equation.
According to Galster (2013), the key vectors of these terms are composed of four categories:

- social-interactive,
- environmental,
- geographic, and
- institutional.
These categories are well supported by the empirical literature, and are theoretically grounded in causal processes that generate socioeconomic outcomes.
Following Knaap:

> One way to address this problem is to treat the quantification of opportunity as a measurement error problem. Through a liberal interpretation, this may be viewed as an extension of ecometrics, a methodology concerned with developing measures of neighborhood social ecology (Raudenbush and Sampson 1999; Mujahid et al. 2007; O’Brien 2013). In this framework, opportunity and its subdimensions are viewed as latent variables that cannot be measured directly, but can be estimated by modeling the covariation among the indicators through which they manifest. As with any measurement model, however, opportunity metrics require a sound theoretical framework for organizing and specifying relationships among variables. As described above, a major weakness of opportunity analyses to date has been the lack of a sound framework for organizing indicators into categories of metrics. To address this issue, I argue that the literature on neighborhood effects offers a sound organizing framework for classifying subdimensions of opportunity. Specifically, I propose that neighborhood indicators should be categorized according to the four mechanisms of neighborhood effects outlined by Galster (2013): social-interactive, environmental, geographic, and institutional. These categories are well supported by the empirical literature, and are theoretically grounded in causal processes that generate socioeconomic outcomes.
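As a concrete sketch of this specification step, the four Galster categories can be written as a factor-to-indicator mapping in the same dictionary format that `ModelSpecificationParser.parse_model_specification_from_dict` consumes (shown below). The indicator names here are hypothetical placeholders, not actual column names from any dataset:

```python
# Hypothetical mapping of Galster's four mechanism categories (latent factors)
# to example indicators (observed variables). Names are purely illustrative.
model_dict = {
    "social_interactive": ["p_poverty_rate", "p_female_headed_families", "crime_rate"],
    "environmental": ["pct_tree_cover", "air_quality_index", "pct_vacant_parcels"],
    "geographic": ["job_accessibility", "transit_accessibility"],
    "institutional": ["school_quality", "per_pupil_spending"],
}

# In this simple specification each latent construct is measured by several
# indicators, and no indicator loads on more than one factor.
all_indicators = [v for vs in model_dict.values() for v in vs]
assert len(all_indicators) == len(set(all_indicators))
```

Passing a dictionary like this (with real column names) to the parser produces a `ModelSpecification` that fixes which loadings are free and which are constrained to zero, which is what makes the model confirmatory rather than exploratory.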
```python
from factor_analyzer import (ConfirmatoryFactorAnalyzer, ModelSpecificationParser)
from factor_analyzer import confirmatory_factor_analyzer as cfa
```

```python
cfa.ConfirmatoryFactorAnalyzer?
```

```
Init signature:
cfa.ConfirmatoryFactorAnalyzer(
    specification=None,
    n_obs=None,
    is_cov_matrix=False,
    bounds=None,
    max_iter=200,
    tol=None,
    impute='median',
    disp=True,
)
Docstring:
Fit a confirmatory factor analysis model using maximum likelihood.

Parameters
----------
specification : :class:`ModelSpecification` or None, optional
    A model specification. This must be a :class:`ModelSpecification` object
    or ``None``. If ``None``, a :class:`ModelSpecification` object will be
    generated assuming that ``n_factors`` == ``n_variables``, and that all
    variables load on all factors. Note that this could mean the factor model
    is not identified, and the optimization could fail.
    Defaults to `None`.
n_obs : int or None, optional
    The number of observations in the original data set.
    If this is not passed and ``is_cov_matrix`` is ``True``,
    then an error will be raised.
    Defaults to ``None``.
is_cov_matrix : bool, optional
    Whether the input ``X`` is a covariance matrix.
    If ``False``, assume it is the full data set.
    Defaults to ``False``.
bounds : list of tuples or None, optional
    A list of minimum and maximum boundaries for each element of the input
    array. This must equal ``x0``, which is the input array from your parsed
    and combined model specification. The length is:
    ((n_factors * n_variables) + n_variables + n_factors +
    (((n_factors * n_factors) - n_factors) // 2)
    If `None`, nothing will be bounded.
    Defaults to ``None``.
max_iter : int, optional
    The maximum number of iterations for the optimization routine.
    Defaults to 200.
tol : float or None, optional
    The tolerance for convergence.
    Defaults to ``None``.
disp : bool, optional
    Whether to print the scipy optimization ``fmin`` message to standard
    output.
    Defaults to ``True``.

Raises
------
ValueError
    If `is_cov_matrix` is `True`, and `n_obs` is not provided.

Attributes
----------
model : ModelSpecification
    The model specification object.
loadings_ : :obj:`numpy.ndarray`
    The factor loadings matrix.
    ``None``, if ``fit()`` has not been called.
error_vars_ : :obj:`numpy.ndarray`
    The error variance matrix
factor_varcovs_ : :obj:`numpy.ndarray`
    The factor covariance matrix.
log_likelihood_ : float
    The log likelihood from the optimization routine.
aic_ : float
    The Akaike information criterion.
bic_ : float
    The Bayesian information criterion.

Examples
--------
>>> import pandas as pd
>>> from factor_analyzer import (ConfirmatoryFactorAnalyzer,
...                              ModelSpecificationParser)
>>> X = pd.read_csv('tests/data/test11.csv')
>>> model_dict = {"F1": ["V1", "V2", "V3", "V4"],
...               "F2": ["V5", "V6", "V7", "V8"]}
>>> model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
>>> cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
>>> cfa.fit(X.values)
>>> cfa.loadings_
array([[0.99131285, 0.        ],
       [0.46074919, 0.        ],
       [0.3502267 , 0.        ],
       [0.58331488, 0.        ],
       [0.        , 0.98621042],
       [0.        , 0.73389239],
       [0.        , 0.37602988],
       [0.        , 0.50049507]])
>>> cfa.factor_varcovs_
array([[1.        , 0.17385704],
       [0.17385704, 1.        ]])
>>> cfa.get_standard_errors()
(array([[0.06779949, 0.        ],
        [0.04369956, 0.        ],
        [0.04153113, 0.        ],
        [0.04766645, 0.        ],
        [0.        , 0.06025341],
        [0.        , 0.04913149],
        [0.        , 0.0406604 ],
        [0.        , 0.04351208]]),
 array([0.11929873, 0.05043616, 0.04645803, 0.05803088, 0.10176889,
        0.06607524, 0.04742321, 0.05373646]))
>>> cfa.transform(X.values)
array([[-0.46852166, -1.08708035],
       [ 2.59025301,  1.20227783],
       [-0.47215977,  2.65697245],
       ...,
       [-1.5930886 , -0.91804114],
       [ 0.19430887,  0.88174818],
       [-0.27863554, -0.7695101 ]])
Init docstring: Initialize the analyzer object.
File:           ~/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/factor_analyzer/confirmatory_factor_analyzer.py
Type:           type
Subclasses:
```
```python
from geosnap import DataStore
from geosnap import analyze as gaz
from geosnap import visualize as gvz
from geosnap import io as gio

datasets = DataStore()
```
```
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/geosnap/_data.py:66: UserWarning: The geosnap data storage class is provided for convenience only. The geosnap developers make no promises regarding data quality, consistency, or availability, nor are they responsible for any use/misuse of the data. The end-user is responsible for any and all analyses or applications created with the package.
  warn(
```
```python
datasets.data_dir
```

```
'/Users/knaaptime/Library/Application Support/geosnap'
```
```python
balt = gio.get_acs(datasets, msa_fips='12580', years=[2020], level='bg')
```

```
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/geosnap/io/constructors.py:195: UserWarning: Currency columns unavailable at this resolution; not adjusting for inflation
  warn(
```
```python
balt.plot()
```
```python
balt.p_poverty_rate
```

```
0       2.074109
1      19.256178
2       5.142332
3       3.562818
4       3.203015
         ...
712    19.483568
713     9.641694
714    22.825186
715    53.123387
716    54.981914
Name: p_poverty_rate, Length: 717, dtype: float64
```
```python
balt.median_contract_rent
```

```
0          NaN
1       1125.0
2       1694.0
3       1584.0
4          NaN
         ...
1987     233.0
1988     235.0
1989     273.0
1990       NaN
1991     241.0
Name: median_contract_rent, Length: 1992, dtype: float64
```
```python
balt.p_female_headed_families
```

```
0        0.000000
1        0.000000
2        0.000000
3       15.584416
4        9.926471
          ...
1987     0.000000
1988     0.000000
1989    60.515021
1990    40.613027
1991     0.000000
Name: p_female_headed_families, Length: 1992, dtype: float64
```
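Before feeding indicators like these into a measurement model, it is common to standardize them so that loadings are comparable across variables on very different scales (percentages versus dollars of rent). A minimal sketch with fabricated data, assuming a DataFrame shaped like `balt` (the column names below match those above, but the values are synthetic, purely for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Fabricated stand-in for a handful of ACS indicators at the blockgroup level
df = pd.DataFrame({
    "p_poverty_rate": rng.uniform(0, 60, 100),
    "median_contract_rent": rng.uniform(200, 1800, 100),
    "p_female_headed_families": rng.uniform(0, 60, 100),
})
df.loc[::10, "median_contract_rent"] = np.nan  # mimic missing rent values

# z-score each indicator (pandas skips NaNs) so all variables share a scale
z = (df - df.mean()) / df.std()
```

After this step every column has mean zero and unit variance over its non-missing observations, so no single indicator dominates the covariance structure simply because of its units.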
```python
gio.get_nces?
```

```
Signature: gio.get_nces(datastore, years='1516', dataset='sabs')
Docstring:
Extract a subset of data from the National Center for Educational Statistics
as a long-form geodataframe.

Parameters
----------
datastore : geosnap.DataStore
    an instantiated DataStore object
years : str, optional
    set of academic years to return formatted as a 4-digit string
    representing the two years from a single period of the academic
    calendar. For example, the 2015-2016 academic year is represented as
    "1516". Defaults to "1516"
dataset : str, optional
    which NCES dataset to query. Options include `sabs`, `districts`,
    or `schools`. Defaults to 'sabs'

Returns
-------
geopandas.GeoDataFrame
    long-form geodataframe with 'year' column representing each time period

File:      ~/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/geosnap/io/constructors.py
Type:      function
```
```python
sabs = gio.get_nces(datasets)
sabs = sabs[sabs.intersects(balt.to_crs(sabs.crs).unary_union)]
sabs.plot()
```
```python
seda = gio.get_seda(accept_eula=True)
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[42], line 1
----> 1 seda = gio.get_seda(accept_eula=True)

AttributeError: module 'geosnap.io' has no attribute 'get_seda'
```
```python
seda = datasets.seda(accept_eula=True)
```

```
/Users/knaaptime/mambaforge/envs/urban_analysis/lib/python3.10/site-packages/geosnap/_data.py:204: UserWarning: Streaming data from SEDA archive at <https://exhibits.stanford.edu/data/catalog/db586ns4974>.
Use `geosnap.io.store_seda()` to store the data locally for better performance
  warn(msg)
```
```python
seda
```
| | sedasch | sedaschname | fips | stateabb | subcat | subgroup | gradecenter | gap | tot_asmts | cellcount | ... | gcs_mn_grd_ol_se | gcs_mn_mth_ol_se | gcs_mn_avg_eb | gcs_mn_coh_eb | gcs_mn_grd_eb | gcs_mn_mth_eb | gcs_mn_avg_eb_se | gcs_mn_coh_eb_se | gcs_mn_grd_eb_se | gcs_mn_mth_eb_se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 010000201667 | Camps | 1 | AL | all | all | 7.5 | 0 | 13 | 2 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 010000201670 | Det Ctr | 1 | AL | all | all | 7.5 | 0 | 2 | 1 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 010000201705 | Wallace Sch - Mt Meigs Campus | 1 | AL | all | all | 7.0 | 0 | 98 | 12 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 010000201706 | McNeel Sch - Vacca Campus | 1 | AL | all | all | 7.0 | 0 | 118 | 12 | ... | NaN | NaN | 2.617151 | NaN | NaN | NaN | 0.470695 | NaN | NaN | NaN |
4 | 010000500870 | Albertville Middle School | 1 | AL | all | all | 7.5 | 0 | 12520 | 39 | ... | NaN | 0.165719 | 6.363170 | -0.026773 | NaN | -0.257613 | 0.082774 | 0.027471 | NaN | 0.155704 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
82483 | 729999200586 | TOMAS ALBA EDISON | 72 | PR | all | all | 5.0 | 0 | 36 | 3 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
82484 | 729999200776 | CENTRO VOCACIONAL ESPECIAL | 72 | PR | all | all | 7.0 | 0 | 20 | 7 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
82485 | 729999201270 | TOMAS CARRION MADURO | 72 | PR | all | all | 5.5 | 0 | 140 | 27 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
82486 | 729999201511 | JOSE M. TORRES | 72 | PR | all | all | 5.5 | 0 | 1505 | 48 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
82487 | 729999202037 | VICTOR ROJAS 1 | 72 | PR | all | all | 7.0 | 0 | 79 | 11 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
82488 rows × 27 columns
```python
sabs
```
| | SrcName | ncessch | schnam | leaid | gslo | gshi | defacto | stAbbrev | openEnroll | Shape_Leng | Shape_Area | level | MultiBdy | geometry | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13648 | CENTRAL | 100019000049 | Central Middle School | 1000190 | 07 | 08 | 0 | DE | 0 | 175363.725385 | 5.007095e+08 | 2 | 0 | POLYGON ((-8407593.100 4757523.450, -8407558.1... | 1516 |
13649 | DOVER | 100019000050 | Dover High School | 1000190 | 09 | 12 | 0 | DE | 0 | 175363.725385 | 5.007095e+08 | 3 | 0 | POLYGON ((-8407593.100 4757523.450, -8407558.1... | 1516 |
13652 | HARTLY | 100019000053 | Hartly Elementary School | 1000190 | PK | 04 | 0 | DE | 0 | 89121.606361 | 1.590547e+08 | 1 | 0 | POLYGON ((-8417807.811 4750035.861, -8417830.7... | 1516 |
13656 | WILLIAM HENRY | 100019000058 | William Henry Middle School | 1000190 | 05 | 06 | 0 | DE | 0 | 175363.725385 | 5.007095e+08 | 2 | 0 | POLYGON ((-8407593.100 4757523.450, -8407558.1... | 1516 |
13783 | SMYRNA | 100162000141 | Smyrna Middle School | 1001620 | 06 | 08 | 0 | DE | 0 | 181685.335949 | 7.544446e+08 | 2 | 0 | POLYGON ((-8413012.559 4784548.843, -8413007.5... | 1516 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
30984 | None | 240063001242 | Easton High | 2400630 | 09 | 12 | 0 | MD | 0 | 184440.210578 | 8.073253e+08 | 3 | 0 | POLYGON ((-8467148.930 4703358.544, -8467132.3... | 1516 |
30985 | None | 240063001243 | Easton Middle | 2400630 | 06 | 08 | 0 | MD | 0 | 184440.210578 | 8.073253e+08 | 2 | 0 | POLYGON ((-8467148.930 4703358.544, -8467132.3... | 1516 |
30991 | Not provided | 240063099991 | Unassigned | 2400630 | KG | 05 | 0 | MD | 0 | 400134.693015 | 6.383886e+08 | 1 | 0 | POLYGON ((-8470595.787 4705531.321, -8470628.8... | 1516 |
30992 | Not provided | 240063099992 | Unassigned | 2400630 | 06 | 08 | 0 | MD | 0 | 400134.693015 | 6.383886e+08 | 2 | 0 | POLYGON ((-8470595.787 4705531.321, -8470628.8... | 1516 |
30993 | Not provided | 240063099993 | Unassigned | 2400630 | 09 | 12 | 0 | MD | 0 | 400134.693015 | 6.383886e+08 | 3 | 0 | POLYGON ((-8470595.787 4705531.321, -8470628.8... | 1516 |
682 rows × 15 columns
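Joins like the one below (attaching SEDA achievement scores to SABS attendance boundaries) are easy to get silently wrong when the keys do not match one-to-one. A small sketch of how the match can be checked first with pandas, using toy frames that share the same key columns (`ncessch`/`sedasch`) as the real data:

```python
import pandas as pd

# Toy stand-ins for the school-boundary (sabs) and test-score (seda) tables
sabs_toy = pd.DataFrame(
    {"ncessch": ["A1", "B2", "C3"], "schnam": ["Alpha", "Beta", "Gamma"]}
)
seda_toy = pd.DataFrame({"sedasch": ["A1", "B2"], "gcs_mn_avg_eb": [5.2, 6.1]})

# indicator=True adds a _merge column flagging which side each row came from
merged = sabs_toy.merge(
    seda_toy, left_on="ncessch", right_on="sedasch", how="left", indicator=True
)
match_rate = (merged["_merge"] == "both").mean()  # share of boundaries with a score
```

Running the check with `how="left"` keeps unmatched boundaries visible; the default inner join used below simply drops them, so a low match rate would otherwise go unnoticed.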
```python
sabs = sabs.merge(seda, left_on='ncessch', right_on='sedasch')
sabs.columns
```

```
Index(['SrcName', 'ncessch', 'schnam', 'leaid', 'gslo', 'gshi', 'defacto',
       'stAbbrev', 'openEnroll', 'Shape_Leng', 'Shape_Area', 'level',
       'MultiBdy', 'geometry', 'year', 'sedasch', 'sedaschname', 'fips',
       'stateabb', 'subcat', 'subgroup', 'gradecenter', 'gap', 'tot_asmts',
       'cellcount', 'mn_asmts', 'gcs_mn_avg_ol', 'gcs_mn_coh_ol',
       'gcs_mn_grd_ol', 'gcs_mn_mth_ol', 'gcs_mn_avg_ol_se',
       'gcs_mn_coh_ol_se', 'gcs_mn_grd_ol_se', 'gcs_mn_mth_ol_se',
       'gcs_mn_avg_eb', 'gcs_mn_coh_eb', 'gcs_mn_grd_eb', 'gcs_mn_mth_eb',
       'gcs_mn_avg_eb_se', 'gcs_mn_coh_eb_se', 'gcs_mn_grd_eb_se',
       'gcs_mn_mth_eb_se'],
      dtype='object')
```
:::