Implementation of locality splitting metrics for political redistricting plans
```python
import pandas as pd
import metrics
```
Calculating metrics of locality splitting in political districts
In order to calculate population-based splitting metrics, we need to know for every census block which district it is in. Much of this repository is devoted to generating this data in so-called "block equivalency files." Here is an example of such a data set.
```python
PA_block_eq_df = pd.read_csv('clean_data/PA/PA_classifications.csv')
PA_block_eq_df.head()
```
|   | GEOID10 | pop | sldl_2000 | cd_2013 | cd_2018 | sldu_2000 | sldl_2012 | sldl_2018 | cd_2003 | cd_2010 | sldu_2014 | sldl_2010 | sldl_2014 | sldu_2010 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 420350307003003 | 57 | 076 | 5 | 12 | 34 | 76 | 76 | 5 | 5 | 25 | 76 | 76 | 35 |
| 1 | 420350302001056 | 0 | 076 | 5 | 12 | 34 | 76 | 76 | 5 | 5 | 25 | 76 | 76 | 35 |
| 2 | 420350301001322 | 0 | 076 | 5 | 12 | 34 | 76 | 76 | 5 | 5 | 25 | 76 | 76 | 35 |
| 3 | 420350301002207 | 0 | 076 | 5 | 12 | 34 | 76 | 76 | 5 | 5 | 25 | 76 | 76 | 35 |
| 4 | 420350301001013 | 0 | 076 | 5 | 12 | 34 | 76 | 76 | 5 | 5 | 25 | 76 | 76 | 35 |
This DataFrame has one column for every plan in the state since 2000 (cd = congressional district, sldu = state legislative district upper, sldl = state legislative district lower). If a year is missing, it means the district plan provided to the Census Bureau was identical to the previous year.
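Because each plan lives in its own column, the set of plans in a file can be recovered from the column names alone. Here is a minimal sketch on a toy frame (the column values are invented and the helper names are not part of the package):

```python
from collections import defaultdict

import pandas as pd

# Toy block equivalency frame with the same column layout (all values invented)
df = pd.DataFrame({
    'GEOID10': ['420350307003003'],
    'pop': [57],
    'cd_2013': ['05'],
    'cd_2018': ['12'],
    'sldu_2000': ['34'],
    'sldl_2012': ['076'],
})

# Plan columns are everything except the block ID and population
plan_cols = [c for c in df.columns if c not in ('GEOID10', 'pop')]

# Split each name into chamber prefix and year to see when each chamber was redrawn
plans_by_chamber = defaultdict(list)
for col in plan_cols:
    chamber, year = col.rsplit('_', 1)
    plans_by_chamber[chamber].append(year)

print(dict(plans_by_chamber))  # {'cd': ['2013', '2018'], 'sldu': ['2000'], 'sldl': ['2012']}
```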
Note that in many applications, generating the block equivalency files will not be necessary. For example, the Census Bureau publishes a national block equivalency file for [congressional districts](https://www.census.gov/geographies/mapping-files/2019/dec/rdo/116-congressional-district-bef.html) and for [state legislative districts](https://www.census.gov/geographies/mapping-files/2018/dec/rdo/2018-state-legislative-bef.html). Furthermore, states will often provide block equivalency files of proposed maps as part of the redistricting process. We wrote code for generating block equivalency files so that we could score districting plans going back to 2000.
In order to determine locality splitting, we also need a block equivalency file of the localities. When the localities are counties, this is easy to generate.
```python
# Characters 3-5 of the 15-digit block GEOID are the county FIPS code
df_county = pd.DataFrame(PA_block_eq_df['GEOID10'])
df_county['county_fips'] = df_county['GEOID10'].astype(str).apply(lambda x: x[2:5])
df_county.head()
```
|   | GEOID10 | county_fips |
|---|---|---|
| 0 | 420350307003003 | 035 |
| 1 | 420350302001056 | 035 |
| 2 | 420350301001322 | 035 |
| 3 | 420350301002207 | 035 |
| 4 | 420350301001013 | 035 |
Once we merge the two block equivalency files, we can use a function from metrics.py to calculate a whole ensemble of locality splitting metrics for a plan. Remember that the census block populations must be in a column labeled "pop."
```python
input_df = pd.merge(PA_block_eq_df, df_county, on='GEOID10')
splitting_metrics = metrics.calculate_all_metrics(input_df, 'cd_2018', lclty_str='county_fips')
splitting_metrics
```
```
{'plan': 'cd_2018',
 'splits_all': 13,
 'splits_pop': 13,
 'intersections_all': 17,
 'intersections_pop': 17,
 'split_pairs': 0.35155708843835665,
 'conditional_entropy': 0.4732218666363808,
 'sqrt_entropy': 1.2259489228698355,
 'effective_splits': 16.854108898754916,
 'split_pairs_sym': 0.8315438136166731,
 'conditional_entropy_sym': 1.9181791252873452,
 'sqrt_entropy_sym': 3.095251349839012,
 'effective_splits_sym': 1370.9984050936714}
```
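To build intuition for the first two numbers: a locality counts as split when its blocks fall in more than one district, and splits_pop, as we understand it, considers only blocks with nonzero population. A rough sketch on toy data (this mirrors those definitions but is not the metrics.py implementation):

```python
import pandas as pd

# Toy block equivalency data (all values invented): county 'B' is split between
# districts 1 and 2, but its piece in district 2 holds only zero-population blocks.
df = pd.DataFrame({
    'county_fips': ['A', 'A', 'B', 'B', 'C'],
    'district':    ['1', '1', '1', '2', '2'],
    'pop':         [10, 20, 5, 0, 30],
})

# splits_all: localities whose blocks fall in more than one district
splits_all = int((df.groupby('county_fips')['district'].nunique() > 1).sum())

# splits_pop: the same count, but ignoring zero-population blocks
populated = df[df['pop'] > 0]
splits_pop = int((populated.groupby('county_fips')['district'].nunique() > 1).sum())

print(splits_all, splits_pop)  # → 1 0
```

On this toy data county B is split on paper but not by population, which is why the two counts can differ in general even though they coincide (13 and 13) for the PA plan above.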