Skip to main content

Compares a Precinct-Level Election Shapefile with expected election results and geometries.

Project description

Verification of Election Shapefiles

This verification script generates a report that compares a Precinct-Level Election Shapefile with expected election results and geometries.

Sources

Usage

Inputs and Outputs

Input:

  • state_prec_gdf (GeoDataFrame) containing precinct geometries and election results.
  • state_abbreviation (str) e.g. 'MA' for Massachusetts
  • source (str) person or organization that made the 'state_prec_gdf' e.g 'VEST'
  • county_level_results_df (DataFrame) containing official county-level election results
  • office (str) office to be evaluated in vote validation e.g. 'U.S. Senate'
  • year (str) 'YYYY' indicating the year the election took place e.g. '2016'
  • d_col (str) denotes the column for democratic vote counts in each precinct
  • r_col (str) denotes the column for republican vote counts in each precinct
  • path (str) filepath to which the report should be saved (if None it won't be saved)

Output:

  • state_report (StateReport) for the state_prec_gdf
  • county_reports ((CountyReport) list) for the state_prec_gdf

Schemas

Pay close attention to the name and dtype of your columns to ensure the script works correctly. It's permissible to include more columns than those listed here (they will just be ignored by the script).

state_prec_gdf:

Column Name dtype example
d_col int 5936
r_col int 6395
geometry geometry POLYGON ((-71.99365 44.49649, -71.99262 44.496...
GEOID object '01001'

GEOID is optional for state_prec_gdf, but strongly reccomended. Learn more...

county_level_results_df:

Column Name dtype example
county object 'Essex County'
GEOID object '01001'
party object 'democrat' or 'republican'
votes int 5936

The county_level_results_df DataFrame should only contain results for the office that's passed as an input.

GEOID/County assignment for each precinct

Wikipedia link

This script compares precinct level election results from the state_prec_gdf with the expected election results from official state election data records at the state and county level. In both cases the precinct level results are aggregated up to their state and county respectively and then compared to the expected results. Likewise, the precinct geometries are aggregated up to the county level and compared with the county shapefiles from the US Census Bureau.

In order to do the comparisons detailed above, the script needs to know about the makeup of state_prec_gdf. Specifically, it needs to know the county (or equivalent) for each precinct and which columns correspond with the votes for the Democratic and Republican candidate for each precinct.

The precincts need to be assigned a county in the form of the county's 5 digit GEOID code described below:

GEOID SPEC

Elements of the GEOID column are 5 character strings. The first 2 characters are the StateFP code and the last 3 characters are the CountyFP code. e.g.

  • Massachusetts' StateFP = '25'
  • Essex County's CountyFP = '009'
  • Essex County, Massachusetts' GEODID = '25009'

If either code has fewer digits than are allocated, the string representation should be zero-padded from the left. e.g. Alaska (StateFP = 2) should be '02'.

The GEOID may be given for each precinct in state_prec_gdf file. In this case, the column must conform to the spec above and be named 'GEOID'. If the GEOID column is missing then the script will attempt to create it using the MAUP package to assign each precinct to the county which contains it. Omission of the GEOID label in the input file and failure to assign counties with MAUP (e.g. script throws an exception) will result in the report skipping county level metrics (denoted with -1 metric values).

Candidate vote counts

The script needs to know which column contains votes for the Democratic and Republican candidate(s) being reviewed by the script. They can be manually entered as arguments:

  • d_col denotes the column for Democratic vote counts in each precinct
  • r_col denotes the column for Republican vote counts in each precinct.

Without those arguments, the script will guess based on the expected number of votes for each candidate.

Election Year reccomendations

2016 Precinct-Level Election Shapefiles

The verify.verify_state_2016(...) function will call verify.verify_state(...) and automatically apply 2016 specific defaults:

  • Uses Official County Results from the 2016 Presidential Election already in this repository
  • Sets year to '2016'
  • Sets office to 'President'

Using this funciton for a 2016 Precinct-Level Election Shapefile has the benefit of standardizing 2016 reports. Moreover, it saves you the time of finding official county level results and conforming the data to the expected schema for input data.

Non-2016 Precinct-Level Election Shapefile

For non-2016 Precinct-Level Election Shapefiles, you do need to supply a schema-conforming county_level_results_df to verify.verify_state(...). I reccomend looking on the Department of State website for the state being validated. For example, I found official county results for Pennsylannia's 2018 election here

Checkout Verification Example Notebook.ipynb for examples of both cases.

Verification Report Breakdown

Quality Scores

Vote Score

Compute the ratio of votes observed in state_prec_gdf to the votes expected (based on official state election data records in county_level_results_df) for the democratic and republican candidate. Then the Vote Score is the weighted average of these ratios. Python Implementation

  • Ideally Vote Score = 1
  • A Vote Score above 1 indicates that the Input contains more recorded votes than the official state election data
  • A Vote Score below 1 indicates that the Input contains fewer recorded votes than the official state election data.

County Vote Score Dispersion

For each county, compute the square of the difference between the expected number of votes for the democratic and republican candidate. Then, County Vote Score Dispersion is the average of the square difference across all the counties in the state. Python Implementation

  • Ideally County Vote Score Dispersion = 0
  • As the County Vote Score Dispersion increases, so does the degree to which the Input differs with respect to official state election data records about the county-level results.

Area Difference Score

Compute the symmetric difference between the Input's geometries and the expected geometries for that state from the Census Bureau. Then Area Difference Score is the ratio of the symmetric difference's area to the area of the precinct shapefiles. Python Implementation

  • Ideally Area Difference Score is 0
  • As the Area Difference Score increases it indicates a greater geometric difference between the observed geometry in the Input and the expected geometry.
  • An Area Difference Score of -1 indicates an error was encountered when attempting to compute the metric. Therefore, it is the worst value possible for the Area Difference score.

Library Compatibility

Check the Input for compatibility with libraries and packages that we hope our end users will be able to apply to the map.

  • can_use_maup: (boolean) Can use MAUP, a geospatial toolkit for redistricting data.
  • can_use_gerrychain: (boolean) Can use Gerrychain which is useful for applying sensitivity testing via Markov chain Monte Carlo sampling.

Raw Data

  • n_votes_democrat_expected: (int) number of votes for the democratic candidate in MEDSL dataset
  • n_votes_republican_expected: (int) number of votes for the republican candidate in MEDSL dataset
  • n_two_party_votes_expected: (int) n_votes_republican_expected + n_votes_republican_expected
  • n_votes_democrat_observed: (int) number of votes for the democratic candidate in the Input
  • n_votes_republican_observed: (int) number of votes for the republican candidate in the Input
  • n_two_party_votes_observed: (int) n_votes_democrat_observed + n_votes_republican_observed
  • all_precincts_have_a_geometry: (int) every precinct has a valid geometry

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

op_verification-0.0.10.tar.gz (60.9 kB view details)

Uploaded Source

Built Distribution

op_verification-0.0.10-py3-none-any.whl (14.0 MB view details)

Uploaded Python 3

File details

Details for the file op_verification-0.0.10.tar.gz.

File metadata

  • Download URL: op_verification-0.0.10.tar.gz
  • Upload date:
  • Size: 60.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for op_verification-0.0.10.tar.gz
Algorithm Hash digest
SHA256 ee38cd9d0650ac4cfcb2f8ab006af7d2db5bd06d653e030284e40e88a53a3076
MD5 14ef36a9a25ae078713486cdac5bafc6
BLAKE2b-256 e66db31aed863ed2571b394d4160a5f8b17151c79131bb004bddaf6d58431a09

See more details on using hashes here.

File details

Details for the file op_verification-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: op_verification-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 14.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for op_verification-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 0bda16c66de8687fee606283b279092451aabfe2a147e16786f851f5c1ee04b0
MD5 eca737b7536c6be089c55f910d944f4d
BLAKE2b-256 251a497c3fc47e58deb63ccb913847b4d1e8d2ff020c8958896b327566d501e8

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page