A quick check for racial bias using zipcode-level Census data
zipbiaschecker
One challenge in assessing algorithmic racial bias is that race data are often missing (not collected as part of sign-up forms, for example) or unavailable for privacy reasons. In these cases, zip code can serve as an indirect proxy: Census data report racial demographics by zip code, so we can check whether an algorithm's output varies with the racial makeup of a zip code. This package runs that indirect check by computing the correlation between the algorithmic output and the percentage of Black, Hispanic, and Indigenous people in each zip code.
Installation
This package can be installed using the command below:
pip install zipbiaschecker
Example
In this example, the data come from the Illinois Department of Public Health COVID statistics as of 7/15/20. We examine the correlation between the test positivity rate in each zip code and the demographics of that zip code to check for a disparate impact of COVID on racial minorities.
import pandas as pd
from zipbiaschecker import zipbiaschecker as zbc
df = pd.read_csv('zipbiaschecker/data/example/2020_07_15_illinois_covid_data.csv')
df['positive_rate'] = df['Positive Cases'] / df['Tested']
print(df.shape)
df.head()
(646, 4)
| | Zip | Tested | Positive Cases | positive_rate |
|---|---|---|---|---|
| 0 | 60002 | 1925 | 130 | 0.067532 |
| 1 | 60004 | 9441 | 406 | 0.043004 |
| 2 | 60005 | 4771 | 255 | 0.053448 |
| 3 | 60007 | 4191 | 383 | 0.091386 |
| 4 | 60008 | 4672 | 380 | 0.081336 |
The output of the cell below shows that the rate of positive cases has a positive correlation of about 0.278 with the proportion of Black people in the zip code, 0.585 with the proportion of Hispanic people, and 0.108 with the proportion of Indigenous people. One of the 646 rows could not be matched to the reference data.
zip_bias_checker = zbc.ZipBiasChecker()
zip_bias_checker.check_bias(df, zip_col_name='Zip', target_col_name='positive_rate')
1 row(s) could not be matched out of 646
percent_black 0.277773
percent_hispanic 0.585238
percent_indigenous 0.107945
Name: positive_rate, dtype: float64
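For readers who want to see the idea behind this kind of check, here is a minimal sketch of a zip-level correlation in plain pandas. It is not the package's implementation: the function `sketch_check_bias`, the `demographics` table, its column names, and all values are hypothetical and made up for illustration. It assumes the check amounts to joining the input on zip code and taking a Pearson correlation with each demographic column.

```python
import pandas as pd

# Hypothetical reference table: demographic share per zip code (made-up values).
demographics = pd.DataFrame({
    "zip": ["60002", "60004", "60005", "60007"],
    "percent_black": [0.10, 0.20, 0.30, 0.40],
    "percent_hispanic": [0.40, 0.30, 0.20, 0.10],
})

# Hypothetical algorithm output per zip code; 99999 has no reference row.
scores = pd.DataFrame({
    "Zip": [60002, 60004, 60005, 60007, 99999],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
})


def sketch_check_bias(df, reference, zip_col_name, target_col_name):
    """Correlate a target column with zip-level demographic shares."""
    # Compare zip codes as 5-character strings so 60002 and "60002" match.
    keyed = df.assign(_zip=df[zip_col_name].astype(str).str.zfill(5))
    merged = keyed.merge(
        reference, left_on="_zip", right_on="zip", how="left", indicator=True
    )
    # Rows whose zip code is absent from the reference table.
    unmatched = int((merged["_merge"] == "left_only").sum())
    print(f"{unmatched} row(s) could not be matched out of {len(df)}")
    matched = merged[merged["_merge"] == "both"]
    demo_cols = [c for c in reference.columns if c != "zip"]
    # Pearson correlation of the target with each demographic column.
    return matched[demo_cols].corrwith(matched[target_col_name])


result = sketch_check_bias(scores, demographics, "Zip", "score")
print(result)
```

A correlation computed this way describes zip codes, not individuals, so it is best read as a prompt for further investigation rather than as a measurement of individual-level bias.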
Documentation notebook for process to generate reference data
In the notebooks folder, the process to map zip codes to demographic data is documented in a Jupyter notebook. To run the notebook, clone this repository to obtain the data used.