Predict race from name and location
pyethnicity:
What is it?
pyethnicity is a Python package that predicts race from name and location, and sex from first name. To the best of the author's knowledge, it outperforms all existing open-source models. It does this by training a bidirectional LSTM on the largest, most comprehensive dataset of names and self-reported race to date, drawn from voter registration data from all 50 states. It also combines location features with improved versions of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG) in an ensemble model that achieves up to 36.8% higher F1 scores than the next-best-performing model. Finally, it provides CFPB-compliant, up-to-date versions of BISG and BIFSG.
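For readers unfamiliar with BISG and BIFSG, the sketch below shows the textbook Bayesian update behind them: the probability of each race given a surname is multiplied by the probability of the location (and, for BIFSG, the first name) given that race, then normalized. This is not pyethnicity's implementation, which uses improved versions of both methods. The function name `posterior_from_factors`, the four categories, and every number are assumptions made up for illustration.

```python
def posterior_from_factors(prior, *likelihoods):
    """Multiply a prior by per-race likelihoods and renormalize (Bayes' rule)."""
    unnormalized = dict(prior)
    for likelihood in likelihoods:
        for race in unnormalized:
            unnormalized[race] *= likelihood[race]
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Toy probabilities, invented for this example; they are NOT census figures.
p_race_given_surname = {"asian": 0.60, "black": 0.05, "hispanic": 0.05, "white": 0.30}
p_geo_given_race = {"asian": 0.001, "black": 0.002, "hispanic": 0.001, "white": 0.004}
p_first_given_race = {"asian": 0.003, "black": 0.0001, "hispanic": 0.0001, "white": 0.0002}

# BISG-style update: surname prior combined with geography.
posterior = posterior_from_factors(p_race_given_surname, p_geo_given_race)

# BIFSG-style update: the same, with the first name as an extra factor.
posterior_with_first = posterior_from_factors(
    p_race_given_surname, p_geo_given_race, p_first_given_race
)

for race in posterior:
    print(f"{race:>8}: BISG {posterior[race]:.3f}  BIFSG {posterior_with_first[race]:.3f}")
```

With these toy inputs, geography alone pulls the estimate toward one category, and adding the first name pulls it back toward another, which is the motivation for using both names together with location.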
Usage:
Please see https://pyethnicity.readthedocs.io/en/latest/ for full documentation.
Installing
The easiest way to install pyethnicity is from PyPI using pip:
pip install pyethnicity
Running
pyethnicity exposes several functions and supports block group, tract, and ZIP code level features. Each function takes scalar or array-like inputs and returns a polars DataFrame containing the inputs and the predictions.
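import pyethnicity

# Location features: a ZIP Code Tabulation Area and a census tract
zcta = 27106
tract = 72153750502
first_name = "cangyuan"
last_name = "luo"

# Bayesian Improved Surname Geocoding (surname + location)
pyethnicity.bisg(last_name, zcta=zcta)
# Bayesian Improved Firstname Surname Geocoding (first name + surname + location)
pyethnicity.bifsg(first_name, last_name, zcta=zcta, tract=tract)
# Model predictions from the names alone, then with a location feature
pyethnicity.predict_race_fl(first_name, last_name)
pyethnicity.predict_race_flg(first_name, last_name, tract=tract)
# Ensemble prediction
pyethnicity.predict_race(first_name, last_name, zcta=zcta)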
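Because each function returns a polars DataFrame of inputs and predictions, a common next step is to pick the most likely category per row. The sketch below works on plain dictionaries, such as those produced by polars' `DataFrame.to_dicts()`, so it runs without the package installed. The helper `most_likely_race`, the column names `asian`, `black`, `hispanic`, and `white`, and the sample rows are assumptions for illustration; check the documentation for the actual output columns.

```python
def most_likely_race(rows, race_columns=("asian", "black", "hispanic", "white")):
    """Return, for each row, the name of the column with the highest probability."""
    return [max(race_columns, key=lambda column: row[column]) for row in rows]


# Hypothetical rows shaped like a prediction table; the values are made up.
rows = [
    {"first_name": "name_a", "last_name": "surname_a",
     "asian": 0.70, "black": 0.05, "hispanic": 0.05, "white": 0.20},
    {"first_name": "name_b", "last_name": "surname_b",
     "asian": 0.10, "black": 0.15, "hispanic": 0.15, "white": 0.60},
]

labels = most_likely_race(rows)
print(labels)
```

With pyethnicity installed, the same helper could presumably be fed `pyethnicity.predict_race(first_names, last_names, zcta=zctas).to_dicts()`, where the three arguments are equal-length lists, provided the column names match.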
Performance
[Figures: performance of pyethnicity, rethnicity, and ethnicolr]
Please see the corresponding paper "Can We Trust Race Prediction?" for more details.
TODO:
- Re-train model to support Native American and Multiracial.
This package is still in active development. Please report any issues!
File details
Details for the file pyethnicity-0.0.26.tar.gz.
File metadata
- Download URL: pyethnicity-0.0.26.tar.gz
- Upload date:
- Size: 38.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | d2ce36e1367e3246c3d61b5070d2f9341a39a2230eaf5d69934e245a39a7ab08
MD5 | 85a9717e5ba01ff900834034d914fc8e
BLAKE2b-256 | a1b1195c064dea50b327d9f6f0885d0c0ca5b2827735039fc016b849dd08d32f
File details
Details for the file pyethnicity-0.0.26-py3-none-any.whl.
File metadata
- Download URL: pyethnicity-0.0.26-py3-none-any.whl
- Upload date:
- Size: 38.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 25ef5ee2778912499419003fbdfbae4aa6eb64247626463200932fa1b43424ce
MD5 | a3e0665458250c6384e84f6067a09ee3
BLAKE2b-256 | 11e8672e51ad38a1f553f4031077a1c50b39dd2aabc2d434acaa6501abcd11db