
Predict race from name and location


pyethnicity:


What is it?

pyethnicity is a Python package that predicts race from name and location, and sex from first name. To the best of the author's knowledge, it outperforms all existing open-source models. It does so by training a bidirectional LSTM on the largest, most comprehensive dataset of names and self-reported race to date, drawn from voter registration data from all 50 states. It also incorporates location features and improved versions of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG) to form an ensemble model that achieves up to 36.8% higher F1 scores than the next-best-performing model. Finally, it provides CFPB-compliant, up-to-date versions of BISG and BIFSG.
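For readers unfamiliar with Bayesian Improved Surname Geocoding, the core idea can be sketched in a few lines: a surname-based race distribution is updated with geographic evidence via Bayes' rule. This is a minimal illustration of the textbook formula only, not pyethnicity's implementation. The function name `bisg_posterior`, the four race categories, and all probabilities below are hypothetical and made up for the example.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(race) is proportional to
    P(race | surname) * P(geography | race).
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race has nonzero probability for this input")
    # Normalize so the posterior sums to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up illustrative numbers, NOT census or voter-file figures.
# P(race | surname): share of people with this surname in each group.
p_race_given_surname = {
    "asian": 0.60, "black": 0.05, "hispanic": 0.05, "white": 0.30,
}
# P(geography | race): share of each group's population living in this area.
p_geo_given_race = {
    "asian": 0.002, "black": 0.001, "hispanic": 0.001, "white": 0.004,
}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.2f}")
```

In this toy case the geographic evidence favors "white" strongly enough to offset the surname prior toward "asian", which shows why location can change a name-only prediction.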

Usage:

Please see https://pyethnicity.readthedocs.io/en/latest/ for full documentation.

Installing

The easiest way to install pyethnicity is from PyPI using pip:

pip install pyethnicity

Running

pyethnicity exposes several functions and supports block group, tract, and ZIP code level features. Each function takes a scalar or array-like input and returns a polars DataFrame containing the inputs and the predictions.

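import pyethnicity

# Example inputs: a ZIP Code Tabulation Area, a census tract, and a name.
zcta = 27106
tract = 72153750502
first_name = "cangyuan"
last_name = "luo"

# BISG: surname plus geography.
pyethnicity.bisg(last_name, zcta=zcta)
# BIFSG: first name, surname, and geography.
pyethnicity.bifsg(first_name, last_name, zcta=zcta, tract=tract)
# Prediction from first and last name only.
pyethnicity.predict_race_fl(first_name, last_name)
# Prediction from name plus tract-level location.
pyethnicity.predict_race_flg(first_name, last_name, tract=tract)
# Prediction from name plus ZCTA-level location.
pyethnicity.predict_race(first_name, last_name, zcta=zcta)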

Performance

(Performance tables for pyethnicity, rethnicity, and ethnicolr.)

Please see the corresponding paper, "Can We Trust Race Prediction?", for more details.
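As a reminder of what the F1 metric measures, here is a minimal sketch of per-class and macro-averaged F1 computed from labels. It only illustrates the standard definition; whether the paper reports per-class or averaged F1 is not stated here, and the helper name `f1_per_class` and the toy labels are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members scores 0 by convention.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Made-up labels for illustration only.
y_true = ["asian", "asian", "black", "white", "white", "hispanic"]
y_pred = ["asian", "white", "black", "white", "white", "white"]

scores = f1_per_class(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)
print(scores, macro_f1)
```

Per-class scores matter for this task because a model that over-predicts the majority class can look accurate overall while scoring poorly on smaller groups.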

TODO:

  • Re-train the model to support Native American and Multiracial categories.

This package is still in active development. Please report any issues!
