Predict race from name and location

pyethnicity:

What is it?

pyethnicity is a Python package that predicts race from name and location, and sex from first name. To the best of the author's knowledge, it outperforms all existing open-source models. It does this by training a bidirectional LSTM on the largest, most comprehensive dataset of names and self-reported race to date, drawn from voter registration data from all 50 states. It also combines location features with improved versions of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG) in an ensemble model that achieves up to 36.8% higher F1 scores than the next-best performing model. Finally, it provides CFPB-compliant and up-to-date versions of BISG and BIFSG.
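For readers unfamiliar with BISG, the underlying idea is a Bayes' rule update that combines a surname-based prior with geographic evidence. The sketch below shows only that textbook update. It is not pyethnicity's improved implementation, the function name `bisg_posterior` is hypothetical, and the probabilities are made-up illustrative numbers rather than census values.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Textbook BISG update: P(race | surname, geo) is proportional to
    P(race | surname) * P(geo | race), assuming surname and geography
    are independent given race."""
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No race category has nonzero probability.")
    # Normalize so the posterior sums to 1.
    return {race: value / total for race, value in unnormalized.items()}


# Made-up numbers for illustration only (not real census data).
p_race_given_surname = {"asian": 0.6, "black": 0.1, "hispanic": 0.1, "white": 0.2}
p_geo_given_race = {"asian": 0.001, "black": 0.004, "hispanic": 0.002, "white": 0.003}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this toy example the geographic evidence pulls the posterior for "asian" down from the surname-only prior of 0.6, because the hypothetical area holds a comparatively small share of that group.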

Usage:

Please see https://pyethnicity.readthedocs.io/en/latest/ for full documentation.

Installing

The easiest way to install pyethnicity is from PyPI using pip:

pip install pyethnicity

Running

pyethnicity exposes several functions and supports block group, tract, and ZIP code level features. Each function takes a scalar or array-like of inputs and returns a polars DataFrame containing the inputs and the predictions.

import pyethnicity

zcta = 27106  # ZIP Code Tabulation Area
tract = 72153750502  # census tract identifier
first_name = "cangyuan"
last_name = "luo"

pyethnicity.bisg(last_name, zcta=zcta)
pyethnicity.bifsg(first_name, last_name, zcta=zcta, tract=tract)
pyethnicity.predict_race_fl(first_name, last_name)
pyethnicity.predict_race_flg(first_name, last_name, tract=tract)
pyethnicity.predict_race(first_name, last_name, zcta=zcta)

Performance

[Figures: performance results for pyethnicity, rethnicity, and ethnicolr]

Please see the corresponding paper "Can We Trust Race Prediction?" for more details.
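The package description reports its gains as F1 scores. As a reminder of what that metric measures, here is a minimal, dependency-free sketch of per-class and macro-averaged F1. The helper name `f1_per_class` and the labels are hypothetical, the data is made up, and this is not necessarily the exact averaging used in the paper.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Made-up labels for illustration only.
y_true = ["asian", "black", "white", "white", "hispanic", "white"]
y_pred = ["asian", "white", "white", "white", "hispanic", "black"]

per_class = f1_per_class(y_true, y_pred)
macro_f1 = sum(per_class.values()) / len(per_class)
print(per_class)
print(f"macro F1: {macro_f1:.3f}")
```

Macro averaging weights every class equally, so a class that is predicted poorly lowers the score no matter how rare it is.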

TODO:

  • Re-train model to support Native American and Multiracial.

This package is still in active development. Please report any issues!
