Predict religion and caste based on name

Pranaam uses the Bihar Land Records data, a set of plot-level land records (N = 41.87 million plots, or 12.13 million individuals/accounts, across 35,626 villages), to build machine learning models that predict religion and caste from a name. Our final dataset has around 4M unique records. To learn how the data were transformed and how the models underlying the package were built, check the notebooks.

The first function we are releasing with the package is pred_rel, which predicts religion based on the name (currently only Muslim or not). (For context, nearly 95% of India’s population is Hindu or Muslim, with Sikhs, Buddhists, Christians, and other groups making up the rest.) The out-of-sample (OOS) accuracy, assessed on unseen names, is nearly 98% for both the Hindi and the English models.

Our training data is in Hindi. To build models that classify names provided in English, we used the indicate package to transliterate our training data to English.

We are releasing this software in the hope that it enables activists and researchers to:

  1. Highlight biases,

  2. Fight biases, and

  3. Prevent biases (for example, by regressing out some of these biases in models built on natural language corpora that contain person names).

Install

We strongly recommend installing pranaam inside a Python virtual environment (see the venv documentation).

pip install pranaam

General API

  1. pranaam.pred_rel takes a list of Hindi/English names and predicts whether the person is Muslim or not.

Examples

By using names in English

from pranaam import pranaam
names = ["Shah Rukh Khan", "Amitabh Bachchan"]
result = pranaam.pred_rel(names)
print(result)

output -

               name  pred_label  pred_prob_muslim
0    Shah Rukh Khan      muslim              73.0
1  Amitabh Bachchan  not-muslim              27.0

By using names in Hindi

from pranaam import pranaam
names = ["शाहरुख खान", "अमिताभ बच्चन"]
result = pranaam.pred_rel(names, lang="hin")
print(result)

output -

          name  pred_label  pred_prob_muslim
0    शाहरुख खान      muslim              73.0
1  अमिताभ बच्चन  not-muslim              27.0
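
The pred_prob_muslim column in the outputs above appears to be on a 0-100 scale. As a minimal sketch under that assumption, the snippet below flags predictions that fall in an "uncertain" band so they can be reviewed by hand. The helper flag_uncertain and its 40-60 band are hypothetical and not part of pranaam; the rows are copied from the documented output so the snippet runs without the package installed.

```python
def flag_uncertain(rows, low=40.0, high=60.0):
    """Return copies of the prediction rows with an added 'uncertain' flag.

    rows: list of dicts with a 'pred_prob_muslim' key, assumed to be a
    percentage between 0 and 100. The band [low, high] is an arbitrary
    illustrative choice, not a value recommended by the package.
    """
    flagged = []
    for row in rows:
        prob = float(row["pred_prob_muslim"])
        # Copy the row so the caller's data is left untouched.
        flagged.append({**row, "uncertain": low <= prob <= high})
    return flagged


# Rows copied from the documented output above. With the real package,
# similar rows could be obtained via result.to_dict("records").
rows = [
    {"name": "Shah Rukh Khan", "pred_label": "muslim", "pred_prob_muslim": 73.0},
    {"name": "Amitabh Bachchan", "pred_label": "not-muslim", "pred_prob_muslim": 27.0},
]

for row in flag_uncertain(rows):
    print(row["name"], row["pred_label"], row["uncertain"])
```

Neither documented example falls inside the band, so both are left unflagged; a name scored near 50 would be marked for review.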

Functions

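We expose one function, which takes Hindi/English text (a name) and predicts religion (currently Muslim or not).

  • pranaam.pred_rel(input)

    • What it does:

      • Predicts religion from Hindi/English text (name). Pass lang="hin" for names in Hindi, as in the example above.

    • Output:

      • Returns a pandas DataFrame with the name, the predicted label (muslim/not-muslim), and the predicted probability (pred_prob_muslim).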
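
Because the Hindi example passes lang="hin" while the English example passes no lang argument, a mixed list of names likely needs to be split by script before calling pred_rel. The sketch below is one possible way to do that with the standard library only. The helpers guess_lang and group_by_lang are hypothetical, and the "eng" key is just a label used in this sketch rather than a documented parameter value.

```python
def guess_lang(name):
    """Return "hin" if the name contains any Devanagari character, else "eng".

    Devanagari occupies the Unicode block U+0900 to U+097F.
    """
    for ch in name:
        if "\u0900" <= ch <= "\u097f":
            return "hin"
    return "eng"


def group_by_lang(names):
    """Split names into two lists keyed by the guessed script."""
    groups = {"hin": [], "eng": []}
    for name in names:
        groups[guess_lang(name)].append(name)
    return groups


names = ["Shah Rukh Khan", "अमिताभ बच्चन"]
groups = group_by_lang(names)
print(groups)

# With pranaam installed, each non-empty group could then be scored
# as in the examples above:
#   pranaam.pred_rel(groups["eng"])
#   pranaam.pred_rel(groups["hin"], lang="hin")
```

This check only looks at the script, so a Hindi name written in Latin letters is routed to the English group, which matches how the English examples above are written.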

Authors

Rajashekar Chintalapati, Aaditya Dar, and Gaurav Sood

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on them. To maintain this welcoming atmosphere and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.
