Predict religion and caste based on name
Pranaam uses the Bihar Land Records data, a set of plot-level land records (N = 41.87 million plots, or 12.13 million individuals/accounts, across 35,626 villages), to build machine learning models that predict religion and caste from a name. Our final dataset has around 4M unique records. To learn how we transform the data and build the models underlying the package, check the notebooks.
The first function we are releasing with the package is pred_rel, which predicts religion based on the name (currently only Muslim or not). (For context, nearly 95% of India's population is Hindu or Muslim, with Sikhs, Buddhists, Christians, and other groups making up the rest.) The out-of-sample (OOS) accuracy, assessed on unseen names, is nearly 98% for both the Hindi and English models.
Our training data is in Hindi. To build models that classify names provided in English, we used the indicate package to transliterate our training data to English.
We are releasing this software in the hope that it enables activists and researchers to:
- Highlight biases,
- Fight biases, and
- Prevent biases (for example, by regressing out some of these biases in models built on natural-language corpora that contain person names).
Install
We strongly recommend installing pranaam inside a Python virtual environment (see the venv documentation).
pip install pranaam
General API
pranaam.pred_rel takes a list of Hindi/English names and predicts whether the person is Muslim or not.
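Since pred_rel takes a plain list of names, it can help to tidy that list first. The following is a minimal sketch, not part of pranaam: clean_names is a hypothetical helper that collapses stray whitespace, drops empty entries, and removes duplicates while preserving order. Whether pred_rel already does any of this internally is not stated here, so treat the step as optional.

```python
def clean_names(names):
    """Hypothetical helper (not part of pranaam): normalize a list of names.

    Collapses runs of whitespace, drops empty strings, and removes
    duplicates while keeping the first occurrence of each name.
    """
    seen = set()
    cleaned = []
    for raw in names:
        name = " ".join(str(raw).split())  # trim and collapse inner whitespace
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


raw_names = ["  Shah Rukh  Khan ", "", "Amitabh Bachchan", "Shah Rukh Khan"]
names = clean_names(raw_names)
print(names)

# The cleaned list can then be passed on, as in the examples in this README:
# from pranaam import pranaam
# result = pranaam.pred_rel(names)
```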
Examples
By using names in English
    from pranaam import pranaam

    names = ["Shah Rukh Khan", "Amitabh Bachchan"]
    result = pranaam.pred_rel(names)
    print(result)
Output:

                   name  pred_label  pred_prob_muslim
    0    Shah Rukh Khan      muslim              73.0
    1  Amitabh Bachchan  not-muslim              27.0
By using names in Hindi
    from pranaam import pranaam

    names = ["शाहरुख खान", "अमिताभ बच्चन"]
    result = pranaam.pred_rel(names, lang="hin")
    print(result)
Output:

               name  pred_label  pred_prob_muslim
    0    शाहरुख खान      muslim              73.0
    1  अमिताभ बच्चन  not-muslim              27.0
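The two examples above pass either English or Hindi names in a single call. If a list mixes both scripts, one option is to split it before calling pred_rel. The sketch below is an assumption about how that could be done, not part of pranaam: is_devanagari and split_by_script are hypothetical helpers that treat a name as Hindi if it contains any character from the Devanagari Unicode block (U+0900 to U+097F).

```python
def is_devanagari(name):
    """True if any character lies in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in name)


def split_by_script(names):
    """Hypothetical helper: return (hindi_names, english_names), order preserved."""
    hindi = [n for n in names if is_devanagari(n)]
    english = [n for n in names if not is_devanagari(n)]
    return hindi, english


mixed = ["शाहरुख खान", "Amitabh Bachchan", "अमिताभ बच्चन", "Shah Rukh Khan"]
hindi, english = split_by_script(mixed)
print(hindi)
print(english)

# Each group can then be scored with the matching call from the examples:
# from pranaam import pranaam
# hin_result = pranaam.pred_rel(hindi, lang="hin")
# eng_result = pranaam.pred_rel(english)
```

Names that mix scripts within a single string are routed to the Hindi group by this rule; a stricter check may be preferable for such data.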
Functions
We currently expose one function, which takes Hindi/English text (a name) and predicts religion.
pranaam.pred_rel(input)
What it does:
Predicts religion from Hindi/English text (a name).
Output:
Returns a pandas DataFrame with the name, the predicted label (muslim/not-muslim), and the score pred_prob_muslim.
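Because the result is a pandas DataFrame, ordinary pandas filtering applies. The sketch below keeps only rows where the score is far from the middle. The frame is built by hand to mirror the columns in the example output and is not produced by the model. confident_rows and its threshold are hypothetical, and the code assumes pred_prob_muslim is on a 0 to 100 scale, which is inferred from the sample output rather than documented.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by pred_rel (values copied from
# the example output in this README; not generated by the model here).
result = pd.DataFrame(
    {
        "name": ["Shah Rukh Khan", "Amitabh Bachchan"],
        "pred_label": ["muslim", "not-muslim"],
        "pred_prob_muslim": [73.0, 27.0],
    }
)


def confident_rows(df, threshold=70.0):
    """Hypothetical helper: keep rows scored confidently in either direction.

    Assumes pred_prob_muslim is a percentage (0 to 100). A row is kept when
    the score is >= threshold or <= 100 - threshold.
    """
    score = df["pred_prob_muslim"]
    mask = (score >= threshold) | (score <= 100.0 - threshold)
    return df[mask]


print(confident_rows(result, threshold=70.0))
```

Raising the threshold trades coverage for confidence: fewer names are kept, but those that remain have scores further from the middle of the scale.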
Contributor Code of Conduct
The project welcomes contributions from everyone! It depends on it. To maintain this welcoming atmosphere and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.
License
The package is released under the MIT License.