Infer Gender from Indian Names
naampy: Infer Sociodemographic Characteristics from Indian Names
The ability to programmatically and reliably infer a person's social attributes from their name is useful for a broad set of tasks, from estimating bias in media coverage of women to estimating bias in lending against certain social groups. But unlike the U.S. Census Bureau, which publishes lists of first and last names that can be (and are) used to infer gender, race, ethnicity, and so on, the Indian government produces no comparable dataset. Hence, the relationship between names and gender, ethnicity, language group, etc., has generally been inferred from small datasets constructed in an ad hoc manner.
We fill this yawning gap. Using data from the Indian Electoral Rolls (parsed data here), we estimate the proportion female, male, and third sex (see here) for a particular first name, year, and state.
Please also check out pranaam, which uses land record data from Bihar to infer religion based on the name. The package uses indicate to transliterate Hindi to English.
Try it Online
Check out our interactive Streamlit App to test naampy with your own names!
Features
- 🚀 Easy to use: Simple API with just two main functions
- 📊 Data-driven: Based on millions of names from Indian Electoral Rolls
- 🎯 Accurate: Provides confidence scores with predictions
- 🗺️ State-specific: Get region-specific predictions for better accuracy
- 🤖 ML-powered: Neural network fallback for names not in database
- 📈 Comprehensive: Covers 31 states and union territories
Installation
Requirements
- Python 3.11
- pip or uv package manager
Install from PyPI
We strongly recommend installing naampy inside a Python virtual environment (see venv documentation):
pip install naampy
Or if you're using uv:
uv pip install naampy
Install from Source
To install the latest development version:
git clone https://github.com/appeler/naampy.git
cd naampy
pip install -e .
Quick Start
Basic Usage
import pandas as pd
from naampy import in_rolls_fn_gender, predict_fn_gender
# Create a DataFrame with names
names_df = pd.DataFrame({'name': ['Priyanka', 'Rahul', 'Anjali']})
# Get gender predictions from electoral roll data
result = in_rolls_fn_gender(names_df, 'name')
print(result[['name', 'prop_female', 'prop_male']])
Using the ML Model
For names not in the electoral roll database:
from naampy import predict_fn_gender
# Use the neural network model for predictions
names = ['Aadhya', 'Reyansh', 'Kiara']
predictions = predict_fn_gender(names)
print(predictions)
Detailed Usage Examples
Electoral Roll Data
import pandas as pd
from naampy import in_rolls_fn_gender
# Sample data
names = [{'name': 'gaurav'}, {'name': 'yasmin'}, {'name': 'deepti'}]
df = pd.DataFrame(names)
result = in_rolls_fn_gender(df, 'name')
print(result[['name', 'n_male', 'n_female', 'prop_female', 'prop_male']])
Output:
     name   n_male  n_female  prop_female  prop_male
0  gaurav  25625.0      47.0     0.001831   0.998169
1  yasmin     58.0    6079.0     0.990549   0.009451
2  deepti     35.0    5784.0     0.993985   0.006015
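The proportions in this output appear to be simple shares of the counts. As a sanity check, the sketch below recomputes prop_female and prop_male from n_male and n_female for the three rows above. This is an assumption about how the columns relate, not a description of the package's internals, and it ignores any third-gender count, which looks to be zero for these names. The helper name proportions is hypothetical.

```python
# Counts copied from the sample output above: (n_male, n_female).
counts = {
    "gaurav": (25625, 47),
    "yasmin": (58, 6079),
    "deepti": (35, 5784),
}


def proportions(n_male, n_female):
    """Return (prop_female, prop_male) as shares of the two counts.

    Assumption: only male and female counts enter the denominator.
    """
    total = n_male + n_female
    return n_female / total, n_male / total


shares = {name: proportions(*pair) for name, pair in counts.items()}
for name, (prop_female, prop_male) in shares.items():
    print(f"{name}: prop_female={prop_female:.6f} prop_male={prop_male:.6f}")
```

The recomputed values agree with the printed columns to six decimal places, which is consistent with, but does not prove, the assumed formula.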
Machine Learning Predictions
from naampy import predict_fn_gender
# Names not in electoral roll database
names = ["nabha", "hrithik", "kiara", "reyansh"]
predictions = predict_fn_gender(names)
print(predictions)
Output:
      name pred_gender  pred_prob
0    nabha      female   0.755028
1  hrithik        male   0.922181
2    kiara      female   0.614125
3  reyansh        male   0.891234
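Some of these predictions are less certain than others (kiara, for example, sits at about 0.61). One way to handle that downstream, sketched here with plain Python rather than any naampy feature, is to keep only rows above a chosen cutoff. The function confident_only and the 0.7 threshold are illustrative assumptions, and the sketch assumes pred_prob is the probability of the predicted label.

```python
# Rows copied from the sample output above.
predictions = [
    {"name": "nabha", "pred_gender": "female", "pred_prob": 0.755028},
    {"name": "hrithik", "pred_gender": "male", "pred_prob": 0.922181},
    {"name": "kiara", "pred_gender": "female", "pred_prob": 0.614125},
    {"name": "reyansh", "pred_gender": "male", "pred_prob": 0.891234},
]


def confident_only(rows, min_prob=0.7):
    """Keep rows whose pred_prob is at least min_prob.

    The 0.7 default is an arbitrary, illustrative cutoff.
    """
    return [row for row in rows if row["pred_prob"] >= min_prob]


kept = confident_only(predictions)
print([row["name"] for row in kept])
```

The right cutoff depends on how costly a wrong label is for the task at hand; a stricter threshold trades coverage for fewer errors.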
How it Works
When you first run in_rolls_fn_gender, it downloads data from Harvard Dataverse to a local cache folder. Subsequent runs use the cached data for faster performance.
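The download-once behavior described above follows a common caching pattern. The sketch below illustrates that pattern in general terms; it is not naampy's actual code. The names load_cached, fake_fetch, and rolls.csv are hypothetical, and a stand-in function replaces the real network download so the example runs offline.

```python
import tempfile
from pathlib import Path


def load_cached(cache_dir, filename, fetch):
    """Return file bytes, calling fetch() only when no cached copy exists."""
    path = Path(cache_dir) / filename
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch())  # first run: download and store
    return path.read_bytes()  # every run: read from the cache


calls = []


def fake_fetch():
    """Hypothetical stand-in for a network download."""
    calls.append(1)
    return b"name,n_male,n_female\n"


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached(tmp, "rolls.csv", fake_fetch)
    second = load_cached(tmp, "rolls.csv", fake_fetch)

print(len(calls))  # the fetch function ran once, on the first call
```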
The package provides two complementary approaches:
- Electoral Roll Data: Statistical data from millions of Indian voters
- Machine Learning Model: Neural network trained on name patterns
For names not found in the electoral roll database, the package automatically falls back to the ML model.
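The lookup-then-fallback idea can be pictured with a toy example. Everything below is a hypothetical stand-in: ROLLS is a tiny hand-made table using two values from the sample output, toy_model returns a fixed guess instead of running a neural network, and prop_female is not a naampy function. The normalization step (strip and lowercase) is also an assumption.

```python
# Toy stand-ins; neither structure reflects naampy's internals.
ROLLS = {"gaurav": 0.001831, "yasmin": 0.990549}  # name -> prop_female


def toy_model(name):
    """Hypothetical stand-in for a trained model: always returns 0.5."""
    return 0.5


def prop_female(name, rolls=ROLLS, model=toy_model):
    """Return (estimate, source), preferring the lookup table."""
    key = name.strip().lower()
    if key in rolls:
        return rolls[key], "rolls"
    return model(key), "model"  # fall back when the name is missing


print(prop_female("Yasmin"))
print(prop_female("kiara"))
```

Returning the source alongside the estimate lets a caller treat table-based and model-based values differently, for instance by reporting them separately.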
Documentation
For comprehensive documentation, examples, and API reference, visit: https://appeler.github.io/naampy/
Authors
Suriyan Laohaprapanon, Gaurav Sood, and Rajashekar Chintalapati
Related Projects
- appeler/pranaam — Predict religion based on names
- appeler/outkast — Map last names to caste categories
- appeler/parsernaam — AI-powered name parsing