Instate: predict the state of residence from last name

Owner: appeler

Maintainers: rajashekar

instate: predict spoken language and the state of residence from last name

Using the Indian electoral rolls data (2017), we provide a Python package that takes the last name of a person and gives its distribution across states. This package can also predict the spoken language of the person based on the last name.

Potential Use Cases

India has 22 official languages. Serving such a linguistically diverse population is a challenge for businesses and surveyors. When the last name is the only information available and no other data allows us to model a person's spoken language, the distribution of last names across states is the best signal we have.

Dataset

Refer to lastname_langs_india.csv.tar.gz for the dataset that will be used to predict/lookup the spoken language based on the last name.

Refer to lastname_langs_india_top3.csv.tar.gz for the dataset that will be used to predict the top-3 spoken languages based on the last name. An LSTM model has been trained on this dataset to predict the top-3 spoken languages.

Refer to the notebooks that were used to prepare the above datasets and train the models.

Web UI

Note: Streamlit app is currently unavailable.

Installation

We strongly recommend installing instate inside a Python virtual environment (see the venv documentation).

pip install instate

Examples

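import pandas as pd
from instate import last_state

# Load the input data from a CSV file
last_dat = pd.read_csv("last_dat.csv")
# Run the last-name lookup on the loaded data
last_state_dat = last_state(last_dat, "dhingra")
print(last_state_dat)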

API

instate provides 4 main functions for predicting state and language from Indian last names.

Electoral Rolls Lookup

get_state_distribution - Get P(state|lastname) from 2017 electoral rolls data

import instate

# With list of names
names = ["sharma", "patel", "singh"]
result = instate.get_state_distribution(names)
print(result[["name", "Delhi", "Gujarat", "Punjab"]].head())

# With DataFrame
import pandas as pd
df = pd.DataFrame({"lastname": ["sharma", "patel"]})
result = instate.get_state_distribution(df, "lastname")
print(result.shape)  # (2, 33): 2 rows (one per name) and 33 columns, 31 of them state columns
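To make the notation P(state|lastname) concrete, here is a minimal sketch of how such a conditional distribution can be computed from raw counts. It does not use instate; the function name `state_distribution` and the toy counts are assumptions made up for illustration, not the package's implementation or real electoral roll figures.

```python
from collections import defaultdict

# Hypothetical toy counts of (lastname, state, count); not real electoral roll data.
toy_counts = [
    ("sharma", "Delhi", 60),
    ("sharma", "Punjab", 40),
    ("patel", "Gujarat", 90),
    ("patel", "Delhi", 10),
]


def state_distribution(rows):
    """Return {lastname: {state: P(state | lastname)}} from (lastname, state, count) rows."""
    totals = defaultdict(int)
    by_name = defaultdict(lambda: defaultdict(int))
    for lastname, state, n in rows:
        by_name[lastname][state] += n
        totals[lastname] += n
    # Normalize each name's counts by that name's total so they sum to 1.
    return {
        name: {state: n / totals[name] for state, n in states.items()}
        for name, states in by_name.items()
    }


dist = state_distribution(toy_counts)
print(dist["sharma"])
```

Each name's probabilities sum to 1, which is what makes the rows comparable across common and rare last names.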

get_state_languages - Map states to their official languages

# Map states to languages
states = ["Delhi", "Punjab", "Gujarat"]
result = instate.get_state_languages(states)
print(result[["state", "official_languages"]])

#     state official_languages
# 0   Delhi     Hindi, English
# 1  Punjab            Punjabi
# 2 Gujarat           Gujarati
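Conceptually, a state-to-language mapping is a table lookup. The sketch below illustrates the idea with a plain dictionary; `STATE_LANGUAGES` and `lookup_languages` are hypothetical names, the three entries are taken from the example output above, and how instate stores or normalizes its own table is not something this sketch claims to reproduce.

```python
# Entries copied from the example output above; a full table would cover every state.
STATE_LANGUAGES = {
    "Delhi": ["Hindi", "English"],
    "Punjab": ["Punjabi"],
    "Gujarat": ["Gujarati"],
}


def lookup_languages(states):
    """Return (state, comma-joined languages) pairs; unknown states map to None."""
    rows = []
    for state in states:
        # Normalize whitespace and capitalization before looking the state up.
        langs = STATE_LANGUAGES.get(state.strip().title())
        rows.append((state, ", ".join(langs) if langs else None))
    return rows


print(lookup_languages(["Delhi", "punjab", "Atlantis"]))
```

Returning None for an unknown state keeps the output aligned with the input, so missing entries are easy to spot.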

Neural Network Predictions

predict_state - Predict likely states using trained GRU model

# Predict top 3 most likely states
names = ["sharma", "patel", "singh"]
result = instate.predict_state(names, top_k=3)
print(result["predicted_states"].iloc[0])
# ['Delhi', 'Uttar Pradesh', 'Bihar']
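The `top_k` argument amounts to keeping the k highest-scoring classes from a model's output. A minimal sketch of that selection step follows, assuming the model yields one probability per state; the helper `top_k_states` and the probabilities are made up for illustration and are not produced by the GRU model.

```python
import heapq


def top_k_states(probs, k=3):
    """Return the k state names with the highest probability, best first."""
    return heapq.nlargest(k, probs, key=probs.get)


# Hypothetical model output for one name; the numbers are invented.
probs = {
    "Delhi": 0.31,
    "Uttar Pradesh": 0.27,
    "Bihar": 0.12,
    "Punjab": 0.08,
    "Gujarat": 0.02,
}
print(top_k_states(probs, k=3))
```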

predict_language - Predict likely languages using LSTM or k-nearest neighbor

# LSTM neural network prediction (top 3)
result = instate.predict_language(names, model="lstm", top_k=3)
print(result["predicted_languages"].iloc[0])
# ['hindi', 'punjabi', 'urdu']

# K-nearest neighbor lookup (single best)
result = instate.predict_language(names, model="knn")
print(result["predicted_languages"].iloc[0])
# 'hindi'
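A nearest-neighbor lookup returns the label of the known name most similar to the query. The sketch below shows the general idea with the standard library's `difflib`; the similarity measure, the `KNOWN` table, its labels, and the `cutoff` value are all assumptions, and the distance metric instate actually uses for its KNN option may differ.

```python
import difflib

# Hypothetical lookup table of known last names with an invented language label each.
KNOWN = {"sharma": "hindi", "patel": "gujarati", "singh": "punjabi"}


def nearest_language(name, cutoff=0.6):
    """Return the language of the most similar known name, or None if none is close enough."""
    matches = difflib.get_close_matches(name.lower(), list(KNOWN), n=1, cutoff=cutoff)
    return KNOWN[matches[0]] if matches else None


print(nearest_language("sharmaa"))  # close to "sharma"
print(nearest_language("xyz"))      # nothing similar in the table
```

A cutoff guards against returning a confident-looking label for a name that resembles nothing in the table.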

Complete Example

import pandas as pd
import instate

# Sample data
df = pd.DataFrame({
    "person_id": [1, 2, 3],
    "lastname": ["sharma", "patel", "singh"]
})

# Get state distributions from electoral rolls
state_dist = instate.get_state_distribution(df, "lastname")
print("Electoral rolls data shape:", state_dist.shape)

# Predict states with neural network
predicted_states = instate.predict_state(df, "lastname", top_k=3)
print("Top 3 predicted states:", predicted_states["predicted_states"].iloc[0])

# Predict languages
predicted_langs = instate.predict_language(df, "lastname", model="lstm", top_k=3)
print("Top 3 predicted languages:", predicted_langs["predicted_languages"].iloc[0])

# Map states to languages
states_df = pd.DataFrame({"state": ["Delhi", "Gujarat", "Punjab"]})
lang_map = instate.get_state_languages(states_df)
print("State language mapping:")
print(lang_map[["state", "official_languages"]])
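One way to combine the two lookup outputs is to push each state's probability onto that state's official languages and rank the totals. This is a sketch of that idea under stated assumptions, not something the package does: `language_scores` is a hypothetical helper and the probabilities are invented.

```python
from collections import defaultdict

# Hypothetical P(state | lastname) for one name and a partial state-to-language table.
state_probs = {"Delhi": 0.5, "Punjab": 0.3, "Gujarat": 0.2}
state_langs = {
    "Delhi": ["Hindi", "English"],
    "Punjab": ["Punjabi"],
    "Gujarat": ["Gujarati"],
}


def language_scores(state_probs, state_langs):
    """Add each state's probability to every official language of that state."""
    scores = defaultdict(float)
    for state, p in state_probs.items():
        for lang in state_langs.get(state, []):
            scores[lang] += p
    # Sort languages from highest to lowest score.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = language_scores(state_probs, state_langs)
print(ranked)
```

Because a state with two official languages contributes to both, the scores are rankings rather than probabilities and need not sum to 1.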

Data

The underlying data for the package can be accessed at: https://doi.org/10.7910/DVN/ZXMVTJ

Evaluation

The state prediction model has a top-3 accuracy of 85.3% on unseen names. The KNN model also does quite well. The name-to-language lookup has an accuracy of 67.9%, and the name-to-language model prediction has an accuracy of 72.2%.
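For readers unfamiliar with the metric, top-3 accuracy counts a prediction as correct when the true label appears anywhere among the three highest-ranked guesses. The sketch below shows the calculation on made-up labels; `top_k_accuracy` is a hypothetical helper and the data has no relation to the figures reported above.

```python
def top_k_accuracy(true_labels, ranked_predictions, k=3):
    """Fraction of examples whose true label is among the first k predictions."""
    if not true_labels:
        raise ValueError("need at least one example")
    hits = sum(
        truth in preds[:k]
        for truth, preds in zip(true_labels, ranked_predictions)
    )
    return hits / len(true_labels)


# Made-up labels and ranked predictions, only to show the calculation.
truth = ["Delhi", "Gujarat", "Punjab", "Kerala"]
preds = [
    ["Uttar Pradesh", "Delhi", "Bihar"],
    ["Gujarat", "Maharashtra", "Rajasthan"],
    ["Haryana", "Delhi", "Rajasthan"],
    ["Tamil Nadu", "Karnataka", "Kerala"],
]
print(top_k_accuracy(truth, preds, k=3))  # 3 of the 4 true labels are in the top 3
print(top_k_accuracy(truth, preds, k=1))  # only 1 of the 4 is ranked first
```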

Authors

Atul Dhingra, Gaurav Sood and Rajashekar Chintalapati

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.

🔗 Adjacent Repositories

appeler/naampy — Infer Sociodemographic Characteristics from Names Using Indian Electoral Rolls
appeler/ethnicolr2 — Ethnicolr implementation with new models in pytorch
appeler/parsernaam — AI name parsing. Predict first or last name using a DL model.
appeler/ethnicolor — Race and Ethnicity based on name using data from census, voter reg. files, etc.
appeler/ethnicolr — Predict Race and Ethnicity Based on the Sequence of Characters in a Name


Release history

Version	Release date
1.1.0 (this version)	Dec 27, 2025
1.0.0	Dec 4, 2025
0.1.7	Aug 19, 2024
0.1.6	Aug 18, 2024
0.1.5	Aug 18, 2024
0.1.4	Aug 18, 2024
0.1.3	Aug 18, 2024
0.1.2	Mar 24, 2023
0.1.1	Mar 15, 2023
0.1.0	Mar 15, 2023

Download files

Source Distribution

instate-1.1.0.tar.gz (26.5 MB), uploaded Dec 27, 2025, Source

Built Distribution

instate-1.1.0-py3-none-any.whl (26.5 MB), uploaded Dec 27, 2025, Python 3

File details

Details for the file instate-1.1.0.tar.gz.

File metadata

Download URL: instate-1.1.0.tar.gz
Upload date: Dec 27, 2025
Size: 26.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for instate-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`925dcfce79ab3d7e54aa1f37b5357eb155bb1bc421fc22d6c4b5f96e10a23fbd`
MD5	`68673a587fc66fb0fa35be09925bf0bd`
BLAKE2b-256	`4e86a6b8b81fa803710831362b86d3a45cf0ea155d1a492cbc42b3929c7de7b9`
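A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib`. The following is a minimal sketch: the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the commented-out call is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with the published one, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 value from the table above; the local path below is an assumption.
EXPECTED = "925dcfce79ab3d7e54aa1f37b5357eb155bb1bc421fc22d6c4b5f96e10a23fbd"
# print(matches_published_hash("instate-1.1.0.tar.gz", EXPECTED))
```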


Provenance

The following attestation bundles were made for instate-1.1.0.tar.gz:

Publisher: python-publish.yml on appeler/instate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: instate-1.1.0.tar.gz
- Subject digest: 925dcfce79ab3d7e54aa1f37b5357eb155bb1bc421fc22d6c4b5f96e10a23fbd
- Sigstore transparency entry: 780530746
- Sigstore integration time: Dec 27, 2025
Source repository:
- Permalink: appeler/instate@33d4ee507298bac224af89e96c35468a3be91adb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/appeler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@33d4ee507298bac224af89e96c35468a3be91adb
- Trigger Event: workflow_dispatch

File details

Details for the file instate-1.1.0-py3-none-any.whl.

File metadata

Download URL: instate-1.1.0-py3-none-any.whl
Upload date: Dec 27, 2025
Size: 26.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for instate-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db13ce92f66a3e26ff142f2f155b3a1e2bff5b3f9969496b8d64d5953021d009`
MD5	`6375ae1f9e4d8b6924503005fb07af71`
BLAKE2b-256	`ecb6d255d499b8bb4af28509e7deaada52ac69e06eb3e6fdf1f881f259e66042`


Provenance

The following attestation bundles were made for instate-1.1.0-py3-none-any.whl:

Publisher: python-publish.yml on appeler/instate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: instate-1.1.0-py3-none-any.whl
- Subject digest: db13ce92f66a3e26ff142f2f155b3a1e2bff5b3f9969496b8d64d5953021d009
- Sigstore transparency entry: 780530748
- Sigstore integration time: Dec 27, 2025
Source repository:
- Permalink: appeler/instate@33d4ee507298bac224af89e96c35468a3be91adb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/appeler
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@33d4ee507298bac224af89e96c35468a3be91adb
- Trigger Event: workflow_dispatch
