ML-assisted name parser for Indian and international names

Owner: appeler

Maintainers: rajashekar
Project description

Parsernaam: ML-Assisted Name Parser

Most name parsers rely on crude pattern matching and token position (e.g., treating the last word as the last name). This approach is limited and fragile, especially for Indian names. We take a machine-learning approach instead. Using large voter registration datasets from India and the US, we build machine-learning-based name parsers that predict whether a string is a first name or a last name.
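To make the fragility concrete, here is a minimal sketch of the positional rule. The function `naive_parse` is hypothetical and written only for illustration; it is not part of parsernaam. In the output table further down, "Nichols Richard" is labeled `last_first` and "Petersen" is labeled `last`, both cases the positional rule gets wrong.

```python
def naive_parse(full_name: str) -> dict:
    """Positional baseline: first token is the first name, last token is the last name."""
    tokens = full_name.split()
    if not tokens:
        return {"first": None, "last": None}
    if len(tokens) == 1:
        # A single token is ambiguous; the positional rule simply guesses "first".
        return {"first": tokens[0], "last": None}
    return {"first": tokens[0], "last": tokens[-1]}


for name in ["John Smith", "Nichols Richard", "Petersen"]:
    print(name, "->", naive_parse(name))
```

The rule has no way to tell that a name is written surname-first, or that a lone token is a surname, which is the gap a learned classifier is meant to close.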

For Indian electoral rolls, we assume the last name is the word in the name that is shared by multiple family members. (We defer support for compound last names, which are extremely rare in India, to the next iteration.)
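The shared-word assumption can be sketched as follows. This is an illustration only, not the authors' labeling code: the function name `shared_last_name`, the `min_members` threshold, and the household names are all made up for the example.

```python
from collections import Counter
from typing import List, Optional


def shared_last_name(household: List[str], min_members: int = 2) -> Optional[str]:
    """Return the word that appears in the most members' names, if enough members share it."""
    counts = Counter()
    for name in household:
        # Count each word once per person so a repeated word in one name is not over-counted.
        counts.update(set(name.lower().split()))
    if not counts:
        return None
    # Ties between equally common words are broken arbitrarily in this sketch.
    token, n = counts.most_common(1)[0]
    return token if n >= min_members else None


household = ["Ramesh Kumar Sharma", "Sunita Sharma", "Anil Sharma"]
print(shared_last_name(household))  # sharma
```

A single-person household yields `None`, since no word can be shared; such records would carry no label under this assumption.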

Gradio App

parsernaam on Hugging Face (HF)

Installation

pip install parsernaam

Usage

Python API

import pandas as pd
from parsernaam.parse import ParseNames

# Create DataFrame with names to parse
df = pd.DataFrame({'name': ['Jan', 'Nicholas Turner', 'Petersen', 'Nichols Richard', 'Piet',
                           'John Smith', 'Janssen', 'Kim Yeon']})

# Parse names using ML models
results = ParseNames.parse(df)
# DataFrame.to_markdown() requires the optional "tabulate" package
print(results.to_markdown())

Output:

|    | name            | parsed_name                                                      |
|---:|:----------------|:-----------------------------------------------------------------|
|  0 | Jan             | {'name': 'Jan', 'type': 'first', 'prob': 0.677}                  |
|  1 | Nicholas Turner | {'name': 'Nicholas Turner', 'type': 'first_last', 'prob': 0.999} |
|  2 | Petersen        | {'name': 'Petersen', 'type': 'last', 'prob': 0.534}              |
|  3 | Nichols Richard | {'name': 'Nichols Richard', 'type': 'last_first', 'prob': 0.999} |
|  4 | Piet            | {'name': 'Piet', 'type': 'first', 'prob': 0.538}                 |
|  5 | John Smith      | {'name': 'John Smith', 'type': 'first_last', 'prob': 0.997}      |
|  6 | Janssen         | {'name': 'Janssen', 'type': 'first', 'prob': 0.593}              |
|  7 | Kim Yeon        | {'name': 'Kim Yeon', 'type': 'last_first', 'prob': 0.999}        |
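Downstream code usually wants separate first and last name fields rather than a type label. The sketch below shows one way to turn a `parsed_name` dict of the shape shown above into such fields. The helper `split_parsed` and the `min_prob` cutoff of 0.6 are assumptions made for this example, as is the choice to treat only the four type labels that appear in the table; the package may emit others.

```python
def split_parsed(parsed: dict, min_prob: float = 0.6) -> dict:
    """Map one parsed_name dict to first/last fields; leave both empty when prob is low."""
    tokens = parsed["name"].split()
    result = {"first": None, "last": None, "confident": parsed["prob"] >= min_prob}
    if not result["confident"]:
        return result
    kind = parsed["type"]
    if kind == "first":
        result["first"] = parsed["name"]
    elif kind == "last":
        result["last"] = parsed["name"]
    elif kind == "first_last" and len(tokens) >= 2:
        # Assumption: everything before the final word belongs to the first name.
        result["first"], result["last"] = " ".join(tokens[:-1]), tokens[-1]
    elif kind == "last_first" and len(tokens) >= 2:
        # Assumption: the leading word is the last name, the rest is the first name.
        result["last"], result["first"] = tokens[0], " ".join(tokens[1:])
    return result


print(split_parsed({"name": "Kim Yeon", "type": "last_first", "prob": 0.999}))
print(split_parsed({"name": "Petersen", "type": "last", "prob": 0.534}))
```

With this cutoff, single-word rows such as "Petersen" (0.534) and "Janssen" (0.593) are left unassigned, which may be preferable to acting on a near coin-flip prediction.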

Command Line Interface

parse_names input.csv -o output.csv -n name_column
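The command above expects a CSV file with a column of names. A minimal sketch for producing such a file with the standard library follows. It assumes that `-n` takes the header of the column holding the names; the helper `write_names_csv` is hypothetical and not part of the package.

```python
import csv
import tempfile
from pathlib import Path


def write_names_csv(path, names, column="name_column"):
    """Write one name per row under a single header column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        writer.writerows([name] for name in names)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "input.csv"
    write_names_csv(csv_path, ["Nicholas Turner", "Kim Yeon"])
    print(csv_path.read_text(encoding="utf-8"))
```

Writing to `input.csv` in the working directory instead of a temporary path produces a file that matches the command shown above.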

Features

Machine Learning Based: Uses LSTM neural networks trained on voter registration data
Multi-language Support: Handles Indian, Western, and other international name patterns
High Accuracy: Each prediction comes with a confidence score (the `prob` field in the output)
Performance Optimized: Model caching and batch processing support
Robust Error Handling: Handles edge cases like empty names, special characters, etc.
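For readers unfamiliar with the terms, the sketch below illustrates the general pattern behind "model caching" and "batch processing". It is a generic illustration and does not reflect how parsernaam implements either feature: `load_model`, `batched`, and `predict_all` are hypothetical names, and the "model" is a placeholder that only counts words.

```python
from functools import lru_cache
from itertools import islice


@lru_cache(maxsize=None)
def load_model(name: str):
    """Stand-in for an expensive model load; the body runs once per distinct name."""
    print(f"loading {name} ...")
    return lambda batch: [len(s.split()) for s in batch]  # placeholder "model"


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def predict_all(names, batch_size=2):
    out = []
    for chunk in batched(names, batch_size):
        model = load_model("demo")  # served from the cache after the first call
        out.extend(model(chunk))
    return out


print(predict_all(["Jan", "John Smith", "Kim Yeon"]))
```

The "loading" message prints once even though `load_model` is called for every batch, which is the point of caching.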

Data

The model is trained on names from the Florida Voter Registration Data from early 2022. The data are available on the Harvard Dataverse.

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributing

Contributions are welcome. Please open an issue if you find a bug or have a feature request.

🔗 Adjacent Repositories

appeler/naamkaran — generative model for names
appeler/ethnicolr2 — Ethnicolr implementation with new models in pytorch
appeler/namesexdata — Data on international first names and sex of people with that name
appeler/pranaam — pranaam: predict religion based on name
appeler/graphic_names — Infer the gender of a person with a particular first name using Google image search and Clarifai

License

The package is released under the MIT License.

Release history

0.2.0 (this version): Nov 27, 2025
0.1.1: Sep 3, 2025
0.1.0: Sep 3, 2025
0.0.4: Oct 11, 2023
0.0.3: Sep 13, 2023
0.0.2: Sep 12, 2023
0.0.1: Sep 12, 2023

