Name parser

Parsernaam: Predict First and Last Name

Most common name parsers rely on crude pattern matching and on word order (for example, treating the last word as the last name) to parse names. This approach is limited and fragile, especially for Indian names. We take a machine-learning approach instead. Using large voter registration datasets from India and the US, we build machine-learning-based name parsers that predict whether each word in a name is a first or last name.
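To make the limitation concrete, here is a minimal sketch of the kind of positional rule described above. The function `naive_parse` and the sample names are hypothetical and are not part of this package; they only illustrate why a "last word is the last name" rule breaks when a name is a single word or is written family-name-first.

```python
def naive_parse(full_name: str) -> dict:
    """Positional heuristic: the last word is the last name,
    everything before it is the first name."""
    tokens = full_name.split()
    if not tokens:
        return {"first": "", "last": ""}
    if len(tokens) == 1:
        # A single word is always treated as a first name,
        # even when it is actually a surname.
        return {"first": tokens[0], "last": ""}
    return {"first": " ".join(tokens[:-1]), "last": tokens[-1]}


# Works when the name follows the assumed order.
print(naive_parse("Jan Petersen"))

# Hypothetical name written family-name-first: the rule swaps the labels.
print(naive_parse("Reddy Lakshmi"))
```

The rule has no way to tell the two cases apart, which is the gap a learned, per-word classifier is meant to close.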

For Indian electoral rolls, we assume the last name is the word in the name that is shared by multiple family members. (We defer support for compound last names, which are extremely rare in India, to the next iteration.)
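The shared-word assumption can be sketched as follows. This is an illustration only, not the authors' labeling pipeline: the helper names `shared_last_name` and `label_tokens`, the `min_members` threshold, the choice to label every non-shared word as a first name, and the sample household are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def shared_last_name(family: List[str], min_members: int = 2) -> Optional[str]:
    """Return the word that appears in the most family members' names,
    or None if no word is shared by at least `min_members` people."""
    counts = Counter()
    for full_name in family:
        # set() so a word repeated inside one name counts once per person.
        counts.update(set(full_name.lower().split()))
    if not counts:
        return None
    word, n_members = counts.most_common(1)[0]
    return word if n_members >= min_members else None


def label_tokens(full_name: str, last_name: Optional[str]) -> List[Tuple[str, str]]:
    """Label each word as 'last' if it matches the shared word, else 'first'."""
    return [
        (tok, "last" if last_name and tok.lower() == last_name else "first")
        for tok in full_name.split()
    ]


# Hypothetical household from an electoral roll.
family = ["Ravi Sharma", "Sunita Sharma", "Anil Kumar Sharma"]
last = shared_last_name(family)
for person in family:
    print(label_tokens(person, last))
```

Because the label comes from the household rather than from word position, the same rule would still work if the shared word appeared first in each name.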

Gradio App

Parsernaam on Hugging Face

Installation

pip install parsernaam

General API

The general API is as follows:

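# Import the library
import pandas as pd
from parsernaam.parsernaam import ParseNames

# Positional argument:
#   df    dataframe with the names to parse (must have a column named 'name')

# Example
df = pd.DataFrame({'name': ['Jan Petersen', 'Piet', 'Janssen']})

# Assumption: the original example omits the call that produces the output
# below. ParseNames.parse(df) is assumed here; check the package source for
# the exact method name.
names = ParseNames.parse(df)
print(names)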
           name                                                                                                                       parsed_name
0  Jan Petersen  [{'name': 'Jan', 'type': 'first', 'prob': 0.6769440174102783}, {'name': 'Petersen', 'type': 'last', 'prob': 0.5342262387275696}]
1          Piet                                                                   [{'name': 'Piet', 'type': 'first', 'prob': 0.5381495952606201}]
2       Janssen                                                                [{'name': 'Janssen', 'type': 'first', 'prob': 0.5929554104804993}]
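Each `parsed_name` entry in the output above is a list of dictionaries with the keys `name`, `type`, and `prob`. The following sketch shows one way to collapse that list into plain first-name and last-name strings. The helper `pick` and its `min_prob` cutoff are hypothetical and not part of the package; the sample rows are copied from the output above.

```python
from typing import Dict, List


def pick(tokens: List[Dict], kind: str, min_prob: float = 0.0) -> str:
    """Join the words labeled `kind` ('first' or 'last') whose
    probability is at least `min_prob`."""
    return " ".join(
        t["name"] for t in tokens if t["type"] == kind and t["prob"] >= min_prob
    )


# Two parsed_name values copied from the example output.
rows = [
    [
        {"name": "Jan", "type": "first", "prob": 0.6769440174102783},
        {"name": "Petersen", "type": "last", "prob": 0.5342262387275696},
    ],
    [{"name": "Piet", "type": "first", "prob": 0.5381495952606201}],
]

for tokens in rows:
    print(pick(tokens, "first"), "|", pick(tokens, "last"))

# With a stricter cutoff, low-confidence labels are dropped.
print(repr(pick(rows[0], "last", min_prob=0.6)))
```

With a dataframe, the same helper could be applied to the `parsed_name` column, for example with a list comprehension over `df['parsed_name']`. The probabilities in the example are close to 0.5 for several words, so a cutoff is worth considering before treating a label as reliable.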

Data

The model is trained on names from the Florida Voter Registration Data from early 2022. The data are available on the Harvard Dataverse.

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributing

Contributions are welcome. Please open an issue if you find a bug or have a feature request.

License

The package is released under the MIT License.
