
Name parser

Project description

Parsernaam: Predict First and Last Names


Most name parsers rely on crude pattern matching and on word order (e.g., treating the last word as the last name). This approach is limited and fragile, especially for Indian names. We take a machine-learning approach instead: using large voter registration datasets from India and the US, we build name parsers that predict whether each word in a name is a first or a last name.
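For contrast, here is a minimal Python sketch of the position-based heuristic described above. The function `positional_parse` is hypothetical and not part of the package. Because only a word's position decides its label, a name written surname-first is mislabeled, which is the kind of case a model that scores each word on its own is meant to handle.

```python
def positional_parse(name: str) -> list[dict[str, str]]:
    """Naive baseline (hypothetical): the last word is the last name, all others are first names."""
    words = name.split()
    labels = []
    for i, word in enumerate(words):
        # Only the position decides the label; the word itself is never inspected
        is_last = len(words) > 1 and i == len(words) - 1
        labels.append({"name": word, "type": "last" if is_last else "first"})
    return labels


print(positional_parse("Jan Petersen"))  # Jan -> first, Petersen -> last
print(positional_parse("Petersen Jan"))  # surname-first order: Petersen is mislabeled as first
```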

For Indian electoral rolls, we assume that the last name is the word in a name that is shared by multiple members of the same family. (We defer support for compound last names, which are extremely rare in India, to the next iteration.)
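The labeling rule above can be sketched as follows, assuming each household is already available as a list of full names. The grouping step, the function `label_household`, its `min_members` parameter, and the example names are all hypothetical; the package's actual preprocessing is not described here. A word that appears in the names of at least two members is labeled the last name, and every other word is labeled a first name.

```python
from collections import Counter


def label_household(names: list[str], min_members: int = 2) -> list[list[dict[str, str]]]:
    """Label a word 'last' if it occurs in the names of at least `min_members` members."""
    # Count each word once per member, so a repeated word within one name is not double-counted
    counts = Counter(word for name in names for word in set(name.split()))
    labeled = []
    for name in names:
        labeled.append([
            {"name": word, "type": "last" if counts[word] >= min_members else "first"}
            for word in name.split()
        ])
    return labeled


# Illustrative household, not taken from any real roll
for parsed in label_household(["Ramesh Yadav", "Sunita Yadav", "Amit Yadav"]):
    print(parsed)
```

One consequence of this rule is that any word repeated across members, not only the surname, is labeled as a last name.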

Gradio App

Parsernaam on HF

Installation

pip install parsernaam

General API

The general API is as follows:

# Import the libraries
import pandas as pd
from parsernaam.parsernaam import ParseNames

positional arguments:
  df                 DataFrame of names to parse (the names must be in a column called 'name')

# Example input: one full name per row
df = pd.DataFrame({'name': ['Jan Petersen', 'Piet', 'Janssen']})

# Example output: each word of each name is labeled 'first' or 'last', with the model's probability
           name  parsed_name
0  Jan Petersen  [{'name': 'Jan', 'type': 'first', 'prob': 0.6769440174102783}, {'name': 'Petersen', 'type': 'last', 'prob': 0.5342262387275696}]
1          Piet  [{'name': 'Piet', 'type': 'first', 'prob': 0.5381495952606201}]
2       Janssen  [{'name': 'Janssen', 'type': 'first', 'prob': 0.5929554104804993}]
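Each `parsed_name` entry is a list of dictionaries with `name`, `type`, and `prob` keys, one per word. The following is a minimal sketch of turning that list into separate first-name and last-name columns. It rebuilds two rows of the example output by hand rather than calling the library, and the helper `words_of_type` and the `first_name`/`last_name` columns are hypothetical.

```python
import pandas as pd


def words_of_type(parsed: list[dict], kind: str) -> str:
    """Join, in their original order, the words whose predicted type equals `kind`."""
    return " ".join(p["name"] for p in parsed if p["type"] == kind)


# Two rows copied from the example output (probabilities rounded)
df = pd.DataFrame({
    "name": ["Jan Petersen", "Piet"],
    "parsed_name": [
        [{"name": "Jan", "type": "first", "prob": 0.677},
         {"name": "Petersen", "type": "last", "prob": 0.534}],
        [{"name": "Piet", "type": "first", "prob": 0.538}],
    ],
})

# One new column per label; a name with no word of that type gets an empty string
df["first_name"] = df["parsed_name"].apply(lambda p: words_of_type(p, "first"))
df["last_name"] = df["parsed_name"].apply(lambda p: words_of_type(p, "last"))
print(df[["name", "first_name", "last_name"]])
```

Because the example probabilities are close to 0.5, it may be worth checking `prob` before trusting an individual label.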

Data

The model is trained on names from the Florida Voter Registration Data from early 2022. The data are available on the Harvard Dataverse.

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributing

Contributions are welcome. Please open an issue if you find a bug or have a feature request.

License

The package is released under the MIT License.


Download files


Source Distribution

parsernaam-0.0.1.tar.gz (7.8 kB)

Uploaded Source

Built Distribution

parsernaam-0.0.1-py2.py3-none-any.whl (7.6 kB)

Uploaded Python 2, Python 3

File details

Details for the file parsernaam-0.0.1.tar.gz.

File metadata

  • Download URL: parsernaam-0.0.1.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for parsernaam-0.0.1.tar.gz
Algorithm Hash digest
SHA256 069091d157755c2df525f85675f2e577a0b5bb525d642feea65afc4f12da20e7
MD5 f69bcf98e74b25caaeb004989112352c
BLAKE2b-256 380072fa71456bcd495a495b11ae3c5e0a8d30cc5c4371fdc757c06e3d3249c4


File details

Details for the file parsernaam-0.0.1-py2.py3-none-any.whl.

File metadata

  • Download URL: parsernaam-0.0.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for parsernaam-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8bc62024f2ae3475f7efb981f547fb1597f6bad75f503e465a1d921a49c7a8a9
MD5 f88c36f4d8fc50202714d0a4c2f72b0c
BLAKE2b-256 b06018356365e2bc90445c0ea948d2f472349b96bdc0ce855ed286f6c8965e44

