Name parser

Parsernaam: Predict First and Last Name

Most common name parsers rely on crude pattern matching and word order (e.g., treating the last word as the last name). This approach is limited and fragile, especially for Indian names. We take a machine-learning approach instead. Using large voter registration datasets from India and the US, we build name parsers that predict whether each word in a name is a first or a last name.
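For contrast, the positional rule described above can be sketched in a few lines. This is only an illustration of the baseline being criticized, not code from the package; the function name `positional_parse` is hypothetical.

```python
def positional_parse(name: str) -> dict:
    """Baseline rule: the last whitespace-separated word is the last name."""
    tokens = name.split()
    if not tokens:
        return {"first": "", "last": ""}
    if len(tokens) == 1:
        # The rule cannot tell whether a lone word is a first or a last name;
        # here it is (arbitrarily) treated as a last name.
        return {"first": "", "last": tokens[0]}
    return {"first": " ".join(tokens[:-1]), "last": tokens[-1]}


print(positional_parse("Jan Petersen"))   # {'first': 'Jan', 'last': 'Petersen'}
print(positional_parse("Piet"))           # {'first': '', 'last': 'Piet'}
print(positional_parse("Petersen Jan"))   # {'first': 'Petersen', 'last': 'Jan'}
```

The rule works only when the input follows the order it expects: a single word or a name written family-name-first is silently mislabeled, which is the fragility a learned classifier is meant to reduce.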

For Indian electoral rolls, we assume that the last name is the word in a name that is shared by multiple family members. (Support for compound last names, which are extremely rare in India, is deferred to the next iteration.)
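The labeling assumption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `label_household`, the `min_members` threshold, the sample household, and the choice to label every non-shared word as "first" are all assumptions made for the example.

```python
from collections import Counter


def label_household(names, min_members=2):
    """Label each word 'last' if it appears in the names of at least
    `min_members` household members, otherwise 'first' (an assumption)."""
    counts = Counter()
    for name in names:
        # Count each word once per person, ignoring case.
        counts.update(set(name.lower().split()))
    shared = {word for word, n in counts.items() if n >= min_members}
    return [
        [(tok, "last" if tok.lower() in shared else "first") for tok in name.split()]
        for name in names
    ]


# Hypothetical household, invented for illustration only.
household = ["Ramesh Sharma", "Sunita Sharma", "Anil Kumar Sharma"]
for labeled in label_household(household):
    print(labeled)
```

Because single-word last names are assumed, a word is labeled on its own; a compound last name would need the shared words to be grouped, which matches the limitation noted above.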

Gradio App: Parsernaam on Hugging Face


Installation

pip install parsernaam

General API

The general API is as follows:

# Import the library
import pandas as pd
from parsernaam.parsernaam import ParseNames

positional arguments:
  df                 dataframe with names to parse (must have a column named 'name')

# example
df = pd.DataFrame({'name': ['Jan Petersen', 'Piet', 'Janssen']})

# Parsing df yields a 'parsed_name' column with one entry per word:
           name                                                                                                                       parsed_name
0  Jan Petersen  [{'name': 'Jan', 'type': 'first', 'prob': 0.6769440174102783}, {'name': 'Petersen', 'type': 'last', 'prob': 0.5342262387275696}]
1          Piet                                                                   [{'name': 'Piet', 'type': 'first', 'prob': 0.5381495952606201}]
2       Janssen                                                                [{'name': 'Janssen', 'type': 'first', 'prob': 0.5929554104804993}]
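Each `parsed_name` cell in the output above is a list of dictionaries with `name`, `type`, and `prob` keys. A minimal sketch of turning that structure into separate first and last names, using plain Python and values copied from the sample output (probabilities rounded), might look like the following. The helper `split_name` is hypothetical and not part of the package.

```python
# parsed_name cells copied from the sample output above (probabilities rounded)
sample = {
    "Jan Petersen": [
        {"name": "Jan", "type": "first", "prob": 0.677},
        {"name": "Petersen", "type": "last", "prob": 0.534},
    ],
    "Piet": [{"name": "Piet", "type": "first", "prob": 0.538}],
    "Janssen": [{"name": "Janssen", "type": "first", "prob": 0.593}],
}


def split_name(parts):
    """Return (first, last, lowest word-level probability) for one parsed name."""
    first = " ".join(p["name"] for p in parts if p["type"] == "first")
    last = " ".join(p["name"] for p in parts if p["type"] == "last")
    # The weakest prediction is a rough indicator of how much to trust the split.
    confidence = min((p["prob"] for p in parts), default=0.0)
    return first, last, confidence


for full_name, parts in sample.items():
    print(full_name, "->", split_name(parts))
```

With a DataFrame, the same function can be passed to `Series.apply` on the `parsed_name` column. Keeping the lowest probability makes it easy to flag uncertain rows, such as 'Janssen' above, for manual review.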


Data

The model is trained on names from the Florida Voter Registration Data from early 2022. The data are available on the Harvard Dataverse.


Authors

Rajashekar Chintalapati and Gaurav Sood


Contributing

Contributions are welcome. Please open an issue if you find a bug or have a feature request.


License

The package is released under the MIT License.

Download files

Source Distribution

parsernaam-0.0.1.tar.gz (7.8 kB)

Built Distribution

parsernaam-0.0.1-py2.py3-none-any.whl (7.6 kB)