A maximum-strength name parser for record linkage.

License: AGPL 3.0+ | Python: 3.6+ | Package hosted on PyPI | Repo hosted on GitHub | Builds at CircleCI | Coverage at Coveralls

nominally: a maximum-strength name parser for record linkage

🖥️ Examples

Run a quick name at the command line:

  $ nominally "Jimmy Blankinsop"
         raw: Jimmy Blankinsop
      parsed: jimmy blankinsop
        list: ['', 'jimmy', '', 'blankinsop', '', '']
       title:
       first: jimmy
      middle:
        last: blankinsop
      suffix:
    nickname:

Pull out the major parts...

>>> from nominally import parse_name
>>> parse_name("Blankinsop, Jr., Mr. James 'Jimmy'")
{'title': 'mr', 'first': 'james', 'middle': '', 'last': 'blankinsop', 'suffix': 'jr', 'nickname': 'jimmy'}

Or create a Name object and work with its individual parts, the complete string, lists, dicts...

>>> from nominally import Name
>>> n = Name("DR. PEACHES BARTKOWICZ")
>>> n
Name({'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''})
>>> str(n)
'dr peaches bartkowicz'
>>> dict(n)
{'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''}
>>> list(n.values())
['dr', 'peaches', '', 'bartkowicz', '', '']
>>> n.first
'peaches'
>>> n.last
'bartkowicz'
>>> n.raw
'DR. PEACHES BARTKOWICZ'
>>> n.report()
{'raw': 'DR. PEACHES BARTKOWICZ', 'parsed': 'dr peaches bartkowicz', 'list': ['dr', 'peaches', '', 'bartkowicz', '', ''], 'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''}
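
Since the package targets record linkage, here is a minimal sketch of how the parsed dicts might feed a simple blocking key. The `linkage_key` helper is hypothetical and not part of nominally; the two dicts are copied from the parsed output of the "Jimmy Blankinsop" examples above.

```python
def linkage_key(parsed):
    """Build a coarse blocking key: last name plus first initial.

    `parsed` is a dict shaped like the output of parse_name().
    This helper is an illustration only, not part of nominally.
    """
    first = parsed.get("first", "")
    last = parsed.get("last", "")
    return (last, first[:1])


# Parsed output for "Blankinsop, Jr., Mr. James 'Jimmy'"
a = {'title': 'mr', 'first': 'james', 'middle': '', 'last': 'blankinsop',
     'suffix': 'jr', 'nickname': 'jimmy'}
# Parsed output for "Jimmy Blankinsop"
b = {'title': '', 'first': 'jimmy', 'middle': '', 'last': 'blankinsop',
     'suffix': '', 'nickname': ''}

# Both records land in the same block despite different raw formats.
same_block = linkage_key(a) == linkage_key(b)
```

A key this coarse only narrows the candidate pairs; a real linkage pipeline would still compare the remaining fields.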

For a live example using pandas, see: https://colab.research.google.com/gist/vaneseltine/964fc9dec60e59410b91bbcaf1fe2d11/nom_pandas.ipynb

Go from list...

# raw_names
["Graham Arthur Chapman",
 "cleese, john m",
 "Gilliam, Terrence (Terry) Vance",
 "Eric Idle",
 'Mr. Terence "Terry" Graham Parry Jones',
 "M E Palin",
 "Neil James Innes",
 "carol cleveland",
 "Adams, Douglas N"]

...to DataFrame in a couple of simple notebook cells.

                                        0  title     first        middle       last  suffix  nickname
0                   Graham Arthur Chapman           graham        arthur    chapman
1                          cleese, john m             john             m     cleese
2         Gilliam, Terrence (Terry) Vance         terrence         vance    gilliam             terry
3                               Eric Idle             eric                     idle
4  Mr. Terence "Terry" Graham Parry Jones     mr   terence  graham parry      jones             terry
5                               M E Palin                m             e      palin
6                        Neil James Innes             neil         james      innes
7                         carol cleveland            carol                cleveland
8                        Adams, Douglas N          douglas             n      adams
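
The linked notebook is the authoritative version. As a rough sketch of one way to get from the list to a table like this, the helper below puts the raw names in column `0` and adds one column per parsed field. The name `parse_names_to_frame` is hypothetical; the parser is passed in as an argument, on the assumption that it returns a dict per name as `parse_name` does above.

```python
import pandas as pd


def parse_names_to_frame(raw_names, parser):
    """Return a DataFrame of raw names (column 0) plus one column per parsed field.

    `parser` is any callable mapping a name string to a dict of parts,
    such as nominally.parse_name. This helper is illustrative only.
    """
    df = pd.DataFrame(raw_names)
    # Parse each raw name, then expand the resulting dicts into columns.
    parsed = pd.DataFrame(df[0].map(parser).tolist(), index=df.index)
    return df.join(parsed)


# Typical use, assuming nominally is installed:
#     from nominally import parse_name
#     frame = parse_names_to_frame(raw_names, parse_name)
```

Passing the parser as an argument keeps the helper easy to test with a stand-in function.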

🎓 Origins

nominally grew from the python-nameparser package and greatly benefits from its test bank. The key difference is that nominally focuses narrowly on lists of reasonably well-formed single-name fields. Therefore, nominally does not support:

  • Mutability of Name
  • Easy customization of lists of name parts
  • Parsing multiple names from mingled fields
  • Most titles, profession names, and other name prefixes
  • Mononyms: raw names expected to output only a single field
  • Encoding other than UTF-8
  • Input from byte strings
  • Python 3.5 or lower
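
Because byte strings and non-UTF-8 encodings are out of scope, input may need decoding before it reaches the parser. A minimal sketch, assuming the source encoding is known; `to_text` is a hypothetical helper, not part of nominally:

```python
def to_text(raw, encoding="utf-8"):
    """Decode bytes-like input to str; return str input unchanged.

    Illustrative helper only: the caller must supply the true source
    encoding, since it cannot be reliably guessed from the bytes.
    """
    if isinstance(raw, (bytes, bytearray)):
        return raw.decode(encoding)
    return raw


# A Latin-1 encoded byte string becomes an ordinary Python str.
latin1_bytes = "Zoë Blankinsop".encode("latin-1")
clean = to_text(latin1_bytes, encoding="latin-1")

# The decoded string could then be parsed, assuming nominally is installed:
#     from nominally import parse_name
#     parse_name(clean)
```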

🧙 Author

Matt VanEseltine

matvan@umich.edu

https://git.sr.ht/~matvan

https://github.com/vaneseltine

https://twitter.com/vaneseltine

https://stackoverflow.com/users/7846185/matt-vaneseltine
