nominally: a maximum-strength name parser for record linkage
🖥️ Examples
Parse a single name from the command line:
$ nominally "Jimmy Blankinsop"
raw: Jimmy Blankinsop
cleaned: jimmy blankinsop
parsed: blankinsop, jimmy
list: ['', 'jimmy', '', 'blankinsop', '', '']
title:
first: jimmy
middle:
last: blankinsop
suffix:
nickname:
Pull out the major parts...
>>> from nominally import parse_name
>>> parse_name("Blankinsop, Jr., Mr. James 'Jimmy'")
{'title': 'mr', 'first': 'james', 'middle': '', 'last': 'blankinsop', 'suffix': 'jr', 'nickname': 'jimmy'}
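For record linkage, the parsed dictionary can be reduced to a comparison key. The following is a minimal sketch, not part of nominally: `linkage_key` is a hypothetical helper, and using last name plus first initial as a blocking key is an assumption made for illustration. The sample dictionaries are copied from the outputs shown above so the block runs without nominally installed.

```python
from typing import Dict, Tuple


def linkage_key(parsed: Dict[str, str]) -> Tuple[str, str]:
    """Return (last name, first initial) as a coarse blocking key.

    `parsed` is assumed to have the same keys as the dict that
    parse_name returns: title, first, middle, last, suffix, nickname.
    """
    first = parsed.get("first", "")
    return (parsed.get("last", ""), first[:1])


# Output of parse_name("Blankinsop, Jr., Mr. James 'Jimmy'"), copied from above.
formal = {"title": "mr", "first": "james", "middle": "", "last": "blankinsop",
          "suffix": "jr", "nickname": "jimmy"}

# Fields reported for "Jimmy Blankinsop" in the command-line example.
casual = {"title": "", "first": "jimmy", "middle": "", "last": "blankinsop",
          "suffix": "", "nickname": ""}

# Both records share a blocking key, so they would be compared in detail.
print(linkage_key(formal) == linkage_key(casual))  # prints True
```

A key this coarse only narrows the candidate pairs; deciding whether two records actually match would still need a fuller comparison of the remaining fields.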
Or separate into individual parts; complete string; lists; dicts...
>>> from nominally import Name
>>> n = Name("DR. PEACHES BARTKOWICZ")
>>> n
Name({'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''})
>>> str(n)
'dr peaches bartkowicz'
>>> dict(n)
{'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''}
>>> list(n.values())
['dr', 'peaches', '', 'bartkowicz', '', '']
>>> n.first
'peaches'
>>> n.last
'bartkowicz'
>>> n.raw
'DR. PEACHES BARTKOWICZ'
>>> n.report()
{'raw': 'DR. PEACHES BARTKOWICZ', 'cleaned': 'dr peaches bartkowicz', 'parsed': 'bartkowicz, dr peaches', 'list': ['dr', 'peaches', '', 'bartkowicz', '', ''], 'title': 'dr', 'first': 'peaches', 'middle': '', 'last': 'bartkowicz', 'suffix': '', 'nickname': ''}
Try a live example using pandas: https://colab.research.google.com/gist/vaneseltine/964fc9dec60e59410b91bbcaf1fe2d11/nom_pandas.ipynb
Go from list...
raw_names = [
    "Graham Arthur Chapman",
    "cleese, john m",
    "Gilliam, Terrence (Terry) Vance",
    "Eric Idle",
    'Mr. Terence "Terry" Graham Parry Jones',
    "M E Palin",
    "Neil James Innes",
    "carol cleveland",
    "Adams, Douglas N",
]
...to a DataFrame in a couple of simple notebook cells.
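|   | 0 | title | first | middle | last | suffix | nickname |
|---|---|---|---|---|---|---|---|
| 0 | Graham Arthur Chapman |  | graham | arthur | chapman |  |  |
| 1 | cleese, john m |  | john | m | cleese |  |  |
| 2 | Gilliam, Terrence (Terry) Vance |  | terrence | vance | gilliam |  | terry |
| 3 | Eric Idle |  | eric |  | idle |  |  |
| 4 | Mr. Terence "Terry" Graham Parry Jones | mr | terence | graham parry | jones |  | terry |
| 5 | M E Palin |  | m | e | palin |  |  |
| 6 | Neil James Innes |  | neil | james | innes |  |  |
| 7 | carol cleveland |  | carol |  | cleveland |  |  |
| 8 | Adams, Douglas N |  | douglas | n | adams |  |  |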
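The notebook itself is not reproduced here, so the following is only a sketch of one way to get from the list to rows like those above. `parse_rows` is a hypothetical helper, not part of nominally, and the parser is passed in as an argument so the block runs on its own; with nominally installed, `parse_name` is the intended argument. Storing the raw name under the key `0` is an assumption that mirrors the table's column name.

```python
from typing import Callable, Dict, Iterable, List


def parse_rows(raw_names: Iterable[str],
               parser: Callable[[str], Dict[str, str]]) -> List[dict]:
    """Pair each raw name with its parsed fields, one dict per row."""
    rows = []
    for raw in raw_names:
        row = {0: raw}           # raw input first, as in the table above
        row.update(parser(raw))  # then whatever fields the parser returns
        rows.append(row)
    return rows
```

With both packages installed, `pandas.DataFrame(parse_rows(raw_names, parse_name))` should yield one row per name, with the raw name in the first column followed by the parsed fields.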
🎓 Origins
nominally grew from the python-nameparser package and benefits greatly from its test bank. The key difference is that nominally focuses narrowly on lists of reasonably well-formed single-name fields. As a result, nominally does not support:
- Mutability of Name
- Easy customization of lists of name parts
- Parsing multiple names from mingled fields
- Most titles, profession names, and other name prefixes
- Mononyms (raw names expected to yield only a single field)
- Encodings other than UTF-8
- Input from byte strings
- Python 3.5 or lower
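Since byte strings and non-UTF-8 encodings are outside nominally's scope, input in those forms has to be decoded before parsing. A minimal sketch of such a pre-processing step follows; `to_text` is a hypothetical helper, not part of nominally, and the source encoding is something the caller is assumed to know.

```python
from typing import Union


def to_text(value: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return `value` as str, decoding bytes with the given encoding."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes do not match the encoding.
        return value.decode(encoding)
    return value


# A name stored as Latin-1 bytes is decoded to str before any parsing.
latin1_bytes = "Peña, José".encode("latin-1")
print(to_text(latin1_bytes, encoding="latin-1"))  # prints Peña, José
```

Letting a decoding error surface, rather than silently replacing characters, keeps damaged names from slipping into the linkage step unnoticed.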
🧙 Author