Identify family/given names and capitalize correctly

nameutils - Identify given/family names and capitalize correctly

Description

nameutils is a Python module containing functions that split a person's full name into their given and family names and capitalize the letters appropriately. It understands complex names in Latin scripts from many different languages, as well as Chinese, Japanese, and Korean names, both in their native scripts and romanized.
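To make the East Asian case concrete, here is a minimal sketch, not nameutils' own code or API, that checks whether a name is written in Han or Hangul characters and splits an unspaced Chinese name, where the family name comes first. The function names and the tiny `COMPOUND_SURNAMES` set are hypothetical. A real implementation would need far larger lists, and the one-or-two-character rule does not carry over to Japanese family names, whose length varies.

```python
import unicodedata
from typing import Tuple

# Hypothetical sample of two-character Chinese compound family names.
# A real list would be much longer.
COMPOUND_SURNAMES = {"欧阳", "司马", "诸葛", "上官"}


def is_cjk(text: str) -> bool:
    """Return True if every non-space character is a Han ideograph or Hangul syllable."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return False
    for ch in chars:
        # unicodedata.name() gives e.g. "CJK UNIFIED IDEOGRAPH-738B" or "HANGUL SYLLABLE GIM".
        name = unicodedata.name(ch, "")
        if not (name.startswith("CJK UNIFIED IDEOGRAPH") or name.startswith("HANGUL SYLLABLE")):
            return False
    return True


def split_chinese_name(full_name: str) -> Tuple[str, str]:
    """Split a Chinese name written family-name-first into (given, family)."""
    name = "".join(full_name.split())  # drop any spaces
    # Assume a one-character family name unless a known compound surname matches.
    family_len = 2 if name[:2] in COMPOUND_SURNAMES else 1
    return name[family_len:], name[:family_len]
```

With this sketch, `split_chinese_name("欧阳修")` returns `("修", "欧阳")` because 欧阳 is in the sample compound-surname set, while `split_chinese_name("王小明")` returns `("小明", "王")`.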

This module is useful when you receive a person's name that might be all uppercase or otherwise in the wrong case, or that might have the given names and the family name combined in a single string (e.g., a single spreadsheet column). It helps you split the full name into its parts and set the capitalization correctly, showing each person a little respect by at least trying to get their name right.
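The simplest form of the splitting problem can be sketched as follows. This is an illustrative heuristic, not how nameutils is implemented: it assumes the given names come first, treats a hypothetical set of particles (van, von, de, da, and so on) as the start of the family name, and otherwise takes the last word as the family name. Unhyphenated multi-name family names, such as many Spanish ones, defeat this kind of rule.

```python
from typing import Tuple

# Hypothetical particles that often begin a family name (compared in lowercase).
PARTICLES = {"van", "von", "de", "der", "den", "da", "di", "del", "della", "du", "la", "le", "ter", "ten"}


def split_full_name(full_name: str) -> Tuple[str, str]:
    """Split 'given names ... family name' into (given, family).

    The family name starts at the first particle after the first word;
    if there is no particle, it is just the last word. Case is left as-is.
    """
    words = full_name.split()
    if len(words) < 2:
        return "", " ".join(words)
    start = len(words) - 1  # default: the last word only
    for i in range(1, len(words) - 1):
        if words[i].lower() in PARTICLES:
            start = i
            break
    return " ".join(words[:start]), " ".join(words[start:])
```

For example, `split_full_name("LUDWIG VAN BEETHOVEN")` returns `("LUDWIG", "VAN BEETHOVEN")`. The case is deliberately left untouched so that splitting and capitalization remain separate steps.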

Getting the case right for people's names is difficult, and many software systems address the problem by not trying at all and using uppercase exclusively. It's ugly, but it's easy and consistent. We can do better. The default behaviour can't be perfect, but with ongoing adjustments to suit your evolving dataset, you can improve it to meet your needs.
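Python's built-in `str.title()` shows why the naive approach falls short: `"LUDWIG VAN BEETHOVEN".title()` gives `"Ludwig Van Beethoven"`, capitalizing the particle. The sketch below is illustrative and is not nameutils' algorithm. It lowercases each word, capitalizes each run of letters (so hyphenated and apostrophe names like Anne-Marie and O'Neill come out right), and keeps a hypothetical set of particles lowercase when they are not the first word.

```python
import re

# Hypothetical particles kept lowercase when they are not the first word.
LOWER_PARTICLES = {"van", "von", "de", "der", "den", "da", "di", "du", "la", "le"}

# A run of letters in any script: word characters minus digits and underscore.
LETTERS = re.compile(r"[^\W\d_]+")


def capitalize_word(word: str) -> str:
    """Lowercase the word, then uppercase the first letter of each letter run.

    Hyphens and apostrophes separate runs, so ANNE-MARIE -> Anne-Marie
    and O'NEILL -> O'Neill.
    """
    return LETTERS.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:], word.lower())


def capitalize_name(name: str) -> str:
    """Capitalize a full name, keeping non-initial particles lowercase."""
    out = []
    for i, word in enumerate(name.split()):
        if i > 0 and word.lower() in LOWER_PARTICLES:
            out.append(word.lower())
        else:
            out.append(capitalize_word(word))
    return " ".join(out)
```

Here `capitalize_name("LUDWIG VAN BEETHOVEN")` returns `"Ludwig van Beethoven"`. Prefixes such as Mc and Mac, and families that capitalize their particle, would still need further rules or exceptions.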

People with complex family names (grammatical, aristocratic, topographic, or patronymic) often don't know how their own names should be capitalized. Or at least, they don't know how their ancestors capitalized the name, or they know but disagree with it. Some people insist on having it their own way, and that's fine. By default, this module prefers the capitalization their ancestors would have used, but people can do whatever they want with their own names, and it matters to them. So the module supports general exceptions that apply to everyone with a particular family name, for when the default behaviour is definitely wrong, and individual exceptions that apply only to people who report that the default is wrong for them.
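One way to model the two kinds of exception is a lookup that checks an individual exception first, then a general family-name exception, and only then falls back to the default rule. This is a sketch, not nameutils' own exception mechanism: the tables, the `person_id` key, and the placeholder `default_case` rule are hypothetical, and the entries are made-up examples.

```python
from typing import Dict, Optional, Tuple

# Made-up example entries. Keys are lowercase.
# General exceptions: apply to everyone with this family name.
FAMILY_EXCEPTIONS: Dict[str, str] = {
    "dimaggio": "DiMaggio",
}

# Individual exceptions: apply only to one person (identified by a
# hypothetical record ID) who reported that the default is wrong for them.
PERSON_EXCEPTIONS: Dict[Tuple[str, str], str] = {
    ("person-42", "dimaggio"): "Dimaggio",
}


def default_case(family: str) -> str:
    """Placeholder default rule: capitalize each word."""
    return " ".join(word.capitalize() for word in family.split())


def family_name_case(family: str, person_id: Optional[str] = None) -> str:
    """Apply an individual exception first, then a general one, then the default."""
    key = family.lower()
    if person_id is not None and (person_id, key) in PERSON_EXCEPTIONS:
        return PERSON_EXCEPTIONS[(person_id, key)]
    if key in FAMILY_EXCEPTIONS:
        return FAMILY_EXCEPTIONS[key]
    return default_case(family)
```

Keying individual exceptions on both the person and the family name means an exception stops applying if that person's recorded family name changes.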

Note: This module doesn't handle every name on Earth. Apart from Chinese, Japanese, and Korean family names, it only understands names written in Latin scripts, and anything else works only by lucky accident (names in Cyrillic, for example, happen to work). It doesn't handle honorifics, titles, joined initials, or postnominals; it only handles names. But it does handle complex names from a variety of places (e.g., the British Isles, Europe, the Middle East, Africa, East Asia, Pacifika, the Americas). By default, it doesn't correctly identify unhyphenated family names made up of several names (like Spanish and Catalan names), unless the formal "y" or "i" is present. Such names need to be handled with split exceptions. It handles some mixed-case names like McAdam, MacArthur, FitzSimmons, DeVito, and VanZandt, but there will be false negatives (and arguably false positives), which can be corrected with case exceptions. Over time, you will build up a set of case exceptions and split exceptions that meets the needs of your dataset.
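Mixed-case prefixes like those mentioned above can be approximated with a regular expression, which also shows why case exceptions are needed. This sketch uses hypothetical names, is not nameutils' rule set, and only covers Mc, Mac, and Fitz. It turns MCADAM into McAdam, but on its own it would also turn MACHADO into "MacHado", a false positive that a case-exception entry corrects.

```python
import re
from typing import Dict

# Made-up case exceptions for words the prefix rule gets wrong.
# Keys are lowercase.
CASE_EXCEPTIONS: Dict[str, str] = {
    "machado": "Machado",
    "macey": "Macey",
}

# Mc, Mac, or Fitz followed by at least two more ASCII letters.
PREFIX_RE = re.compile(r"(mc|mac|fitz)([a-z]{2,})")


def mixed_case(word: str) -> str:
    """Capitalize one family-name word, applying a Mc/Mac/Fitz heuristic."""
    key = word.lower()
    # Exceptions win over the heuristic.
    if key in CASE_EXCEPTIONS:
        return CASE_EXCEPTIONS[key]
    match = PREFIX_RE.fullmatch(key)
    if match:
        prefix, rest = match.groups()
        return prefix.capitalize() + rest.capitalize()
    return key.capitalize()
```

Requiring at least two letters after the prefix keeps short names such as Mace unchanged, but longer false positives still slip through, which is why the exception table grows with your dataset.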

This is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 3 or later.

Documentation

There is a manual entry:

Download

nameutils is on PyPI:

It can be installed using pip:

    python3 -m pip install nameutils

Requirements

nameutils is a Python module that should work with any version of Python 3. It doesn't depend on any non-standard modules.


URL: https://raf.org/nameutils
GIT: https://github.com/rafmod/nameutils
GIT: https://codeberg.org/rafmod/nameutils
Date: 20230709
Author: raf <raf@raf.org>

Download files

Source Distribution

nameutils-1.0.0.tar.gz (77.8 kB)

Built Distribution

nameutils-1.0.0-py3-none-any.whl (70.0 kB)

File details

Details for the file nameutils-1.0.0.tar.gz.

File metadata

  • Download URL: nameutils-1.0.0.tar.gz
  • Upload date:
  • Size: 77.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.2

File hashes

Hashes for nameutils-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       1e30bf7420cc74264ecc17a92d12fd2253f9f62ca1949cbbf7c0ffc3e6036bb6
MD5          d84fa9faa3ee58f6e7d1b5d37cc06d9e
BLAKE2b-256  161a65517a5a18eeb357a42d280991dc2e3ed5c97314992096935bff04a5d1a6

File details

Details for the file nameutils-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: nameutils-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 70.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.2

File hashes

Hashes for nameutils-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       194a3509c9a455deec24232253ec970f0fcd44f180190201ad6ff2d6a9e5e9fd
MD5          570a5f2ac6e215c7c564114c14895011
BLAKE2b-256  0f756afedce7d53c5cf9f2e88f28ba94e689979768ca26c246ff30d7874a3da9
