name-ethnicity classifier

ethnicseer - a name-ethnicity classifier

ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features, such as the character sequences found in the name and its phonetic pronunciation. ethnicseer comes with a pre-trained model that handles the following 12 ethnicities: Middle-Eastern, Chinese, English, French, Vietnamese, Spanish, Italian, German, Japanese, Russian, Indian, and Korean. The included pre-trained model achieves around 84% accuracy on the test data set.
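
To make "sequences of characters found in the name" more concrete, here is a minimal sketch of character n-gram extraction. It is an illustration only and not ethnicseer's actual feature extractor: the function name char_ngrams, the n-gram sizes, and the "^" and "$" boundary markers are all assumptions made for this example.

```python
def char_ngrams(name, n_values=(2, 3)):
    """Return character n-grams for each whitespace-separated token of a name.

    Hypothetical helper for illustration; ethnicseer's real features may differ.
    Each token is lowercased and wrapped in "^" and "$" so that n-grams at the
    start and end of a token are distinguishable from those in the middle.
    """
    features = []
    for token in name.lower().split():
        padded = f"^{token}$"
        for n in n_values:
            # Slide a window of width n across the padded token.
            for i in range(len(padded) - n + 1):
                features.append(padded[i:i + n])
    return features


if __name__ == "__main__":
    print(char_ngrams("Li", n_values=(2,)))  # ['^l', 'li', 'i$']
```

Features of this kind could then be counted and passed to a standard classifier, for example one from scikit-learn, which is among the listed requirements.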

ethnicseer is based on the name-ethnicity classifier originally proposed in:

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Paper URL: https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183

Requirements

  • abydos
  • scikit-learn
  • nltk
  • python >= 3.7.6

Installation

ethnicseer can be installed using pip:

pip install ethnicseer

Usage

Once installed, you can use ethnicseer within your Python code to classify the ethnicity of names.

>>> from ethnicseer import EthnicClassifier

>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
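
When classifying many names, it can be handy to group them by predicted label. The sketch below assumes, as the output above suggests, that classify_names returns one label per input name in the same order. The helper group_by_label is hypothetical and not part of ethnicseer; it accepts any callable with that behavior, so a stub is used here in place of the real model.

```python
from collections import defaultdict


def group_by_label(names, classify):
    """Group names by the label that `classify` assigns to each of them.

    `classify` is any callable that takes a list of names and returns a list
    of labels of the same length and in the same order (an assumption about
    how ec.classify_names behaves, based on the example output).
    """
    labels = classify(names)
    if len(labels) != len(names):
        raise ValueError("classifier must return exactly one label per name")
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Stub standing in for ec.classify_names; the labels are copied from the
    # example output above, not produced by a model.
    def stub(names):
        return ["jap", "ger", "frn"][: len(names)]

    print(group_by_label(
        ["Yūta Nakayama", "Marcel Halstenberg", "Raphaël Varane"], stub))
```

With the real model loaded, the call would be group_by_label(names, ec.classify_names).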

Citation

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Author

Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)

License

This project is licensed under the Apache Software License 2.0. See the LICENSE file for details.
