
name-ethnicity classifier


ethnicseer ['ethnic-seer'] - a name-ethnicity classifier

ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features, such as the character sequences found in the name and its phonetic pronunciation. ethnicseer ships with a pre-trained model that handles the following 12 ethnicities: Middle-Eastern, Chinese, English, French, Vietnamese, Spanish, Italian, German, Japanese, Russian, Indian, and Korean. The included pre-trained model achieves around 84% accuracy on the test data set.
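To give a rough idea of what "sequences of characters found in the name" can look like as features, here is a minimal sketch that counts character n-grams per name token. It is an illustration only, not ethnicseer's actual feature extraction: the function name char_ngrams, the boundary markers ^ and $, and the choice of 2- and 3-grams are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(name, n_values=(2, 3)):
    """Count character n-grams for each whitespace-separated token of a name.

    Hypothetical helper for illustration; not part of ethnicseer.
    Each token is lowercased and wrapped in '^' and '$' so that n-grams at
    the start and end of a token stay distinguishable from those in the middle.
    """
    counts = Counter()
    for token in name.lower().split():
        padded = f"^{token}$"
        for n in n_values:
            # Slide a window of width n over the padded token.
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


features = char_ngrams("Marcel Halstenberg")
# Each key is an n-gram string, each value is how often it occurs in the name.
for ngram, count in sorted(features.items())[:5]:
    print(ngram, count)
```

Counts like these could be fed to any standard classifier as a sparse feature vector; the phonetic features mentioned above would be computed separately and are not sketched here.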

ethnicseer is based on the name-ethnicity classifier originally proposed in:

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Paper URL: https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183

Requirements

  • abydos
  • scikit-learn
  • nltk
  • python >= 3.7.6

Installation

ethnicseer can be installed using pip:

pip install ethnicseer

Usage

Once installed, you can use ethnicseer within your Python code to classify the ethnicity of one or more names.

>>> from ethnicseer import EthnicClassifier

>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
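Since classify_names returns one label per input name, in the same order, the results can be paired back with the names. The sketch below groups names by predicted label. group_by_label is a hypothetical helper written for this example and is not part of ethnicseer; to keep the snippet runnable without the package, the labels are copied from the output shown above rather than computed.

```python
from collections import defaultdict


def group_by_label(names, labels):
    """Group names by their predicted label.

    Hypothetical helper for illustration; not part of ethnicseer.
    Assumes labels[i] is the prediction for names[i].
    """
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)


names = ['Yūta Nakayama', 'Marcel Halstenberg', 'Raphaël Varane']
# In real use this would be: labels = ec.classify_names(names)
labels = ['jap', 'ger', 'frn']

grouped = group_by_label(names, labels)
for label, members in grouped.items():
    print(label, members)
```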

Citation

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Author

Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)

License

This project is licensed under the Apache Software License 2.0. See the LICENSE file for details.
