
name-ethnicity classifier

Project description

ethnicseer ['ethnic-seer'] - a name-ethnicity classifier

ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features such as the character sequences in the name and its phonetic pronunciation. ethnicseer comes with a pre-trained model that handles the following 12 ethnicities: middle-eastern, chinese, english, french, vietnamese, spanish, italian, german, japanese, russian, indian, and korean. The included pre-trained model achieves around 84% accuracy on the test data set.
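To illustrate what "sequences of characters" can mean as a feature, here is a minimal sketch of character n-gram extraction with word-boundary padding. The function name `char_ngrams`, the `^`/`$` boundary markers, and the default n-gram range are illustrative assumptions, not ethnicseer's actual feature set, and the phonetic features are not covered.

```python
def char_ngrams(name: str, n_min: int = 2, n_max: int = 4) -> list[str]:
    """Return character n-grams of each lowercased token in `name`.

    Hypothetical helper for illustration only; ethnicseer's real
    feature extraction may differ.
    """
    features = []
    for token in name.lower().split():
        padded = f"^{token}$"  # '^' marks the start of a word, '$' the end
        for n in range(n_min, n_max + 1):
            # Slide a window of width n across the padded token.
            for i in range(len(padded) - n + 1):
                features.append(padded[i:i + n])
    return features
```

For example, `char_ngrams("Varane", 3, 3)` yields `['^va', 'var', 'ara', 'ran', 'ane', 'ne$']`. Counts of such n-grams can then be turned into a feature vector (for instance with scikit-learn's `DictVectorizer`) before being passed to a classifier.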

ethnicseer is based on the name-ethnicity classifier originally proposed in:

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Paper URL: https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183

Requirements

  • abydos
  • scikit-learn
  • nltk
  • python >= 3.9
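A quick way to check an environment against the requirements above is sketched below. It assumes the usual import names (`abydos`, `sklearn` for scikit-learn, `nltk`); the helper names are hypothetical and not part of ethnicseer.

```python
import importlib.util
import sys

# Import names of the listed requirements (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ("abydos", "sklearn", "nltk")
MINIMUM_PYTHON = (3, 9)


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the top-level modules in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=MINIMUM_PYTHON):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python %d.%d or newer is required." % MINIMUM_PYTHON)
    missing = missing_requirements()
    if missing:
        print("Missing packages:", ", ".join(missing))
```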

Installation

ethnicseer can be installed using pip:

pip install ethnicseer

Usage

Once installed, you can use ethnicseer within your Python code to classify the ethnicity of names.

>>> from ethnicseer import EthnicClassifier

>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
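Since `classify_names` returns one label per input name, apparently in input order as the example suggests, results are easy to post-process. The sketch below groups names by predicted label; `group_by_ethnicity` is a hypothetical helper, not part of ethnicseer, and it relies only on the `classify_names` method shown above.

```python
from collections import defaultdict


def group_by_ethnicity(classifier, names):
    """Map each predicted label to the list of input names that received it.

    `classifier` only needs a `classify_names(names)` method that returns
    one label per name, in the same order as `names` (assumed here).
    """
    labels = classifier.classify_names(names)
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)
```

With the pre-trained model loaded as `ec` above:

>>> group_by_ethnicity(ec, ['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
{'jap': ['Yūta Nakayama'], 'ger': ['Marcel Halstenberg'], 'frn': ['Raphaël Varane']}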

Citation

Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.

Author

Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)

License

This project is licensed under the Apache Software License 2.0. See the LICENSE file for details.

Download files

Source Distribution

ethnicseer-0.1.0.tar.gz (3.0 MB)

Built Distribution

ethnicseer-0.1.0-py3-none-any.whl (3.0 MB)

File details

Details for the file ethnicseer-0.1.0.tar.gz.

File metadata

  • Download URL: ethnicseer-0.1.0.tar.gz
  • Upload date:
  • Size: 3.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for ethnicseer-0.1.0.tar.gz

  • SHA256: f84f7f8d29945510b27771426758d3f9d4b06cf19d0d594d22bd9cca595ff138
  • MD5: 1f7c9d4415b504113cc6aecddd365250
  • BLAKE2b-256: 8463e23d49683aefa9fe6416decce46d891c219fa260c017a0116b0d91dfd8c1

File details

Details for the file ethnicseer-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ethnicseer-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7

File hashes

Hashes for ethnicseer-0.1.0-py3-none-any.whl

  • SHA256: 581a4649099ca8c9159df8593bbd1527d705b4de5a62d7f9007231a80d155b0a
  • MD5: d90daef79bea5aca4eae4e1770544756
  • BLAKE2b-256: 5c68718e2b38c5c2d8713009c80df6dcd98295135ba3f72ed92dadd83e4bbf6c
