ethnicseer ['ethnic-seer'] - a name-ethnicity classifier
ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features, such as the sequences of characters found in the name and its phonetic pronunciation. ethnicseer comes with a pre-trained model that handles the following 12 ethnicities: middle-eastern, chinese, english, french, vietnamese, spanish, italian, german, japanese, russian, indian, and korean. The included pre-trained model achieves around 84% accuracy on the test data set.
ethnicseer is based on the name-ethnicity classifier originally proposed in:
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Paper URL: https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183
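To make the idea of "sequences of characters found in the name" concrete, here is a minimal sketch of character n-gram extraction. It is an illustration only, not ethnicseer's actual feature code: the function name `char_ngrams`, the `^`/`$` boundary markers, and the choice of bigrams and trigrams are all assumptions made for this example. Phonetic features are not covered here.

```python
def char_ngrams(name, n_values=(2, 3)):
    """Return character n-grams for each word in a name.

    Hypothetical helper for illustration; ethnicseer's real features may differ.
    """
    grams = []
    for word in name.lower().split():
        # Mark word boundaries so prefixes and suffixes become distinct features.
        padded = f"^{word}$"
        for n in n_values:
            grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


features = char_ngrams("Lee")
print(features)
```

N-grams like these let a classifier pick up on prefixes, suffixes, and letter combinations that are more common in some naming traditions than in others.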
Requirements
- abydos
- scikit-learn
- nltk
- python >= 3.9
Installation
ethnicseer can be installed using pip:
pip install ethnicseer
Usage
Once installed, you can use ethnicseer within your Python code to classify the ethnicity of a name.
>>> from ethnicseer import EthnicClassifier
>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
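Since `classify_names` returns one label per input name, in the same order, the results can be paired back with the names for further processing. The sketch below groups names by predicted label. The helper `group_by_label` is hypothetical (it is not part of ethnicseer), and the labels are hard-coded from the example output above so that the snippet runs without loading the model; in real use they would come from `ec.classify_names(names)`.

```python
from collections import defaultdict


def group_by_label(names, labels):
    """Group names under their predicted label (hypothetical helper)."""
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)


# Sample data copied from the example above; replace `labels` with
# ec.classify_names(names) when the pre-trained model is loaded.
names = ['Yūta Nakayama', 'Marcel Halstenberg', 'Raphaël Varane']
labels = ['jap', 'ger', 'frn']
grouped = group_by_label(names, labels)
print(grouped)
```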
Citation
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Author
Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)
License
This project is licensed under the Apache Software License 2.0 - see the LICENSE file for details
Hashes for ethnicseer-0.1.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 581a4649099ca8c9159df8593bbd1527d705b4de5a62d7f9007231a80d155b0a |
MD5 | d90daef79bea5aca4eae4e1770544756 |
BLAKE2b-256 | 5c68718e2b38c5c2d8713009c80df6dcd98295135ba3f72ed92dadd83e4bbf6c |