name-ethnicity classifier
Project description
ethnicseer - a name-ethnicity classifier
ethnicseer ('ethnic-seer') is a name-ethnicity classifier, written in python. It can determine the ethnicity of a given name, using linguistic features such as sequences of characters found in the name and its phonetic pronounciation. ethnicseer comes with a pre-trained model, which can handle the following 12 ethnicities: Middle-Eastern, Chinese, English, French, Vietnam, Spanish, Italian, German, Japanese, Russian, Indian, and Korean. The included pre-trained model can achieve around 84% accuracy on the test data set.
ethnicseer is based on the name-ethnicity classifier, orginally proposed here:
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Paper URL : https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183
Requirements
- abydos
- scikit-learn
- nltk
- python >= 3.7.6
Installation
ethnicseer
can be installed using pip
pip install ethnicseer
Usages
Once installed, you can use ethnicseer
within your python code to classify whether a Thai name is a person name or a corporate name.
>>> from ethnicseer import EthnicClassifier
>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
Citation
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Author
Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)
License
This project is licensed under the Apache Software License 2.0 - see the LICENSE file for details
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for ethnicseer-0.1.2-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7df8c14a7857889af51cc17184abe8f39e093edad1943b5abd622cca6f95998c |
|
MD5 | 43387e697ad2321e0fc04a4578cc3fec |
|
BLAKE2b-256 | 293ab3a0e1f12e927a4291d90d759039fd224aefa0a10d35c8762a28dc2ce604 |