name-ethnicity classifier
ethnicseer ['ethnic-seer'] - a name-ethnicity classifier
ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features, such as the character sequences found in the name and its phonetic pronunciation. ethnicseer comes with a pre-trained model that handles the following 12 ethnicities: Middle-Eastern, Chinese, English, French, Vietnamese, Spanish, Italian, German, Japanese, Russian, Indian, and Korean. The included pre-trained model achieves around 84% accuracy on the test data set.
ethnicseer is based on the name-ethnicity classifier originally proposed here:
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Paper URL : https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183
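To make the idea of "sequences of characters found in the name" more concrete, here is a minimal sketch of character n-gram extraction. It is only an illustration: the function name `char_ngrams`, the `^`/`$` boundary markers, and the default `n=3` are assumptions made for this example and are not taken from ethnicseer's actual feature extraction code.

```python
from collections import Counter


def char_ngrams(name, n=3):
    """Count character n-grams in each whitespace-separated token of a name.

    Hypothetical helper for illustration; not ethnicseer's real feature extractor.
    """
    counts = Counter()
    for token in name.lower().split():
        # Mark the start and end of the token so that prefixes and
        # suffixes become distinct features (an assumption of this sketch).
        padded = f"^{token}$"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


features = char_ngrams("Marcel Halstenberg")
print(features.most_common(5))
```

Counts like these could then be fed to a standard classifier (for example, one from scikit-learn) as a bag of features; how ethnicseer actually combines them with phonetic features is described in the paper above.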
Requirements
- abydos
- scikit-learn
- nltk
- python >= 3.7.6
Installation
ethnicseer can be installed using pip:
pip install ethnicseer
Usage
Once installed, you can use ethnicseer within your Python code to classify the ethnicity of names.
>>> from ethnicseer import EthnicClassifier
>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
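When classifying many names at once, it can be handy to group them by predicted label. The sketch below assumes, as the example above suggests, that `classify_names` returns one label per input name in the same order. The helper `group_by_label` is hypothetical and not part of ethnicseer; it works with any object that exposes a `classify_names` method.

```python
from collections import defaultdict


def group_by_label(classifier, names):
    """Group names by the label the classifier assigns to each of them.

    Hypothetical helper; assumes classifier.classify_names(names) returns
    a list of labels aligned with the input list.
    """
    labels = classifier.classify_names(names)
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)
```

With ethnicseer installed, this could be called as `group_by_label(EthnicClassifier.load_pretrained_model(), names)`.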
Citation
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Author
Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)
License
This project is licensed under the Apache Software License 2.0 - see the LICENSE file for details