name-ethnicity classifier
Project description
ethnicseer - a name-ethnicity classifier
ethnicseer ('ethnic-seer') is a name-ethnicity classifier written in Python. It predicts the ethnicity of a given name from linguistic features, such as the character sequences found in the name and its phonetic pronunciation. ethnicseer ships with a pre-trained model covering the following 12 ethnicities: Middle-Eastern, Chinese, English, French, Vietnamese, Spanish, Italian, German, Japanese, Russian, Indian, and Korean. The included pre-trained model achieves around 84% accuracy on the test data set.
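As a rough illustration of how "sequences of characters" can become classifier input, the sketch below lowercases each part of a name, pads it with boundary markers, and counts its character n-grams. This is not ethnicseer's actual feature extractor. The function names, the `^`/`$` markers, and the 1- to 3-gram range are assumptions chosen for illustration, and the phonetic features are not shown.

```python
from collections import Counter
from typing import List


def char_ngrams(name: str, n: int) -> List[str]:
    """Return character n-grams of each whitespace-separated name part.

    Each part is lowercased and padded with '^' (start) and '$' (end),
    so n-grams at word boundaries differ from n-grams inside a word.
    """
    grams: List[str] = []
    for part in name.lower().split():
        padded = f"^{part}$"
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def name_features(name: str, max_n: int = 3) -> Counter:
    """Count all character 1- to max_n-grams of a name (hypothetical feature set)."""
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(char_ngrams(name, n))
    return features


if __name__ == "__main__":
    print(char_ngrams("Yūta Nakayama", 3))
    # ['^yū', 'yūt', 'ūta', 'ta$', '^na', 'nak', 'aka', 'kay', 'aya', 'yam', 'ama', 'ma$']
```

A bag of counts like this could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear classifier; boundary-marked n-grams such as `'ma$'` let a model pick up on characteristic name endings.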
ethnicseer is based on the name-ethnicity classifier originally proposed in:
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Paper URL: https://ojs.aaai.org/index.php/AAAI/article/download/8324/8183
Requirements
- abydos
- scikit-learn
- nltk
- python >= 3.7.6
Installation
ethnicseer can be installed with pip:
pip install ethnicseer
Usage
Once installed, you can use ethnicseer in your Python code to classify the ethnicity of names:
>>> from ethnicseer import EthnicClassifier
>>> ec = EthnicClassifier.load_pretrained_model()
>>> ec.classify_names(['Yūta Nakayama','Marcel Halstenberg','Raphaël Varane'])
['jap', 'ger', 'frn']
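If you need to post-process predictions, a small helper can pair each name with its label and group names by predicted ethnicity. The sketch below assumes only what the example above shows: `classify_names` takes a list of names and returns one label string per name, in the same order. `group_by_ethnicity` is a hypothetical helper, not part of ethnicseer.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_ethnicity(classifier, names: List[str]) -> Dict[str, List[str]]:
    """Group names by predicted label (hypothetical helper, not part of ethnicseer).

    `classifier` is assumed to expose classify_names(names), returning one
    label per input name in input order, as in the README example.
    """
    labels = classifier.classify_names(names)
    if len(labels) != len(names):
        raise ValueError("expected exactly one label per name")
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)
```

With the pre-trained model loaded as `ec`, calling `group_by_ethnicity(ec, names)` returns a dict keyed by labels such as `'jap'`, `'ger'`, or `'frn'`. Because the helper only relies on `classify_names`, it can be tested with a stub classifier and no model download.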
Citation
Treeratpituk, Pucktada, and C. Lee Giles. "Name-ethnicity classification and ethnicity-sensitive name matching." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 26. No. 1. 2012.
Author
Pucktada Treeratpituk, Bank of Thailand (pucktadt@bot.or.th)
License
This project is licensed under the Apache Software License 2.0; see the LICENSE file for details.
Download files
File details
Details for the file ethnicseer-0.1.2.tar.gz.
File metadata
- Download URL: ethnicseer-0.1.2.tar.gz
- Upload date:
- Size: 3.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d14d7d39f5bde58a5ae4097f8bd2788f5286a3020381cba7571e34102cbef361 |
| MD5 | cc41390b8bb98fb495fa84d87661eca1 |
| BLAKE2b-256 | 6ec887df48f0ba317488ec959fcdcae9936ab61ce44fcc499d8c40c447f83215 |
File details
Details for the file ethnicseer-0.1.2-py3-none-any.whl.
File metadata
- Download URL: ethnicseer-0.1.2-py3-none-any.whl
- Upload date:
- Size: 3.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.8.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7df8c14a7857889af51cc17184abe8f39e093edad1943b5abd622cca6f95998c |
| MD5 | 43387e697ad2321e0fc04a4578cc3fec |
| BLAKE2b-256 | 293ab3a0e1f12e927a4291d90d759039fd224aefa0a10d35c8762a28dc2ce604 |