namesex-light

A lightweight gender classifier for Chinese given names

Project description

namesex_light

Namesex_light is a lightweight package that predicts the gender tendency of Chinese given names. The module ships with an L2-regularized logistic regression model trained on 10,730 Chinese given names (in traditional Chinese) with reliable gender labels collected from public data. The predict() function takes a list of names and outputs either the predicted gender tendency (1 for male, 0 for female) or the probability of being a male name. Namesex_light has a sister project, namesex, that performs similar tasks with higher accuracy.

Additional information about namesex and namesex_light can be found in another document (in Chinese).

The prediction performance evaluated by ten-fold cross validation is:

Metric	Performance	Performance Std. Dev.
Accuracy	0.8957	0.007327
F1	0.8920	0.007873
Precision	0.8852	0.012238
Recall	0.8991	0.008936
Logloss	114.35	6.413972
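
For reference, the accuracy, precision, recall, and F1 figures above can be read with the usual binary-classification definitions. The sketch below assumes male (1) is the positive class, which this page does not state explicitly, and the helper name binary_metrics is hypothetical rather than part of namesex_light. The labels are toy values for illustration only.

```python
def binary_metrics(truth, pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    n = len(truth)
    if n != len(pred):
        raise ValueError("truth and pred must have the same length")
    # Count the confusion-matrix cells with `positive` as the positive class.
    tp = sum(1 for t, p in zip(truth, pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(truth, pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(truth, pred) if t == positive and p != positive)
    tn = n - tp - fp - fn
    accuracy = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumption: 1 = male, 0 = female), not real package output.
truth = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 0, 0, 1]
print(binary_metrics(truth, pred))
```

In ten-fold cross validation, metrics like these would be computed once per held-out fold and then summarized by their mean and standard deviation, as in the table.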

Use pip or pip3 to install namesex_light:

pip install namesex_light

To use namesex_light, pass an array or list of given names to predict(). For each element of the input, predict() returns 1 for a male prediction or 0 for a female prediction. Set predprob=True to return the probability of being a male name instead. A simple example:

>>> import namesex_light
>>> nsl = namesex_light.namesex_light()
>>> nsl.predict(['民豪', '愛麗', '志明'])
array([1, 0, 1])
>>> nsl.predict(['民豪', '愛麗', '志明'], predprob=True)
array([0.99968932, 0.00530066, 0.9938986 ])
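
The two outputs above are consistent with a cutoff on the male-name probability. The sketch below assumes a 0.5 threshold, which this page does not document, and uses two hypothetical helpers (to_labels and uncertain, neither part of namesex_light) to turn probabilities into labels and to flag names whose probability lies close to 0.5. The probabilities are the ones printed above.

```python
def to_labels(probs, threshold=0.5):
    """Map male-name probabilities to labels (1 = male, 0 = female)."""
    # The 0.5 default is an assumption, not a documented package setting.
    return [1 if p >= threshold else 0 for p in probs]


def uncertain(names, probs, margin=0.2):
    """Return names whose probability is within `margin` of 0.5."""
    return [n for n, p in zip(names, probs) if abs(p - 0.5) < margin]


names = ['民豪', '愛麗', '志明']
probs = [0.99968932, 0.00530066, 0.9938986]  # values shown above

print(to_labels(probs))         # [1, 0, 1]
print(uncertain(names, probs))  # []
```

Flagging borderline names this way can be useful when a wrong label is costlier than no label; the margin of 0.2 is arbitrary and should be tuned for the application.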

Note that namesex_light was trained on Chinese given names only. However, it may also be used to classify translated names:

>>> nsl.predict(['阿波羅', '阿波羅', '雷', '艾美', '布蘭妮', '阿曼達'])
array([1, 1, 1, 0, 0, 1])

This module is intended for a quick plug-and-play. The original training dataset is not included.

Testing Dataset

This package comes with a small testing dataset that was not used for model training. The following sample code illustrates simple usage:

>>> import numpy as np
>>> testdata = namesex_light.testdata()
>>> nsl = namesex_light.namesex_light()
>>> pred = nsl.predict(testdata.gname)
>>> print("The first 5 given names are: {}".format(testdata.gname[0:5]))
The first 5 given names are: ['翊如', '妤庭', '諆璋', '大閎', '和維']
>>> print("    and their sex:          {}".format(testdata.sex[0:5]))
    and their sex:          [0, 0, 1, 1, 1]
>>> print("    and their predicted sex:{}".format(pred[0:5]))
    and their predicted sex:[0 0 1 1 1]
>>> accuracy = np.sum(pred == testdata.sex) / len(pred)
>>> print(" Prediction accuracy = {}".format(accuracy))
 Prediction accuracy = 0.8627450980392157

Note that this accuracy is slightly lower than the accuracy from ten-fold cross validation. This is likely because the test set was collected from a different source than the training dataset.
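
To see which names the model gets wrong, rather than only how often, one option is to list the misclassified entries. The helper misclassified below is hypothetical and not part of namesex_light; the sample data are the five names, labels, and predictions printed above.

```python
def misclassified(names, truth, pred):
    """Return (name, true_label, predicted_label) for each wrong prediction."""
    # int() normalizes plain ints and NumPy integers to the same type.
    return [(n, int(t), int(p))
            for n, t, p in zip(names, truth, pred)
            if int(t) != int(p)]


names = ['翊如', '妤庭', '諆璋', '大閎', '和維']
sex = [0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1]

print(misclassified(names, sex, pred))  # [] (all five are predicted correctly)
```

With the package installed, the same call on the full test set would presumably be misclassified(testdata.gname, testdata.sex, pred), assuming the attributes behave as in the example above.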

Release history

0.2.1 (this version): Jul 11, 2018
0.2.0: Jul 11, 2018
0.1.6: Jul 9, 2018

Download files

Source Distribution

namesex_light-0.2.1.tar.gz (4.7 kB view details)

Uploaded Jul 11, 2018 Source

Built Distribution

namesex_light-0.2.1-py3-none-any.whl (69.4 kB view details)

Uploaded Jul 11, 2018 Python 3

File details

Details for the file namesex_light-0.2.1.tar.gz.

File metadata

Download URL: namesex_light-0.2.1.tar.gz
Upload date: Jul 11, 2018
Size: 4.7 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for namesex_light-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`338b8142e0500270a324e63a7f42b615913def257246f8328da3fcc8eea2f5ea`
MD5	`abf97d1b0cc85ac2f1906a06dbe4de49`
BLAKE2b-256	`6fd2472d16e588449fa1cae698fb7f420347b5865420695a8a1980959bcc27c6`

File details

Details for the file namesex_light-0.2.1-py3-none-any.whl.

File metadata

Download URL: namesex_light-0.2.1-py3-none-any.whl
Upload date: Jul 11, 2018
Size: 69.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for namesex_light-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e426353bb0ed48cbf990a72ebaf16b69403fa7e4dbb6528d5912a77cd9c2f49c`
MD5	`53a701708d79da924731d968fbf96e37`
BLAKE2b-256	`22da3a02cc2b5c629fff8c18f23922168c5e7c0385a58e31f28b8f9abb78f15a`
