
Gender guesser for Chinese names in English (pinyin) form


# chgender
Introduction:

Guesses gender from Chinese names written in English (pinyin) form.

- language: Python
- method: Naive Bayes, with different weights assigned to the characters at different positions in the name
- dataset: 20 million Chinese names
- accuracy: 81% on 1,500 randomly selected samples
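
The package internals are not shown in this description, so the following is only a rough sketch of what a position-weighted Naive Bayes classifier could look like, not the chgender implementation. The class name `WeightedNameNB`, the default weights, the Laplace smoothing, and the assumption that a given name arrives already split into pinyin syllables are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


class WeightedNameNB:
    """Toy Naive Bayes over pinyin syllables with one weight per position."""

    def __init__(self, position_weights=(1.0, 1.0), smoothing=1.0):
        self.position_weights = position_weights  # hypothetical weights
        self.smoothing = smoothing                # Laplace smoothing constant
        self.class_counts = defaultdict(int)      # gender -> number of names
        # counts[gender][position][syllable] -> occurrences in training data
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (syllables, gender), e.g. (("de", "hua"), "male")."""
        for syllables, gender in samples:
            self.class_counts[gender] += 1
            for pos, syl in enumerate(syllables[: len(self.position_weights)]):
                self.counts[gender][pos][syl] += 1
                self.vocab.add(syl)
        return self

    def guess(self, syllables):
        """Return (most likely gender, normalised probability)."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for gender, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for pos, syl in enumerate(syllables[: len(self.position_weights)]):
                seen = self.counts[gender][pos]
                num = seen.get(syl, 0) + self.smoothing
                den = sum(seen.values()) + self.smoothing * len(self.vocab)
                # The position weight scales that syllable's log-likelihood.
                score += self.position_weights[pos] * math.log(num / den)
            log_scores[gender] = score
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_scores.values())
        unnorm = {g: math.exp(s - top) for g, s in log_scores.items()}
        best = max(unnorm, key=unnorm.get)
        return best, unnorm[best] / sum(unnorm.values())
```

A weight above 1.0 makes the syllable at that position count more strongly toward the final score, and a weight of 0.0 ignores that position entirely.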

Possible fields of use:

- User registration on websites, or creating a new contact. The gender option can be pre-selected from the name the user enters in pinyin, which improves the user experience.
- Gender analysis of Chinese authors who publish papers in English-language journals.
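
For the pre-selection use case, one possible approach is to fill in the gender field only when the guess is confident enough, and otherwise leave it blank for the user. The sketch below assumes a `guess` callable that returns a `(gender, probability)` tuple, which is the shape `chgender.guess` returns in the usage section. The helper name `preselect_gender` and the 0.9 threshold are hypothetical.

```python
def preselect_gender(name, guess, threshold=0.9):
    """Return the gender to pre-select, or None to leave the option unset.

    `guess` is any callable mapping a pinyin name to (gender, probability).
    `threshold` is an illustrative cut-off, not a value recommended by chgender.
    """
    gender, probability = guess(name)
    return gender if probability >= threshold else None
```

With the package installed, this could be called as `preselect_gender('dehua liu', chgender.guess)`.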

How to use:

install:
1. use pip
$ pip install chgender
2. clone from git
$ git clone git@github.com:jiajianzhou/chgender.git
$ cd chgender
$ sudo python setup.py install

usage:
1. use as a module
>>> import chgender
>>> chgender.guess('dehua liu')
('male', 0.966248721556)

2. use from bash
$ chg -n dehua xueyou fucheng ming
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896

3. use for batch processing
$ chg -f samples.txt
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896
......
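
When the batch output needs further processing, the lines can be parsed back into structured records. The sketch below relies only on the line format shown in the sample output above and assumes that format is stable. The function name `parse_chg_output` is hypothetical and not part of chgender.

```python
import re

# Matches lines such as:
#   name: dehua => gender: male, probability: 0.966248721556
LINE_RE = re.compile(
    r"^name:\s*(\S+)\s*=>\s*gender:\s*(\w+),\s*probability:\s*(\d+(?:\.\d+)?)\s*$"
)


def parse_chg_output(text):
    """Turn `chg` output text into a list of (name, gender, probability) tuples.

    Lines that do not match the expected format are skipped.
    """
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, gender, probability = match.groups()
            records.append((name, gender, float(probability)))
    return records
```

The output of `chg -f samples.txt` could be redirected to a file and its contents passed to `parse_chg_output`.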





Download files


Source Distribution

chgender-0.0.2.tar.gz (9.1 kB)

Built Distribution

chgender-0.0.2-py2.7.egg (10.7 kB)

File details

Details for the file chgender-0.0.2.tar.gz.

File metadata

  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for chgender-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       1817586fdc961344d4e9a5e18da4b2704498e1b2cfb9a8d724c960541e5b6173
MD5          459b8425542b026b92af3f15c829c126
BLAKE2b-256  619c5dfcf76beb272bf7974d3841d23c9c808dd7fae02543e09eebf027ec5131


File details

Details for the file chgender-0.0.2-py2.7.egg.

File metadata

  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for chgender-0.0.2-py2.7.egg
Algorithm    Hash digest
SHA256       7df093ba78096cc6fe3a0e1b327df252e7ca66d0d4c4fce6caebc1632227e102
MD5          2aa52c6562835745da4a5c1454e58262
BLAKE2b-256  0d5c0f601b7756488549a147792d74929dfbb4d8fb2b4c4abfcb24d7dfa24461

