Gender guesser for Chinese names in English (pinyin) form
Project description
# chgender
Introduction:
Guesses the gender of a Chinese name written in English (pinyin) form.
- language: Python
- method: Naive Bayes, with different weights assigned to characters at different positions in the name
- dataset: 20 million Chinese names
- accuracy: 81% on 1,500 randomly selected samples
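The README describes the method only as Naive Bayes with per-position weights. Below is a minimal sketch of that general idea, not chgender's actual implementation. It assumes the name has already been split into pinyin syllables (e.g. `dehua` → `de`, `hua`), and the position weights (`1.0`, `1.5`), the Laplace smoothing constant and the class name `WeightedNaiveBayes` are all illustrative assumptions. Each syllable's log-likelihood is multiplied by the weight of its position before being added to the log prior.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Hypothetical sketch of position-weighted Naive Bayes (not chgender's code)."""

    def __init__(self, position_weights=(1.0, 1.5), alpha=1.0):
        self.position_weights = position_weights  # assumed weight per syllable position
        self.alpha = alpha  # Laplace smoothing constant (assumed)
        self.class_counts = Counter()
        self.syllable_counts = defaultdict(Counter)  # (gender, position) -> syllable counts
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (syllables, gender) pairs, e.g. (["de", "hua"], "male")."""
        for syllables, gender in samples:
            self.class_counts[gender] += 1
            for pos, syl in enumerate(syllables):
                self.syllable_counts[(gender, pos)][syl] += 1
                self.vocab.add(syl)
        return self

    def _weight(self, pos):
        # Positions beyond the configured weights reuse the last weight.
        return self.position_weights[min(pos, len(self.position_weights) - 1)]

    def predict(self, syllables):
        """Return (most likely gender, its normalised probability)."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for gender, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for pos, syl in enumerate(syllables):
                counts = self.syllable_counts[(gender, pos)]
                n = sum(counts.values())
                # Smoothed P(syllable | gender, position)
                likelihood = (counts[syl] + self.alpha) / (n + self.alpha * vocab_size)
                score += self._weight(pos) * math.log(likelihood)
            log_scores[gender] = score
        # Turn weighted log-scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exp_scores = {g: math.exp(s - top) for g, s in log_scores.items()}
        z = sum(exp_scores.values())
        best = max(exp_scores, key=exp_scores.get)
        return best, exp_scores[best] / z
```

Scaling log-likelihoods by a weight is one common way to make Naive Bayes trust some features more than others; how chgender actually chooses its weights is not documented here.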
Possible use cases:
- Website registration or new-contact forms: pre-select the gender option based on the name the user enters, improving the user experience.
- Analysis of English-language literature: study gender differences among Chinese authors whose names appear in pinyin form in English-language journals.
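For the registration use case, one way to avoid pre-selecting a wrong option is to act only on confident guesses. The helper below is hypothetical and not part of chgender: it accepts any function that, like `chgender.guess`, returns a `(gender, probability)` tuple, and the default threshold of `0.9` is an arbitrary assumption to tune against your own data.

```python
from typing import Callable, Optional, Tuple

# Any callable shaped like chgender.guess: name -> (gender, probability).
GuessFn = Callable[[str], Tuple[str, float]]


def preselect_gender(name: str, guess: GuessFn, threshold: float = 0.9) -> Optional[str]:
    """Hypothetical helper: return the gender to pre-select, or None to leave it blank."""
    name = name.strip()
    if not name:
        return None  # nothing entered yet, so do not guess
    gender, probability = guess(name)
    # Pre-select only confident predictions; the user can still change the option.
    return gender if probability >= threshold else None
```

For example, `preselect_gender('dehua liu', chgender.guess)` would return `'male'` given the probability of about 0.966 reported in the usage example, while a name scoring below the threshold leaves the field blank.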
How to use:
Install:
1. Using pip
$ pip install chgender
2. Cloning from Git
$ git clone git@github.com:jiajianzhou/chgender.git
$ cd chgender
$ sudo python setup.py install
Usage:
1. As a module
>>> import chgender
>>> chgender.guess('dehua liu')
('male', 0.966248721556)
2. From the command line (bash)
$ chg -n dehua xueyou fucheng ming
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896
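To consume `chg` from another program, its output can be parsed line by line. This sketch assumes `chg` is on `PATH`, writes its results to standard output, and always prints the `name: … => gender: …, probability: …` format shown above; `run_chg` and `parse_chg_output` are hypothetical helper names, not part of chgender.

```python
import re
import subprocess
from typing import List, Tuple

# Matches result lines such as
# "name: dehua => gender: male, probability: 0.966248721556"
LINE_RE = re.compile(
    r"name:\s*(.+?)\s*=>\s*gender:\s*(\w+),\s*probability:\s*([-+0-9.eE]+)"
)


def parse_chg_output(text: str) -> List[Tuple[str, str, float]]:
    """Extract (name, gender, probability) tuples, skipping any other lines."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            name, gender, prob = match.groups()
            results.append((name, gender, float(prob)))
    return results


def run_chg(names: List[str]) -> List[Tuple[str, str, float]]:
    """Run `chg -n NAME...` (assumed to be on PATH) and parse what it prints."""
    completed = subprocess.run(
        ["chg", "-n", *names], capture_output=True, text=True, check=True
    )
    return parse_chg_output(completed.stdout)
```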
3. Batch processing from a file
$ chg -f samples.txt
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896
......
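For the journal-analysis use case, the output of a batch run can be tallied by predicted gender. A minimal sketch, assuming the batch output has been saved to a file (for example with `chg -f samples.txt > results.txt`) and follows the line format shown above; `gender_summary` and its `min_probability` cut-off are hypothetical, and low-confidence predictions are counted separately as `"uncertain"` rather than forced into a class.

```python
import re
from collections import Counter

# Same result-line format as printed by `chg`.
LINE_RE = re.compile(
    r"name:\s*(.+?)\s*=>\s*gender:\s*(\w+),\s*probability:\s*([-+0-9.eE]+)"
)


def gender_summary(lines, min_probability=0.0):
    """Count predicted genders in chg output lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip '......' and any other non-result lines
        _, gender, prob = match.groups()
        # Predictions below the cut-off are tallied as 'uncertain'.
        counts[gender if float(prob) >= min_probability else "uncertain"] += 1
    return counts
```

For example, `with open('results.txt') as f: print(gender_summary(f, min_probability=0.8))` prints the counts for every result line in the file.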
Download files
Source distribution: chgender-0.0.2.tar.gz (9.1 kB)
Built distribution: chgender-0.0.2-py2.7.egg (10.7 kB)
File details
Details for the file chgender-0.0.2.tar.gz:
File metadata
- Download URL: chgender-0.0.2.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1817586fdc961344d4e9a5e18da4b2704498e1b2cfb9a8d724c960541e5b6173
MD5 | 459b8425542b026b92af3f15c829c126
BLAKE2b-256 | 619c5dfcf76beb272bf7974d3841d23c9c808dd7fae02543e09eebf027ec5131
Details for the file chgender-0.0.2-py2.7.egg:
File metadata
- Download URL: chgender-0.0.2-py2.7.egg
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7df093ba78096cc6fe3a0e1b327df252e7ca66d0d4c4fce6caebc1632227e102
MD5 | 2aa52c6562835745da4a5c1454e58262
BLAKE2b-256 | 0d5c0f601b7756488549a147792d74929dfbb4d8fb2b4c4abfcb24d7dfa24461