Gender guesser for Chinese names in English (pinyin) form

# chgender
Introduction:

Guesses the gender of a Chinese name written in English (pinyin) form.

- language: Python
- method: Naive Bayes, with a different weight for each character position in the name
- dataset: 20 million Chinese names
- accuracy: 81% on 1,500 randomly selected samples
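
The method listed above can be illustrated with a minimal sketch of a position-weighted Naive Bayes scorer. This is not the chgender implementation: the counts, the prior, the weights, and the names `COUNTS`, `WEIGHTS`, and `guess_sketch` are all made up for illustration, and the sketch assumes a two-syllable given name that has already been split into syllables.

```python
import math

# Made-up syllable counts per gender and name position, for illustration only.
COUNTS = {
    "male": {
        "first": {"de": 80, "xiao": 40, "li": 30},
        "last": {"hua": 60, "ming": 90, "na": 5},
    },
    "female": {
        "first": {"de": 10, "xiao": 60, "li": 70},
        "last": {"hua": 50, "ming": 15, "na": 95},
    },
}
PRIOR = {"male": 0.5, "female": 0.5}
# Hypothetical position weights: the last syllable counts more than the first.
WEIGHTS = {"first": 0.8, "last": 1.2}


def log_likelihood(gender, position, syllable):
    """Add-one smoothed log P(syllable | gender, position)."""
    table = COUNTS[gender][position]
    vocabulary = set()
    for per_gender in COUNTS.values():
        vocabulary.update(per_gender[position])
    # The extra +1 in the denominator reserves mass for unseen syllables.
    return math.log(
        (table.get(syllable, 0) + 1) / (sum(table.values()) + len(vocabulary) + 1)
    )


def guess_sketch(first, last):
    """Return (gender, probability) for a two-syllable given name."""
    scores = {}
    for gender in COUNTS:
        scores[gender] = (
            math.log(PRIOR[gender])
            + WEIGHTS["first"] * log_likelihood(gender, "first", first)
            + WEIGHTS["last"] * log_likelihood(gender, "last", last)
        )
    # Normalize the weighted log scores into probabilities.
    top = max(scores.values())
    unnormalized = {g: math.exp(s - top) for g, s in scores.items()}
    best = max(unnormalized, key=unnormalized.get)
    return best, unnormalized[best] / sum(unnormalized.values())


print(guess_sketch("de", "hua"))
print(guess_sketch("li", "na"))
```

Multiplying each log-likelihood by a weight lets one position influence the result more than another; with all weights set to 1 the sketch reduces to plain Naive Bayes.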

Possible uses:

- User registration on websites or new-contact creation: the gender option can be preselected from the name the user types, which improves the user experience.
- Gender analysis of Chinese authors who publish papers in English-language journals.
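
For the registration use case, one possible approach is to preselect the gender option only when the guess is confident, and to leave the field empty otherwise. The sketch below assumes only that the guesser returns a `(gender, probability)` tuple, as `chgender.guess` does in this README. The `preselect_gender` helper, the 0.9 threshold, and the `stand_in_guess` function are hypothetical; the stand-in exists so the example runs without the package installed.

```python
def preselect_gender(name, guess, threshold=0.9):
    """Return the guessed gender if it is confident enough, else None."""
    gender, probability = guess(name)
    if probability >= threshold:
        return gender
    return None


def stand_in_guess(name):
    # Hypothetical stand-in for chgender.guess; the values are copied from
    # the sample output in this README, not computed.
    known = {
        "dehua liu": ("male", 0.966248721556),
        "ming": ("male", 0.851123622896),
    }
    return known[name]


print(preselect_gender("dehua liu", stand_in_guess))  # male
print(preselect_gender("ming", stand_in_guess))       # None
```

With the package installed, `chgender.guess` could be passed in place of `stand_in_guess`. Given the reported 81% accuracy, leaving the option unselected for low-confidence names avoids presenting a wrong default.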

How to use:

install:
1. use pip
$ pip install chgender
2. clone from git
$ git clone git@github.com:jiajianzhou/chgender.git
$ cd chgender
$ sudo python setup.py install

usage:
1. use as a module
>>> import chgender
>>> chgender.guess('dehua liu')
('male', 0.966248721556)

2. use from bash
$ chg -n dehua xueyou fucheng ming
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896

3. use for batch processing
$ chg -f samples.txt
name: dehua => gender: male, probability: 0.966248721556
name: xueyou => gender: male, probability: 0.985020743536
name: fucheng => gender: male, probability: 0.999357367222
name: ming => gender: male, probability: 0.851123622896
......
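
Batch processing can also be done from Python by calling the guesser once per name. The sketch below is an assumption-laden illustration, not part of chgender: it assumes the input has one name per line (the README does not specify the format of samples.txt), and `guess_lines` and `batch_stand_in` are hypothetical names. The stand-in replaces `chgender.guess` so the example runs without the package installed.

```python
import io


def guess_lines(lines, guess):
    """Yield (name, gender, probability) for each non-empty line."""
    for line in lines:
        name = line.strip()
        if name:
            gender, probability = guess(name)
            yield name, gender, probability


def batch_stand_in(name):
    # Hypothetical stand-in for chgender.guess; the values are copied from
    # the sample output in this README, not computed.
    table = {
        "dehua": ("male", 0.966248721556),
        "xueyou": ("male", 0.985020743536),
    }
    return table[name]


# An in-memory file stands in for a real file of names (assumed format).
sample = io.StringIO("dehua\n\nxueyou\n")
for name, gender, probability in guess_lines(sample, batch_stand_in):
    print("name: {} => gender: {}, probability: {}".format(name, gender, probability))
```

Any iterable of lines works here, so an open file object could be passed instead of the `io.StringIO` sample.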



Release history:

- 0.0.2 (this version)
- 0.0.1

Download files:

- chgender-0.0.2-py2.7.egg (10.7 kB), file type: Egg, Python version: 2.7, uploaded Nov 7, 2016
- chgender-0.0.2.tar.gz (9.1 kB), file type: Source, Python version: None, uploaded Nov 7, 2016
