chicksexer - Python package for gender classification
=================================================================
![Chicksexer](images/chicksexer.jpg?raw=true "Title")
`chicksexer` is a Python package that performs **gender classification**. It takes a person's name as a string and returns a probability estimate for each gender:
```python
>>> from chicksexer import predict_gender
>>> predict_gender('John Smith')
{'female': 0.0027230381965637207, 'male': 0.9972769618034363}
```
Using a classifier has several advantages over simply looking up a name in lists of known male and female names:
* Simple name lookup sometimes fails. For instance, "Miki" is a Japanese female name but also a Croatian male name.
* It can predict the gender of a name that does not appear in any list of male/female names.
* It tolerates typos in a name relatively well.
You can also get the estimate as a plain string by passing `return_proba=False`:
```python
>>> predict_gender('Oliver Butterfield', return_proba=False)
'male'
>>> predict_gender('Naila Ata', return_proba=False)
'female'
>>> predict_gender('Saldivar Anderson', return_proba=False)
'neutral'
>>> predict_gender('Ponyo', return_proba=False) # name of a character from the film
'male'
>>> predict_gender('Ponya', return_proba=False) # modify the name such that it sounds like a female name
'female'
>>> predict_gender('Miki Suzuki', return_proba=True) # Suzuki here is a Japanese surname so Miki is a female name
{'female': 0.9997618066990981, 'male': 0.00023819330090191215}
>>> predict_gender('Miki Adamić', return_proba=True) # Adamić is a Croatian surname so Miki is a male name
{'female': 0.16958969831466675, 'male': 0.8304103016853333}
>>> predict_gender('Jessica')
{'female': 0.999996105068476, 'male': 3.894931523973355e-06}
>>> predict_gender('Jesssica') # typo in Jessica
{'female': 0.9999851534785194, 'male': 1.4846521480649244e-05}
```
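To make the relationship between the probability dictionary and the string labels concrete, here is a minimal sketch of how such a dictionary could be reduced to `'male'`, `'female'`, or `'neutral'`. The function name `label_from_proba` and the `cutoff` value of 0.8 are assumptions made for illustration; this is not necessarily how `chicksexer` decides when to return `'neutral'`.

```python
def label_from_proba(proba, cutoff=0.8):
    """Reduce a {'female': p, 'male': q} dict to a single label.

    `cutoff` is a hypothetical threshold: if the more likely class does not
    reach it, the name is treated as 'neutral'.
    """
    best = max(proba, key=proba.get)
    return best if proba[best] >= cutoff else 'neutral'


# Probabilities copied from the 'John Smith' example above.
john = {'female': 0.0027230381965637207, 'male': 0.9972769618034363}
print(label_from_proba(john))  # 'male'

# A made-up, nearly balanced estimate falls below the cutoff.
print(label_from_proba({'female': 0.55, 'male': 0.45}))  # 'neutral'
```

Raising the cutoff makes the sketch more conservative, so more names end up labelled `'neutral'`.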
To predict the gender of multiple names at once, use the `predict_genders` (plural) function instead:
```python
>>> from chicksexer import predict_genders
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'])
[{'female': 3.039836883544922e-05, 'male': 0.9999696016311646},
{'female': 1.2040138244628906e-05, 'male': 0.9999879598617554}]
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'], return_proba=False)
['male', 'male']
```
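When classifying many names, it is often handy to pair each name with its label and count the labels. The sketch below does this with the standard library only. The helper `summarize` is hypothetical, and it assumes the labels come back in the same order as the input names, as the example output above suggests.

```python
from collections import Counter


def summarize(names, labels):
    """Pair each name with its label and tally the labels.

    Assumes `labels` is ordered like `names` (an assumption, not something
    the package documentation states explicitly).
    """
    if len(names) != len(labels):
        raise ValueError("names and labels must have the same length")
    return dict(zip(names, labels)), Counter(labels)


# Labels copied from the predict_genders(..., return_proba=False) example above.
names = ['Ichiro Suzuki', 'Haruki Murakami']
labels = ['male', 'male']
by_name, counts = summarize(names, labels)
print(by_name)
print(counts)
```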
Installation
------------
- Runs on Ubuntu 14.04 LTS and Mac OS X 10.x (not tested on other operating systems)
- Tested only on Python 3.5
`chicksexer` depends on [NumPy and SciPy](https://www.scipy.org/install.html), Python packages for scientific computing. You might need to install them before installing `chicksexer`.
Install `chicksexer` with pip:
```bash
pip install chicksexer
```
`chicksexer` also depends on the `tensorflow` package. By default, it tries to install the CPU-only version of `tensorflow`. If you want to use a GPU, you need to install `tensorflow` with GPU support yourself. (See [Installing Tensorflow](https://www.tensorflow.org/install/).)
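Before or after installing, you may want to check which of the dependencies named above are already importable. The following sketch uses only the standard library; the helper name `is_installed` is hypothetical, and the check says nothing about versions or GPU support.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Dependencies mentioned in this section.
for module in ("numpy", "scipy", "tensorflow"):
    status = "installed" if is_installed(module) else "missing"
    print(f"{module}: {status}")
```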
Model Architecture
------------------
The gender classifier is implemented as a character-level multilayer LSTM. The architecture is roughly as follows:
1. Character Embedding Layer
2. 1st LSTM Layer
3. 2nd LSTM Layer
4. Pooling Layer
5. Fully Connected Layer
The fully connected layer outputs the probability of a name being a male name. For details, see the `_build_graph()` method in `chicksexer/_classifier.py`, which implements the computational graph of the architecture in `tensorflow`.
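To illustrate how data flows through the five layers, here is a toy forward pass in pure Python. It is a sketch under several assumptions and not the code in `_build_graph()`: the weights are random and untrained, biases are omitted, characters are mapped to indices with `ord(ch) % vocab_size`, the pooling is a mean over time steps, and all sizes and names (`toy_predict`, `LSTMLayer`) are made up. The probabilities it prints are therefore meaningless; only the structure matters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would load learned values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LSTMLayer:
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate (input, forget, output, candidate),
        # applied to the concatenation [input, previous hidden state].
        self.weights = {
            gate: rand_matrix(hidden_size, input_size + hidden_size, rng)
            for gate in "ifog"
        }

    def run(self, inputs):
        h = [0.0] * self.hidden_size  # hidden state
        c = [0.0] * self.hidden_size  # cell state
        outputs = []
        for x in inputs:
            z = x + h  # list concatenation
            i = [sigmoid(v) for v in matvec(self.weights["i"], z)]
            f = [sigmoid(v) for v in matvec(self.weights["f"], z)]
            o = [sigmoid(v) for v in matvec(self.weights["o"], z)]
            g = [math.tanh(v) for v in matvec(self.weights["g"], z)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


def toy_predict(name, seed=0, vocab_size=256, embed_size=8, hidden_size=16):
    if not name:
        raise ValueError("name must be a non-empty string")
    rng = random.Random(seed)
    embedding = rand_matrix(vocab_size, embed_size, rng)  # 1. character embedding
    lstm1 = LSTMLayer(embed_size, hidden_size, rng)       # 2. first LSTM layer
    lstm2 = LSTMLayer(hidden_size, hidden_size, rng)      # 3. second LSTM layer
    dense = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]

    embedded = [embedding[ord(ch) % vocab_size] for ch in name]
    states = lstm2.run(lstm1.run(embedded))
    # 4. Pooling: mean over time steps (the pooling type is an assumption).
    pooled = [sum(column) / len(states) for column in zip(*states)]
    # 5. Fully connected layer with a sigmoid: probability of "male".
    p_male = sigmoid(sum(w * x for w, x in zip(dense, pooled)))
    return {"female": 1.0 - p_male, "male": p_male}


print(toy_predict("John Smith"))
```

Because the last layer produces a single probability for "male", the "female" value is simply its complement, which is why the two numbers in every real prediction shown earlier sum to 1.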
Training Data
-------------
Names with gender annotations were obtained from the following sources:
* [DBpedia Person Data](http://downloads.dbpedia.org/2015-10/core-i18n/en/persondata_en.tql.bz2)
* [Popular baby names in the US](https://www.ssa.gov/oact/babynames/limits.html)
* [Names dataset curated by Milos Bejda](https://mbejda.github.io/)