chicksexer - Python package for gender classification
=================================================================

![Chicksexer](images/chicksexer.jpg?raw=true "Title")

`chicksexer` is a Python package that performs **gender classification**. It takes a person's name as a string and returns probability estimates for the gender of that name:

```python
>>> from chicksexer import predict_gender
>>> predict_gender('John Smith')
{'female': 0.0027230381965637207, 'male': 0.9972769618034363}
```

Using a classifier has several advantages over simply looking up lists of known male and female names:

* A simple name lookup is sometimes ambiguous. For instance, "Miki" is a Japanese female name but also a Croatian male name.
* It can predict the gender of a name that does not appear in any list of male or female names.
* It handles typos in a name relatively gracefully.

Pass `return_proba=False` to get the estimate as a plain string label instead of probabilities:

```python
>>> predict_gender('Oliver Butterfield', return_proba=False)
'male'
>>> predict_gender('Naila Ata', return_proba=False)
'female'
>>> predict_gender('Saldivar Anderson', return_proba=False)
'neutral'
>>> predict_gender('Ponyo', return_proba=False) # name of a character from the film
'male'
>>> predict_gender('Ponya', return_proba=False) # modify the name such that it sounds like a female name
'female'
>>> predict_gender('Miki Suzuki', return_proba=True) # Suzuki here is a Japanese surname so Miki is a female name
{'female': 0.9997618066990981, 'male': 0.00023819330090191215}
>>> predict_gender('Miki Adamić', return_proba=True) # Adamić is a Croatian surname so Miki is a male name
{'female': 0.16958969831466675, 'male': 0.8304103016853333}
>>> predict_gender('Jessica')
{'female': 0.999996105068476, 'male': 3.894931523973355e-06}
>>> predict_gender('Jesssica') # typo in Jessica
{'female': 0.9999851534785194, 'male': 1.4846521480649244e-05}
```
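The string labels above are presumably derived from the probability estimates, but the cut-off the package uses for `'neutral'` is not documented here. The following is a minimal sketch of such a mapping, assuming a hypothetical helper `label_from_proba` and an illustrative threshold of 0.9. Neither is part of the `chicksexer` API, and the package's own rule may differ.

```python
def label_from_proba(proba, threshold=0.9):
    """Map a {'female': p, 'male': q} dict to 'female', 'male', or 'neutral'.

    `threshold` is an illustrative value, not the one chicksexer uses.
    """
    if proba['male'] >= threshold:
        return 'male'
    if proba['female'] >= threshold:
        return 'female'
    # Neither class is confident enough under the assumed threshold.
    return 'neutral'


# Probabilities copied from the 'John Smith' example in this README.
confident = label_from_proba(
    {'female': 0.0027230381965637207, 'male': 0.9972769618034363})

# A made-up, deliberately ambiguous input to exercise the fallback branch.
ambiguous = label_from_proba({'female': 0.55, 'male': 0.45})
```

Applying your own threshold to the probability dict is useful when the downstream task needs a stricter or looser notion of "neutral" than the built-in labels provide.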

To predict the gender of multiple names at once, use the `predict_genders` (plural) function instead:

```python
>>> from chicksexer import predict_genders
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'])
[{'female': 3.039836883544922e-05, 'male': 0.9999696016311646},
{'female': 1.2040138244628906e-05, 'male': 0.9999879598617554}]
>>> predict_genders(['Ichiro Suzuki', 'Haruki Murakami'], return_proba=False)
['male', 'male']
```
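When processing many names, it can be convenient to pair each name with its predicted label. The sketch below assumes a hypothetical helper `label_names` (not part of `chicksexer`) that takes the batch predictor as an argument, so it can be tried with a stand-in function. In real use you would pass `predict_genders` itself.

```python
def label_names(names, predict_fn):
    """Return a {name: label} dict using one batch call to `predict_fn`.

    `predict_fn` is expected to behave like `predict_genders`: called with
    a list of names and `return_proba=False`, it returns one label per
    name, in the same order as the input.
    """
    labels = predict_fn(names, return_proba=False)
    return dict(zip(names, labels))


# Real use (requires chicksexer to be installed):
#   from chicksexer import predict_genders
#   label_names(['Ichiro Suzuki', 'Haruki Murakami'], predict_genders)
```

Note that duplicate names collapse into a single dictionary key, so keep the raw list returned by `predict_genders` if you need one result per input row.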

Installation
------------
- Runs on Ubuntu 14.04 LTS and Mac OS X 10.x (not tested on other operating systems)
- Tested only on Python 3.5

`chicksexer` depends on [NumPy and SciPy](https://www.scipy.org/install.html), Python packages for scientific computing. You may need to install them before installing `chicksexer`.

Install `chicksexer` with pip:

```bash
pip install chicksexer
```

`chicksexer` also depends on the `tensorflow` package. By default, it tries to install the CPU-only version of `tensorflow`. To use a GPU, you need to install `tensorflow` with GPU support yourself (see [Installing TensorFlow](https://www.tensorflow.org/install/)).

Model Architecture
------------------
The gender classifier is implemented as a character-level multilayer LSTM. The architecture is roughly as follows:

1. Character Embedding Layer
2. 1st LSTM Layer
3. 2nd LSTM Layer
4. Pooling Layer
5. Fully Connected Layer

The fully connected layer outputs the probability of a name being a male name. For details, see the `_build_graph()` method in `chicksexer/_classifier.py`, which implements the computational graph of the architecture in `tensorflow`.
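To make the two ends of this pipeline concrete, here is a minimal pure-Python sketch of (a) turning a name into the integer character IDs that a character embedding layer consumes and (b) turning a single "male" logit into the two-way probability dict returned by `predict_gender`. The function names, the vocabulary construction, and the use of a sigmoid on one logit are assumptions for illustration only; the package's actual encoding and output layer live in `chicksexer/_classifier.py` and may differ.

```python
import math


def build_vocab(names):
    """Assign an integer ID to every character seen in `names`.

    ID 0 is reserved for characters that were never seen.
    """
    chars = sorted({ch for name in names for ch in name})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(name, vocab):
    """Convert a name into a list of character IDs; unseen characters map to 0."""
    return [vocab.get(ch, 0) for ch in name]


def to_proba(male_logit):
    """Squash one logit into P(male) with a sigmoid; P(female) is its complement."""
    p_male = 1.0 / (1.0 + math.exp(-male_logit))
    return {'female': 1.0 - p_male, 'male': p_male}


vocab = build_vocab(['John Smith', 'Miki Suzuki'])
ids = encode('John', vocab)   # one integer ID per character
proba = to_proba(0.0)         # a logit of 0 gives an even split
```

Deriving the female probability as the complement of the male probability matches the example outputs in this README, where the two values sum to 1.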

Training Data
-------------
Names with gender annotations were obtained from the following sources:

* [DBpedia Person Data](http://downloads.dbpedia.org/2015-10/core-i18n/en/persondata_en.tql.bz2)
* [Popular baby names in the US](https://www.ssa.gov/oact/babynames/limits.html)
* [Names dataset curated by Milos Bejda](https://mbejda.github.io/)
