Gender Detection Tool by David Arroyo MEnéndez

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Logo
Name
Why?
Tell me about DAMe Gender on Youtube
Install
Check test
Execute program
Benchmarking
Statistics for damegender
1. Measuring success and fails
2. PCA
Speeches, Seminars, Expressions of Support
Beautiful Snakes
Dame Music
License

Logo

Name

damegender is a gender detection tool from the name coded by David Arroyo MEnéndez (DAME)

Why?

If you want determine gender gap in free software projects or mailing lists.
If you don't know the gender about a name
If you want research with statistics about why a name is related with males or females.
If you want use a free gender detection tool from a name from a command with open data.
If you want use the main solutions in gender detection (genderize, genderapi, namsor, nameapi and gender guesser) from a command.

DAMe Gender is for you!

Tell me about DAMe Gender on Youtube

Install

Docker Image

# Build the container image
$ docker build . -t damegender/damegender:latest

# Run the container
$ docker run -ti damegender/damegender:latest main.py David

Installing Software

Possible Debian/Ubuntu dependencies

$ sudo apt-get install python3-nose-exclude python3-dev dict dict-freedict-eng-spa dict-freedict-spa-eng dictd

From sources

$ git clone https://github.com/davidam/damegender
$ cd damegender
$ pip3 install -r requirements.txt

With python package

$ python3 -m venv /tmp/d
$ cd /tmp/d
$ source bin/activate
$ pip install --upgrade pip
$ pip3 install damegender
$ cd lib/python3.5/site-packages/damegender
$ python3 main.py David

To install apis extra dependencies:

$ pip3 install damegender[apis]

To install mailing lists and repositories extra dependencies:

$ pip3 install damegender[mails_and_repositories]

To install all possible dependencies

$ pip3 install damegender[all]

Obtaining an api key

Currently you can need an api key from:

You can execute:

$ python3 apikeyadd.py

To configure your api key

Configuring nltk

$ python3
>>> import nltk
>>> nltk.download('names')

Check test

All unit tests

$ nosetest3 test

Using Docker image

$ docker run -ti --entrypoint nosetests damegender/damegender:latest test

Single unit test

$ nosetests3 test/test_dame_sexmachine.py:TddInPythonExample.test_string2array_method_returns_correct_result

Using Docker image

$ docker run -ti --entrypoint nosetests damegender/damegender:latest test/test_dame_sexmachine.py:TddInPythonExample.test_string2array_method_returns_correct_result

Tests from commands

$ cd src/damegender
$ ./testsbycommands.sh         # It must run for you
$ ./testsbycommandsextralocal.sh    # You will need all dependencies with: $ pip3 install damegender[all]
$ ./testsbycommandsextranet.sh    # You will need api keys

Execute program

# Detect gender from a name (INE is the dataset used by default)
$ python3 main.py David
David gender is male
 363559  males for David from INE.es
0 females for David from INE.es

# Detect gender from a name only using machine learning (experimental way)
$ python3 main.py Mesa --ml=nltk
Mesa gender is female
0 males for Mesa from INE.es
0 females for Mesa from INE.es

# Detect gender from a name (all census and machine learning)
$ python3 main.py David --verbose
365196 males for David from INE.es
0 females for David from INE.es
1193 males for David from Uruguay census
5 females for David from Uruguay census
26645 males for David from United Kingdom census
0 females for David from United Kingdom census
3552580 males for David from United States of America census
12826 females for David from United States of America census
David gender predicted with nltk is male
David gender predicted with sgd is male
David gender predicted with svc is male
David gender predicted with gaussianNB is male
David gender predicted with multinomialNB is male
David gender predicted with bernoulliNB is male
David gender predicted with forest is male
David gender predicted with tree is male
David gender predicted with mlp is male

# Find your name in different countries
$ python3 nameincountries.py David
grep -i " David " files/names/nam_dict.txt > files/grep.tmp
males: ['Albania', 'Armenia', 'Austria', 'Azerbaijan', 'Belgium', 'Bosnia and Herzegovina', 'Czech Republic', 'Denmark', 'East Frisia', 'France', 'Georgia', 'Germany', 'Great Britain', 'Iceland', 'Ireland', 'Israel', 'Italy', 'Kazakhstan/Uzbekistan', 'Luxembourg', 'Malta', 'Norway', 'Portugal', 'Romania', 'Slovenia', 'Spain', 'Sweden', 'Swiss', 'The Netherlands', 'USA', 'Ukraine']
females: []
both: []

# Count gender from a git repository
$ python3 git2gender.py https://github.com/chaoss/grimoirelab-perceval.git --directory="/tmp/clonedir"
The number of males sending commits is 15
The number of females sending commits is 7

# Count gender from a mailing list
$ cd files/mbox
$ wget -c http://mail-archives.apache.org/mod_mbox/httpd-announce/201706.mbox
$ cd ..
$ python3 mail2gender.py http://mail-archives.apache.org/mod_mbox/httpd-announce/

# Use an api to detect the gender
$ python3 api2gender.py Leticia --surname="Martin" --api=namsor
female
scale: 0.99

# Google popularity for a name
$ python3 gendergoogle.py Leticia
Google results of Leticia as male: 42300
Google results of Leticia as female: 63400

# Give me informative features
$ python3 infofeatures.py
Females with last letter a: 0.4705246078961601
Males with last letter a: 0.048672566371681415
Females with last letter consonant: 0.2735841767750908
Males with last letter consonant: 0.6355328972681801
Females with last letter vocal: 0.7262612995441552
Males with last letter vocal: 0.3640823393612928

# Download results from an api and save in a file
$ python3 downloadjson --csv=files/names/min.csv --api=genderize
$ cat files/names/genderizefiles_names_min.csv.json

# To measure success
$ python3 accuracy.py --csv=files/names/min.csv
################### NLTK!!
Gender list: [1, 1, 1, 1, 2, 1, 0, 0]
Guess list:  [1, 1, 1, 1, 0, 1, 0, 0]
Dame Gender accuracy: 0.875

$ python3 accuracy.py --api="genderize" --csv=files/names/min.csv
################### Genderize!!
Gender list: [1, 1, 1, 1, 2, 1, 0, 0]
Guess list:  [1, 1, 1, 1, 2, 1, 0, 0]
Genderize accuracy: 1

$ python3 confusion.py --csv="files/names/partial.csv" --api=nameapi --jsondownloaded="files/names/nameapifiles_names_partial.csv.json"
A confusion matrix C is such that Ci,j is equal to the number of observations known to be in group i but predicted to be in group j.
If the classifier is nice, the diagonal is high because there are true positives
Nameapi confusion matrix:

[[ 3, 0, 0]
 [ 0, 15, 1]]


# To analyze errors guessing names from a csv
$ python3 errors.py --csv="files/names/all.csv" --api="genderguesser"
Gender Guesser with files/names/all.csv has:
+ The error code: 0.22564457518601835
+ The error code without na: 0.026539047204698716
+ The na coded: 0.20453365634192766
+ The error gender bias: 0.0026103980857080703

# To deploy a graph about correlation between variables
$ python3 corr.py
$ python3 corr.py --csv="categorical"
$ python3 corr.py --csv="nocategorical"
# To create files from scripts. Example: the pickle models, or csv processed from original files.
$ python3 postinstall.py
# Experiments to determine features with weight (not finished)
$ python3 pca-components.py --csv="files/features_list.csv" # To determine number of components
$ python3 pca-features.py                                   # To understand the weight between variables for a target

# Counting surnames
$ python3 surname.py Gil --total=es
There are 140004 people using Gil in Spain

$ python3 surname.py Lenon --total=us
There are 837 people using Lenon in United States of America

# Measuring Ethnicity of surnames
$ python3 ethnicity.py Smith
In United States of America the percentages about the race of Smith surname is:
White: 73.35
Black: 22.22
Hispanic: 1.56
Asian Pacific Indian American: 0.40
American Indian and Alaska Native: 0.85
Various races: 1.63

Benchmarking

Market Study

	Gender API	gender-guesser	genderize.io	NameAPI	NamSor	damegender
Database size	431322102	45376	114541298	1428345	4407502834	57282
Regular data updates	yes	no	no	yes	yes	yes, developing
Handles unstructured full name strings	yes	no	no	yes	no	yes
Handles surnames	yes	no	no	yes	yes	yes
Handles non-Latin alphabets	partially	no	partially	yes	yes	no
Implicit geo-localization	yes	no	no	yes	yes	no
Exists locale	yes	yes	yes	yes	yes	yes
Assingment type	probilistic	binary	probabilistic	probabilistic	probabilistic	probabilistic
Free parameters	total_names, probability	gender	probability, count	confidence	scale	total_names, count
Prediction	no	no	no	no	no	yes
Free license	no	yes	no	no	no	yes
API	yes	no	yes	yes	yes	future
free requests limited	yes (200)	unlimited	yes	yes	yes	unlimited

(Checked: 2019/06/27)

Accuracy

Name	Accuracy	Precision	F1score	Recall
Genderapi	0.9687686966482124	0.9717050018254838	0.9637877964874163	1.0
Genderize	0.926775	0.9761303240374678	0.9655113956503119	1.0
Namsor	0.8672551055728626	0.9730097087378641	0.9236866359447006	1.0
Nameapi	0.8301886792452831	0.97420272191753	0.9054181612233341	1.0
Gender Guesser	0.7743554248139817	0.9848151408450704	0.8715900233826968	1.0

(Checked: 2019/10 until 2019/12)

These accuracies has been measured thinking in Lucía Santamaría and Helena Mihaljevic dataset as base of truth.

Accuracy (Damegender ML)

Name	Accuracy	Precision	F1score	Recall
SVC	0.879	0.972	0.972	1.0
Random Forest	0.862	0.902	0.902	1.0
NLTK (Bayes)	0.862	0.902	0.902	1.0
MultinomialNB	0.782	0.791	0.791	1.0
Tree	0.764	0.821	0.796	1.0
SGD	0.709	0.943	0.815	1.0
GaussianNB	0.709	0.968	0.887	1.0
BernoulliNB	0.699	0.965	0.816	1.0
MLP	0.677	0.819	0.755	1.0

In Damegender we are using the next datasets:

INE.es (Spain)
USA
United Kingdom
Uruguay

We hope better results with more languages.

Machine Learning Algorithms in DameGender These results are experimental, we are improving the choosing of features.

Confusion Matrix

GenderApi

… male female undefined

male 3589 155 67

female 211 1734 23
Genderguesser

… male female undefided

male 3326 139 346

female 78 1686 204
Genderize

… male female undefined

male 3157 242 412

female 75 1742 151
Namsor

… male female undefined

male 3325 139 346

female 78 1686 204
Nameapi

… male female undefined

male 2627 674 507

female 667 1061 240
Dame Gender

… male female undefined

male 3033 778 0

female 276 1692 0

In this version of Dame Gender, we are not considering decide names as undefined.

Errors with files/names/all.csv has:

API	error code	error code without na	na coded	error gender bias
Genderize	0.0727	0.053	0.02	-0.008
Damegender	0.2547594323295258	0.2547594323295258	0.0	-0.04949809622706819
GenderApi	0.16666666666666666	0.16666666666666666	0.0	-0.16666666666666666
Gender Guesser	0.2255105572862582	0.026962383126766687	0.20404984423676012	0.0030441400304414
Namsor	0.16666666666666666	0.16666666666666666	0.0	0.16666666666666666
Nameapi	0.361	0.267	0.129	0.001

Performance

These performance metrics requires and csv json downloaded ################### Damegender!! Gender list: [1, 1, 1, 1, 1, 0] Guess list: [1, 1, 1, 1, 1, 0] Damegender accuracy: 1.0

real 0m1.270s user 0m0.876s sys 0m0.416s ################### Genderize!! Gender list: [1, 1, 1, 1, 1, 0] Guess list: [1, 1, 1, 1, 1, 0] Genderize accuracy: 1.0

real 0m0.811s user 0m0.776s sys 0m0.312s ################### Genderapi!! Gender list: [1, 1, 1, 1, 1, 0] Guess list: [1, 1, 1, 1, 1, 0] Genderapi accuracy: 1.0

real 0m0.763s user 0m0.744s sys 0m0.232s ################### Namsor!! Gender list: [1, 1, 1, 1, 1, 0] Guess list: [1, 1, 1, 1, 1, 0] Namsor accuracy: 1.0

real 0m0.811s user 0m0.776s sys 0m0.356s ################### Nameapi!! Gender list: [1, 1, 1, 1, 1, 0] Guess list: [1, 1, 1, 1, 1, 0] Nameapi accuracy: 1.0

real 0m0.832s user 0m0.816s sys 0m0.336s A confusion matrix C is such that Ci,j is equal to the number of observations known to be in group i but predicted to be in group j. If the classifier is nice, the diagonal is high because there are true positives Damegender confusion matrix:

[[ 5, 0, 0 ] [ 0, 1, 0 ]]

real 0m0.812s user 0m0.784s sys 0m0.300s Damegender with files/names/partial.csv has:

The error code: 0.10526315789473684
The error code without na: 0.10526315789473684
The na coded: 0.0
The error gender bias: 0.0

real 0m9.099s user 0m9.008s sys 0m0.412s

Statistics for damegender

Some theory could be useful to understand some commands

Measuring success and fails

To guess the sex, we have an true idea (example: female) and we obtain a result with a method (example: using an api, querying a dataset or with a machine learning model). The guessed result could be male, female or perhaps unknown. Remember some definitions about results about this matter:

True positive is find a value guessed as true if the value in the data source is positive.

True negative is find a value guessed as true if the the value in the data source is negative.

False positive is find a value guessed as false if the the value in the data source is positive.

False negative is find a value guessed as false if the the value in the data source is negative.

So, we can find a vocabulary for measure true, false, success and errors. We can make a summary in the gender name context about mathematical concepts:

Precision is about true positives divided by true positives plus false positives

(femalefemale + malemale ) /
(femalefemale + malemale + femalemale)

Recall is about true positives divided by true positives plus false negatives.

(femalefemale + malemale ) /
(femalefemale + malemale + malefemale + femaleundefined + maleundefined)

Accuray is about true positives divided by all.

(femalefemale + malemale ) /
(femalefemale + malemale + malefemale + femalemale + femaleundefined + maleundefined)

The F1 score is the harmonic mean of precision and recall taking both metrics into account in the following equation:

2 * (
(precision * recall) /
(precision + recall))

In Damengender, we are using accuracy.py to apply these concepts. Take a look to practice:

$ python3 accuracy.py --api="damegender" --measure="f1score" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="damegender" --measure="recall" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="damegender" --measure="precision" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="damegender" --measure="accuracy" --csv="files/names/partialnoundefined.csv"

$ python3 accuracy.py --api="genderguesser" --measure="f1score" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="genderguesser" --measure="recall" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="genderguesser" --measure="precision" --csv="files/names/partialnoundefined.csv"
$ python3 accuracy.py --api="genderguesser" --measure="accuracy" --csv="files/names/partialnoundefined.csv"

Error coded is about the true is different than the guessed:

(femalemale + malefemale + maleundefined + femaleundefined) /
(malemale + femalemale + malefemale +
femalefemale + maleundefined + femaleundefined)

Error coded without na is about the true is different than the guessed, but without undefined results.

(maleundefined + femaleundefined) /
(malemale + femalemale + malefemale +
femalefemale + maleundefined + femaleundefined)

Error gender bias is to understand if the error is bigger guessing males than females or viceversa.

(malefemale - femalemale) /
(malemale + femalemale + malefemale + femalefemale)

The weighted error is about the true is different than the guessed, but giving a weight to the guessed as undefined.

(femalemale + malefemale +
+ w * (maleundefined + femaleundefined)) /
(malemale + femalemale + malefemale + femalefemale +
+ w * (maleundefined + femaleundefined))

In Damengeder, we have coded errors.py to implement the different definitions in diffrent apis.

The confusion matrix creates a matrix about the true and the guess. If you have this confusion matrix:

[[ 2, 0, 0]
 [ 0, 5, 0]]

It means, I have 2 females true and I've guessed 2 females and I've 5 males true and I've guessed 5 males. I don't have errors in my classifier.

[[ 2  1  0]
[ 2 14  0]

It means, I have 2 females true and I've guessed 2 females and I've 14 males true and I've guessed 14 males. 1 female was considered male, 2 males was considered female.

In Damegender, we have coded confusion.py to implement this concept with the different apis.

PCA

Concepts

The dispersion measures between 1 variable, for instance, variance, standard deviation, …

If you have 2 variables, you can write a formula so similar to variance.

If you have 3 variables or more, you can write a covariance matrix.

In essence, an eigenvector v of a linear transformation T is a non-zero vector that, when T is applied to it, does not change direction. Applying T to the eigenvector only scales the eigenvector by the scalar value λ, called an eigenvalue.

A feature vector is constructed taking the eigenvectors that you want to keep from the list of eigenvectors.

The new dataset take the transpose of the vector and multiply it on the left of the original data set, transposed.

FinalData = RowFeatureVector x RowDataAdjust

We can choose PCA using the covariance method as opposed to the correlation method.

The covariance method has the next steps:

Organize the data set
Calculate the empirical mean
Calculate the deviations from the mean
Find the covariance matrix
Find the eigenvectors and eigenvalues of the covariance matrix
Rearrange the eigenvectors and eigenvalues
Compute the cumulative energy content for each eigenvector
Select a subset of the eigenvectors as basis vectors
Project the z-scores of the data onto the new basis

The correlation method has the next steps:

Compute the correlation matrix
Solve for the correlation roots of R (product of eigenvalues)
Compute the first column of the V matrix
Compute the remaining columns of the V matrix
Compute the L^(1/2) matrix
Compute the communality
Diagonal elements report how much of the variability is explained
Compute the coefficient matrix
Compute the principal factors

Choosing components

We can choose components with:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--csv')
args = parser.parse_args()

#filepath = 'files/features_list.csv' #your path here
data = np.genfromtxt(args.csv, delimiter=',', dtype='float64')

scaler = MinMaxScaler(feature_range=[0, 1])
data_rescaled = scaler.fit_transform(data[1:, 0:8])

#Fitting the PCA algorithm with our Data
pca = PCA().fit(data_rescaled)
#Plotting the Cumulative Summation of the Explained Variance
plt.figure()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Variance (%)') #for each component
plt.title('Dataset Explained Variance')
plt.show()

Taking a look to the image. We can choose 6 components.

Load Dataset

We choose the file all.csv to generate features and a list to determine gender (male or female)

from pprint import pprint
import pandas as pd
import matplotlib.pyplot as plt
from app.dame_sexmachine import DameSexmachine
from app.dame_gender import Gender

## LOAD DATASET
g = Gender()
g.features_list2csv(categorical="both", path="files/names/all.csv")
features = "files/features_list.csv"

print("STEP1: N COMPONENTS + 1 TARGET")

x = pd.read_csv(features)
print(x.columns)

y = g.dataset2genderlist(dataset="files/names/all.csv")
print(y)

Standarize the data

print("STEP2: STANDARIZE THE DATA")
from sklearn.preprocessing import StandardScaler
# Standardizing the features
x = StandardScaler().fit_transform(x)

Pca Projection to N Dimensions

Finally, we create the pca transform with 6 dimensions and we add the target component.

from sklearn.decomposition import PCA
pca = PCA(n_components=6)
principalComponents = pca.fit_transform(x)
print("STEP3: PCA PROJECTION")
pprint(principalComponents)
principalDf = pd.DataFrame(data = principalComponents, columns = ['principal component 1', 'principal component 2', 'principal component 3', 'principal component 4', 'principal component 5', 'principal component 6'])

target = pd.DataFrame(data = y, columns = ['target component'])

print(principalDf.join(target))

Analyze components to determine gender in names

first\\_letter	last\\_letter	last\\_letter\\_a	first\\_letter\\_vocal	last\\_letter\\_vocal	last\\_letter\\_consonant	target component
-0.2080025204	-0.3208958517	0.2352509625	0.2113242731	0.6095269139	-0.6095269139	-0.1035071139
-0.6037951881	0.5174873789	-0.4252467151	0.4278794455	0.0388287435	-0.0388287435	-0.0265942125
0.1049343046	0.1158117877	-0.2867605971	-0.3473950734	0.0901034539	-0.0901034539	-0.8697264971
0.2026467275	0.3142402839	0.630802294	0.5325769702	-0.1291229841	0.1291229841	-0.3811720011

In this analysis, we can observe 4 components.

The first component is about if the last letter is vocal or consonant. If the last letter is vocal we can find a male and if the last letter is a consonant we can find a male.

The second component is about the first letter. The last letter is determining females and the first letter is determining males.

The third component is not giving relevant information.

The fourth component is giving the last_letter_a and the first_letter_vocal is for females.

Speeches, Seminars, Expressions of Support

Beautiful Snakes

Dame Music

Listen music …

License

Copyright (C) 2019 David Arroyo Menendez Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in GNU Free Documentation License.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.6.0rc4 pre-release

Oct 8, 2023

0.6.0rc3 pre-release

Oct 7, 2023

0.6.0rc1 pre-release

Oct 5, 2023

0.5.7

Sep 28, 2023

0.5.6.post3

Sep 13, 2023

0.5.6rc3 pre-release

Sep 13, 2023

0.5.5.post3

Aug 27, 2023

0.5.5rc1 pre-release

Aug 25, 2023

0.5.4.post2

Aug 20, 2023

0.5.4.post1

Aug 20, 2023

0.5.4rc5 pre-release

Aug 20, 2023

0.5.4rc4 pre-release

Aug 20, 2023

0.5.4rc3 pre-release

Aug 19, 2023

0.5.4rc2 pre-release

Aug 19, 2023

0.5.3

May 11, 2023

0.5.3rc3 pre-release

May 11, 2023

0.5.3rc2 pre-release

May 11, 2023

0.5.2.post2

Jan 24, 2023

0.5.2.post1

Jan 24, 2023

0.5.2

Jan 24, 2023

0.5.2rc1 pre-release

Jan 23, 2023

0.5.1.post2

Dec 29, 2022

0.5.1rc1 pre-release

Dec 29, 2022

0.5.0

Nov 24, 2022

0.5.0rc2 pre-release

Nov 24, 2022

0.4.8

Oct 29, 2022

0.4.7.post6

Oct 17, 2022

0.4.7.post5

Sep 30, 2022

0.4.7.post4

Sep 30, 2022

0.4.7.post3

Sep 29, 2022

0.4.7.post2

Sep 29, 2022

0.4.7.post1

Sep 25, 2022

0.4.7

Sep 25, 2022

0.4.6.post1

Jul 25, 2022

0.4.5.post24

Apr 12, 2022

0.4.5.post20

Apr 12, 2022

0.4.5.post19

Apr 11, 2022

0.4.5.post18

Apr 10, 2022

0.4.5.post15

Apr 7, 2022

0.4.5.post7

Apr 5, 2022

0.4.5.post6

Apr 5, 2022

0.4.5

Apr 2, 2022

0.4.4

Mar 14, 2022

0.4.4rc2 pre-release

Mar 14, 2022

0.4.3.post4

Mar 6, 2022

0.4.3.post3

Mar 6, 2022

0.4.3.post2

Mar 6, 2022

0.4.3.post1

Mar 5, 2022

0.4.3rc2 pre-release

Mar 5, 2022

0.4.3rc1 pre-release

Mar 5, 2022

0.4.2

Feb 28, 2022

0.4.2rc3 pre-release

Feb 28, 2022

0.4.2rc2 pre-release

Feb 28, 2022

0.4.2rc1 pre-release

Feb 28, 2022

0.4.1

Feb 9, 2022

0.4.0.post2

Dec 9, 2021

0.4.0rc8 pre-release

Dec 9, 2021

0.4.0rc7 pre-release

Dec 9, 2021

0.4.0rc6 pre-release

Dec 9, 2021

0.4.0rc5 pre-release

Dec 9, 2021

0.3.8

Oct 8, 2021

0.3.8rc5 pre-release

Oct 8, 2021

0.3.8rc4 pre-release

Oct 8, 2021

0.3.7.post2

Oct 4, 2021

0.3.7.post1

Oct 4, 2021

0.3.7

Oct 4, 2021

0.3.6

Sep 30, 2021

0.3.6rc2 pre-release

Sep 30, 2021

0.3.5.post1

Jul 19, 2021

0.3.5

Jul 17, 2021

0.3.5rc6 pre-release

Jul 17, 2021

0.3.5rc5 pre-release

Jul 17, 2021

0.3.5rc3 pre-release

Jul 17, 2021

0.3.4

May 10, 2021

0.3.4rc1 pre-release

May 10, 2021

0.3.3.post4

May 4, 2021

0.3.3.post3

Apr 23, 2021

0.3.3.post2

Apr 21, 2021

0.3.3.post1

Apr 14, 2021

0.3.3rc9 pre-release

Apr 14, 2021

0.3.2.post6

Dec 23, 2020

0.3.2.post5

Dec 22, 2020

0.3.2.post4

Dec 19, 2020

0.3.2rc1 pre-release

Dec 16, 2020

0.3.1.post4

Nov 14, 2020

0.3.1.post3

Nov 10, 2020

0.3.1.post1

Nov 10, 2020

0.3 yanked

Sep 20, 2020

Reason this release was yanked:

it needs more testing

0.2.11.post8

Jul 3, 2020

0.2.11.post3

Jun 30, 2020

0.2.11

Jun 29, 2020

0.2.10

Jun 16, 2020

0.2.9

Jun 2, 2020

0.2.9rc6 pre-release

Jun 2, 2020

0.2.8

May 21, 2020

0.2.8rc9 pre-release

May 21, 2020

0.2.8rc8 pre-release

May 12, 2020

0.2.8rc7 pre-release

May 12, 2020

0.2.8rc6 pre-release

May 11, 2020

0.2.8rc3 pre-release

May 7, 2020

0.2.8rc2 pre-release

May 2, 2020

0.2.8rc1 pre-release

Apr 29, 2020

0.2.7

Apr 26, 2020

0.2.7rc4 pre-release

Apr 26, 2020

This version

0.2.7rc3 pre-release

Apr 26, 2020

0.2.6

Apr 5, 2020

0.2.6rc3 pre-release

Apr 6, 2020

0.2.6rc2 pre-release

Apr 6, 2020

0.2.6rc1 pre-release

Apr 6, 2020

0.2.5

Mar 26, 2020

0.2.4

Feb 17, 2020

0.2.3

Feb 7, 2020

0.2.1

Jan 16, 2020

0.1.53

Jan 16, 2020

0.1.49

Jan 5, 2020

0.1.45

Nov 28, 2019

0.1.41

Nov 27, 2019

0.1.37

Nov 26, 2019

0.1.36

Nov 6, 2019

0.1.35

Nov 5, 2019

0.1.34

Oct 23, 2019

0.1.30

Sep 21, 2019

0.1.25

Sep 20, 2019

0.1.24

Sep 20, 2019

0.1.23

Sep 18, 2019

0.1.22

Sep 18, 2019

0.1.21

Sep 18, 2019

0.1.20

Sep 18, 2019

0.1.17

Aug 27, 2019

0.1.15

Aug 27, 2019

0.1.9

Jul 5, 2019

0.0.95

Jul 5, 2019

0.0.94

Jul 5, 2019

0.0.93

Jul 5, 2019

0.0.92

Jul 5, 2019

0.0.91

Jul 5, 2019

0.0.89

Jul 5, 2019

0.0.74

Jun 3, 2019

0.0.62

Jun 2, 2019

0.0.61

Jun 1, 2019

0.0.60

May 15, 2019

0.0.59

May 13, 2019

0.0.52

May 7, 2019

0.0.50

May 7, 2019

0.0.47

May 5, 2019

0.0.36

Mar 21, 2019

0.0.35

Mar 20, 2019

0.0.0.1

Mar 20, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

damegender-0.2.7rc3.tar.gz (50.8 MB view hashes)

Uploaded Apr 26, 2020 Source

Hashes for damegender-0.2.7rc3.tar.gz

Hashes for damegender-0.2.7rc3.tar.gz
Algorithm	Hash digest
SHA256	`d9393cc3eb48a4aee05abcf9dc8c11c270cd3417d38d92a0ce33a12c10cc11fa`
MD5	`7b75eb4a008273aa381a478ee3cbace5`
BLAKE2b-256	`2a02600d7e6cfd9eaec8feb0999025f7187624fe9d88f3455db95d27dd08516b`

…	male	female	undefided
male	3326	139	346
female	78	1686	204

…	male	female	undefined
male	2627	674	507
female	667	1061	240

damegender 0.2.7rc3

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

Table of Contents

Logo

Name

Why?

Tell me about DAMe Gender on Youtube

Install

Docker Image

Installing Software

Possible Debian/Ubuntu dependencies

From sources

With python package

Obtaining an api key

Configuring nltk

Check test

All unit tests

Using Docker image

Single unit test

Using Docker image

Tests from commands

Execute program

Benchmarking

Market Study

Accuracy

Accuracy (Damegender ML)

Confusion Matrix

Errors with files/names/all.csv has:

Performance

Statistics for damegender

Measuring success and fails

PCA

Concepts

Choosing components

Load Dataset

Standarize the data

Pca Projection to N Dimensions

Analyze components to determine gender in names

Speeches, Seminars, Expressions of Support

Beautiful Snakes

Dame Music

License

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution