Data access and analysis of baby names statistics

Project description

babe

Note that the first time you load the names data, you need Internet access, and it will take a few seconds (depending on bandwidth) to download the required data.

This data is then automatically saved to a local file, so things are faster the next time around.
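The README does not say how babe caches the download. The general download-once pattern can be sketched as follows. The sketch assumes a JSON cache file and a caller-supplied `fetch` function, and both are hypothetical rather than part of babe's API:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List

def load_cached(cache_path: Path, fetch: Callable[[], List]) -> List:
    """Return the data stored at cache_path, calling fetch() only if no cache exists yet."""
    if cache_path.exists():
        # Subsequent runs: read the local copy; no network access needed.
        return json.loads(cache_path.read_text())
    # First run: fetch (e.g. download) the data, then save it for next time.
    data = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(data))
    return data

# Usage with a stand-in "download" that records how often it is called.
calls = []

def fake_download() -> List:
    calls.append(1)
    return [["AK", "F", 1910, "Mary", 14]]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "names.json"
    first = load_cached(path, fake_download)   # downloads and writes the cache
    second = load_cached(path, fake_download)  # served from the cache

print(len(calls))  # 1
```

The second call never touches the "network", which is the behavior described above.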

To install:

pip install babe

Then, in a Python console or notebook:

from babe import UsNames

d = UsNames()

Intro to the data

The fundamental data provides a popularity score (the number of babies recorded) for each (state, gender, name, year) tuple that has data. It covers names of babies born in the US between 1910 and 2019.

d.data

	state	gender	year	name	popularity	name_g
0	AK	F	1910	Mary	14	Mary_F
1	AK	F	1910	Annie	12	Annie_F
2	AK	F	1910	Anna	10	Anna_F
3	AK	F	1910	Margaret	8	Margaret_F
4	AK	F	1910	Helen	7	Helen_F
...	...	...	...	...	...	...
28277	WY	M	2019	Theo	5	Theo_M
28278	WY	M	2019	Tristan	5	Tristan_M
28279	WY	M	2019	Vincent	5	Vincent_M
28280	WY	M	2019	Warren	5	Warren_M
28281	WY	M	2019	Waylon	5	Waylon_M

6122890 rows × 6 columns

print(f"{len(d.names)} unique names")

31862 unique names

Since some names are used for both genders, most of the internals use name_g, the name concatenated with the gender (_F or _M):

print(f"{len(d.name_gs)} unique names_g (gendered names)")

34952 unique names_g (gendered names)
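The following small sketch shows why there are more name_gs than names. It uses invented toy rows, and `make_name_g` is a hypothetical helper that mirrors the `_F`/`_M` concatenation described above, not babe's own code:

```python
# Toy (state, gender, year, name, popularity) rows, invented for illustration.
rows = [
    ("CA", "F", 1990, "Vanessa", 100),
    ("CA", "M", 1990, "Vanessa", 16),
    ("GA", "F", 1990, "Cora", 12),
]

def make_name_g(name: str, gender: str) -> str:
    """Concatenate a name and its gender code, e.g. ('Cora', 'F') -> 'Cora_F'."""
    return f"{name}_{gender}"

names = {name for _, _, _, name, _ in rows}
name_gs = {make_name_g(name, gender) for _, gender, _, name, _ in rows}
print(len(names), "unique names")      # 2 unique names
print(len(name_gs), "unique name_gs")  # 3 unique name_gs
```

A name given to both girls and boys contributes one entry to the names but two to the name_gs. This is why 34952 exceeds 31862 above.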

You can use resolve_name_g to get the name_g corresponding to a name as long as the name isn't used for more than one gender.

d.resolve_name_g('Cora')

'Cora_F'

try:
    d.resolve_name_g('Vanessa')
except AssertionError as e:
    print(e)

The Vanessa can be used for both genders. Specify Vanessa_F or Vanessa_M
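The behavior of resolve_name_g shown above can be approximated with a plain lookup table. This is a hypothetical re-implementation for illustration only. The helper `build_genders_by_name` is invented, and the error message is merely modeled on babe's output. The sketch raises AssertionError to match the `except` clause above:

```python
from typing import Dict, Iterable, Set

def build_genders_by_name(name_gs: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each bare name to the set of gender codes ('F', 'M') it appears with."""
    genders: Dict[str, Set[str]] = {}
    for name_g in name_gs:
        name, gender = name_g.rsplit("_", 1)
        genders.setdefault(name, set()).add(gender)
    return genders

def resolve_name_g(name: str, genders_by_name: Dict[str, Set[str]]) -> str:
    """Return '<name>_<gender>' when the name is used for exactly one gender."""
    genders = genders_by_name[name]  # KeyError for an unknown name
    if len(genders) > 1:
        # Raised explicitly (not via `assert`) so it also fires under `python -O`.
        raise AssertionError(
            f"{name} can be used for both genders. Specify {name}_F or {name}_M"
        )
    (gender,) = genders
    return f"{name}_{gender}"

lookup = build_genders_by_name(["Cora_F", "Vanessa_F", "Vanessa_M"])
print(resolve_name_g("Cora", lookup))  # Cora_F
try:
    resolve_name_g("Vanessa", lookup)
except AssertionError as e:
    print(e)  # Vanessa can be used for both genders. Specify Vanessa_F or Vanessa_M
```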

by_state data

In some cases, it's more convenient to have a view indexed by (state, name_g, year). The by_state attribute provides that.

d.by_state

state  name_g      year
AK     Mary_F      1910    14
       Annie_F     1910    12
       Anna_F      1910    10
       Margaret_F  1910     8
       Helen_F     1910     7
                           ..
WY     Theo_M      2019     5
       Tristan_M   2019     5
       Vincent_M   2019     5
       Warren_M    2019     5
       Waylon_M    2019     5
Name: popularity, Length: 6122890, dtype: int64
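by_state is a pandas Series with a three-level (state, name_g, year) index. The following rough plain-Python analogue uses nested dicts in place of the MultiIndex. `build_by_state` is hypothetical, and the toy rows are partly copied from the outputs in this README and partly invented. The same nesting can be built like this:

```python
from collections import defaultdict
from typing import Dict

# Toy (state, gender, year, name, popularity) rows.
rows = [
    ("CA", "F", 1911, "Cora", 8),
    ("CA", "F", 1912, "Cora", 9),
    ("GA", "F", 1911, "Cora", 6),
    ("CA", "M", 1988, "Vanessa", 26),
]

def build_by_state(rows) -> Dict[str, Dict[str, Dict[int, int]]]:
    """Nest popularity as by_state[state][name_g][year]."""
    nested = defaultdict(lambda: defaultdict(dict))
    for state, gender, year, name, popularity in rows:
        nested[state][f"{name}_{gender}"][year] = popularity
    # Convert to plain dicts so a missing key raises KeyError
    # instead of silently creating an empty entry.
    return {state: dict(by_name) for state, by_name in nested.items()}

by_state = build_by_state(rows)
print(by_state["CA"]["Cora_F"])  # {1911: 8, 1912: 9}
```

`by_state["CA"]["Cora_F"]` plays the role of `d.by_state['CA']['Cora_F']`, returning year-to-popularity pairs for one state and one name_g.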

This allows one to do things like getting the data for a given state only:

d.by_state['CA']

name_g      year
Mary_F      1910    295
Helen_F     1910    239
Dorothy_F   1910    220
Margaret_F  1910    163
Frances_F   1910    134
                   ... 
Zayvion_M   2019      5
Zeek_M      2019      5
Zhaire_M    2019      5
Zian_M      2019      5
Ziyad_M     2019      5
Name: popularity, Length: 387781, dtype: int64

... within a state, getting the 'by year popularity' for a given name:

d.by_state['CA']['Cora_F']  # or d.by_state['CA', 'Cora_F']

year
1911      8
1912      9
1913     15
1914     15
1915     17
       ... 
2015    269
2016    244
2017    284
2018    282
2019    256
Name: popularity, Length: 109, dtype: int64

... and if you want the data for a given name (really, a name_g) across all states, you can get it by "slicing".

For example, suppose you're wondering how many little boys were named "Vanessa", and more specifically when and where:

d.by_state[:, 'Vanessa_M']

state  year
AZ     1988     8
CA     1980     7
       1981     5
       1982    20
       1983    19
       1984    14
       1985    12
       1986    13
       1987    13
       1988    26
       1989    17
       1990    16
       1991    18
       1992    17
       1993    17
       1994    10
       1995     9
       1996    10
       1997    11
       1998     7
DC     1989    11
NY     1982     5
       1983     9
       1986     6
       1988     6
       1989     6
TX     1981     5
       1982     7
       1983    12
       1984     9
       1985    10
       1986     8
       1987     9
       1988     8
       1989     5
       1990     6
       1991     5
       1992     5
       1994     5
Name: popularity, dtype: int64
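The `[:, 'Vanessa_M']` slice fixes the second index level and keeps every state. In a hypothetical nested-dict layout `by_state[state][name_g][year]`, the equivalent selection is a loop over states. `across_states` is an illustrative helper, not part of babe, and the data is a toy excerpt (the Vanessa_M values come from the output above, the Cora_F values are invented):

```python
from typing import Dict, Tuple

# Toy nested data in the by_state[state][name_g][year] layout.
by_state = {
    "AZ": {"Vanessa_M": {1988: 8}},
    "CA": {"Vanessa_M": {1980: 7, 1981: 5}, "Cora_F": {1911: 8}},
    "GA": {"Cora_F": {1911: 6}},
}

def across_states(by_state, name_g: str) -> Dict[Tuple[str, int], int]:
    """Collect {(state, year): popularity} for one name_g, skipping states that lack it."""
    return {
        (state, year): popularity
        for state, by_name in by_state.items()
        for year, popularity in by_name.get(name_g, {}).items()
    }

print(across_states(by_state, "Vanessa_M"))
# {('AZ', 1988): 8, ('CA', 1980): 7, ('CA', 1981): 5}
```

As in the pandas output, states where the name_g never appears (here GA) simply drop out of the result.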

national data

A national aggregation is available through the national attribute:

d.national

name_g      year
Aaban_M     2013     6
            2014     6
Aadam_M     2019     6
Aadan_M     2008    12
            2009     6
                    ..
Zyriah_F    2013     7
            2014     6
            2016     5
Zyron_M     2015     5
Zyshonne_M  1998     5
Name: popularity, Length: 633239, dtype: int64
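The README doesn't spell out how the national figures are aggregated. The following sketch assumes they are the sum of the state counts for each (name_g, year). That is an assumption, not something confirmed here. On invented toy rows, the aggregation amounts to:

```python
from collections import Counter

# Toy (state, gender, year, name, popularity) rows, invented for illustration.
rows = [
    ("CA", "F", 1990, "Vanessa", 100),
    ("TX", "F", 1990, "Vanessa", 40),
    ("TX", "F", 1991, "Vanessa", 35),
]

def national_totals(rows) -> Counter:
    """Sum popularity over states for each (name_g, year) pair (assumed aggregation)."""
    totals = Counter()
    for _state, gender, year, name, popularity in rows:
        totals[(f"{name}_{gender}", year)] += popularity
    return totals

national = national_totals(rows)
print(national[("Vanessa_F", 1990)])  # 140
```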

The interface is the same as for the by_state attribute, just without the state level.

d.national.loc['Vanessa_F']

year
1935       5
1947      24
1948      32
1949      16
1950      41
        ... 
2015    1687
2016    1633
2017    1486
2018    1345
2019    1188
Name: popularity, Length: 74, dtype: int64

Plotting stuff

d.plot_popularity('Cora');


d.plot_popularity('Cora', 'GA');


d.plot_popularity(['Cora', 'Vanessa_F']);


d.plot_popularity('Cora', ['CA', 'GA']);


d.plot_popularity(['Cora', 'Vanessa_F'], ['CA', 'GA']);


Misc

gender-ambiguous names

We define the "femininity" of a name as the proportion of times it was used (all states, all time) to name a girl, and its "masculinity" accordingly, as the proportion of times it was used to name a boy.
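As a worked example of the definition, the following small sketch computes both scores from a name's total girl and boy counts. The function names and the counts are hypothetical:

```python
def femininity(f_count: int, m_count: int) -> float:
    """Share of a name's recorded uses (all states, all years) that were for girls."""
    total = f_count + m_count
    if total == 0:
        raise ValueError("the name has no recorded uses")
    return f_count / total

def masculinity(f_count: int, m_count: int) -> float:
    """Share of a name's recorded uses that were for boys."""
    return 1.0 - femininity(f_count, m_count)

# Hypothetical totals: 1 girl and 3 boys given the same name.
print(femininity(1, 3), masculinity(1, 3))  # 0.25 0.75
```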

d.femininity_of_name.iloc[12000:12010]

Lemmie      0.161290
Kashmere    0.161290
Clary       0.162162
Sung        0.162393
Kyrie       0.163527
Cedar       0.163686
Masyn       0.163895
Naveen      0.165605
Chai        0.166667
Atlee       0.167382
dtype: float64

d.femininity_of_name.plot(figsize=(17, 5), ylabel='femininity');


d.masculinity_of_name.iloc[19000:19010]

Berkley     0.108889
Dasani      0.110092
Sharone     0.111111
Ifeoluwa    0.111111
Rama        0.111111
Scout       0.111486
Brownie     0.111732
Lashon      0.113158
Indigo      0.113364
Argie       0.113636
dtype: float64

d.masculinity_of_name.plot(figsize=(17, 5), ylabel='masculinity');


The (gender-)"ambiguity" of a name can thus be defined as twice the minimum of its femininity and masculinity.

Defined this way, the score lies between 0 and 1. It is maximal (1) when equal numbers of boys and girls were given the name, and minimal (0) when only one gender was given it.

Note that this score is raw (or "un-smoothed"). It's computed with the raw counts, so the extreme scores will usually be for names with very low counts.
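Because femininity and masculinity sum to 1, the smaller of the two is at most 1/2, so doubling it gives a score in [0, 1]. The following minimal sketch computes the score from raw counts. The function name is hypothetical and the counts are invented. The sketch also shows why low-count names land at the extremes:

```python
def ambiguity(f_count: int, m_count: int) -> float:
    """2 * min(femininity, masculinity), computed from raw girl/boy counts."""
    total = f_count + m_count
    if total == 0:
        raise ValueError("the name has no recorded uses")
    # femininity = f/total and masculinity = m/total,
    # so 2 * min(f/total, m/total) == 2 * min(f, m) / total.
    return 2 * min(f_count, m_count) / total

print(ambiguity(10, 10))  # 1.0 (equal split)
print(ambiguity(7, 0))    # 0.0 (one gender only)
print(ambiguity(1, 3))    # 0.5
print(ambiguity(1, 1))    # 1.0, from just two babies
```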

d.ambiguity_of_name

Munachiso    1.0
Addis        1.0
Deshone      1.0
Gal          1.0
Rajdeep      1.0
            ... 
Sharelle     0.0
Analy        0.0
Sharayah     0.0
Sharaya      0.0
Aaban        0.0
Length: 31862, dtype: float64

t = d.ambiguity_of_name
print(f"There are {len(t[t > 0])} (gender-)ambiguous names")

There are 3090 (gender-)ambiguous names

t = d.ambiguity_of_name
t[t > 0].plot(figsize=(17, 5), ylabel='gender-ambiguity');


t = list(d.ambiguous_names)
print(f"{len(t)} (gender-)ambiguous names:")
print(*t[:9], '...', sep=', ')

3090 (gender-)ambiguous names:
Nolie, Tyrese, Linn, Savannah, Bryn, Rei, Abby, Shilo, Tracy, ...

Project details

Release history

0.0.8: Jul 24, 2026 (this version)
0.0.7: Nov 11, 2020
0.0.6: Nov 10, 2020
0.0.5: Nov 10, 2020
0.0.4: Nov 10, 2020
0.0.3: Nov 10, 2020
0.0.2: Nov 10, 2020

Download files


Source Distribution

babe-0.0.7.tar.gz (8.8 kB)

Uploaded Nov 11, 2020 Source

Built Distribution


babe-0.0.7-py3-none-any.whl (6.9 kB)

Uploaded Nov 11, 2020 Python 3

File details

Details for the file babe-0.0.7.tar.gz.

File metadata

Download URL: babe-0.0.7.tar.gz
Upload date: Nov 11, 2020
Size: 8.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for babe-0.0.7.tar.gz
Algorithm	Hash digest
SHA256	`746bf5184236d682de6f0a2b9b26d5dfc1d44a031eb12f30b6fc2451976b0ded`
MD5	`792b3efb56dffc967bb97238182483cf`
BLAKE2b-256	`1bf17b5c9e20222fa33a29d471b710f6641582e1a0d16a2ca16e44914511b640`


File details

Details for the file babe-0.0.7-py3-none-any.whl.

File metadata

Download URL: babe-0.0.7-py3-none-any.whl
Upload date: Nov 11, 2020
Size: 6.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for babe-0.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`660b6f1647012e517e1cfdfe362d52949a451fd8ba220d620513f912a04e2c77`
MD5	`2d0cb064ebc8fc29310cae6e8fdc1f55`
BLAKE2b-256	`8544a4f6454ad1a91de0757232c182d1ac0314f6b41827b25ed461dc02c84bb5`


babe 0.0.7
