Library for Standardizing names from a Pandas dataframe

Project description

Similar Names

Description

Similar Names is a package for manipulating names. If you have a pandas DataFrame in which the same name is written in different ways (e.g., John Doe, John E. Doe and John Edson Doe), the "closeMatches" function looks for all similar names in the given column and adds two columns: "CloseMatches" (the list of all close matches) and "StandardName" (the shortest name in that list).
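The matching idea can be illustrated with a minimal sketch on a plain Python list. It assumes that similarity is measured with the standard library's difflib.get_close_matches and its default cutoff of 0.6; the README does not say which measure or threshold similarnames actually uses, and close_matches_sketch is a hypothetical helper, not part of the package.

```python
from difflib import get_close_matches


def close_matches_sketch(names, cutoff=0.6):
    """Hypothetical helper: map each name to (close matches, standard name).

    Assumption: difflib.get_close_matches stands in for the package's own
    similarity test, which the README does not document.
    """
    candidates = sorted(set(names))
    result = {}
    for name in names:
        # n=len(candidates) keeps every match above the cutoff (the default n is 3)
        matches = sorted(
            get_close_matches(name, candidates, n=len(candidates), cutoff=cutoff)
        )
        # StandardName: the shortest match; min() keeps the first (alphabetical) one on ties
        result[name] = (matches, min(matches, key=len))
    return result


names = ['John Doe', 'John Edson Doe', 'John E. Doe', 'John Edson D.']
for name, (matches, standard) in close_matches_sketch(names).items():
    print(name, matches, standard)
```

A name always matches itself (similarity 1.0), so every list is non-empty and the shortest-name rule is always defined.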

Installation

Similar Names can be installed directly through pip:

pip install similarnames

How to use?

If you have a pandas DataFrame with names that you want to standardize (or for which you want to find close matches), pass the DataFrame and the name of the column to closeMatches:

'''
Input (df): df and the name of the column with the names to check

| Names          | ... |
|----------------|-----|
| John Doe       |     |
| John Edson Doe |     |
| John E. Doe    |     |
| John Edson D.  |     |
'''
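import pandas as pd
from similarnames import closeMatches

# Example input matching the table above
df = pd.DataFrame({'Names': ['John Doe', 'John Edson Doe', 'John E. Doe', 'John Edson D.']})

# Returns a DataFrame with the extra "CloseMatches" and "StandardName" columns
df_standard = closeMatches(df, 'Names')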

'''
Output (df_standard)

| Names          | ... | CloseMatches                                                   | StandardName |
|----------------|-----|----------------------------------------------------------------|--------------|
| John Doe       |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John Edson Doe |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John E. Doe    |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John Edson D.  |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |

'''

If a single row contains multiple names, closeMatches automatically "explodes" it into one row per name. Just provide the "sep" parameter to indicate the separator used to split those names. Note: if names are joined by "and" (e.g., Jon and Jane), the "and" is automatically replaced by "sep" before the split.

'''
Input (df): df and the name of the column with the names to check

| Names                                        | Other columns           | ... |
|----------------------------------------------|-------------------------|-----|
| John Doe, Jane Doe                           | Two names (sep = ',')   |     |
| John E. Doe and Michael Johnson              | Two names (without sep) |     |
| Jane A. Doe, Michael M. Johnson and John Doe | Three names (sep = ',') |     |
'''
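import pandas as pd
from similarnames import closeMatches

# Example input matching the table above: several names per row
df = pd.DataFrame({
    'Names': ['John Doe, Jane Doe',
              'John E. Doe and Michael Johnson',
              'Jane A. Doe, Michael M. Johnson and John Doe'],
    'Other columns': ["Two names (sep = ',')",
                      'Two names (without sep)',
                      "Three names (sep = ',')"],
})

# Split each row on sep (an "and" is replaced by sep first), explode to one
# name per row, then add the "CloseMatches" and "StandardName" columns
df_standard = closeMatches(df, 'Names', sep=',')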

'''
Output (df_standard)

| Names              | Other columns           | ... | CloseMatches                              | StandardName    |
|--------------------|-------------------------|-----|-------------------------------------------|-----------------|
| John Doe           | Two names (sep = ',')   |     | ['John Doe', 'John E. Doe']               | John Doe        |
| Jane Doe           | Two names (sep = ',')   |     | ['Jane Doe', 'Jane A. Doe']               | Jane Doe        |
| John E. Doe        | Two names (without sep) |     | ['John Doe', 'John E. Doe']               | John Doe        |
| Michael Johnson    | Two names (without sep) |     | ['Michael Johnson', 'Michael M. Johnson'] | Michael Johnson |
| Jane A. Doe        | Three names (sep = ',') |     | ['Jane Doe', 'Jane A. Doe']               | Jane Doe        |
| Michael M. Johnson | Three names (sep = ',') |     | ['Michael Johnson', 'Michael M. Johnson'] | Michael Johnson |
| John Doe           | Three names (sep = ',') |     | ['John Doe', 'John E. Doe']               | John Doe        |

'''
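The splitting and exploding step described above can be sketched in plain Python using the same three example rows. split_names and explode_rows are hypothetical helpers; treating "and" as a lowercase word surrounded by whitespace is an assumption, and in pandas the equivalent would typically be Series.str.split followed by DataFrame.explode, though the README does not confirm that similarnames works this way internally.

```python
import re


def split_names(cell, sep=','):
    """Hypothetical helper: split one cell into individual names.

    Assumption: a standalone lowercase 'and' surrounded by whitespace is
    replaced by sep first; the package's exact rule may differ.
    """
    # A function replacement inserts sep literally, even if it contains backslashes
    cell = re.sub(r'\s+and\s+', lambda _match: sep, cell)
    # Split on sep, trim whitespace, and drop empty pieces
    return [part.strip() for part in cell.split(sep) if part.strip()]


def explode_rows(rows, sep=','):
    """Hypothetical helper: turn (names_cell, other_value) rows into one row per name."""
    return [(name, other) for cell, other in rows for name in split_names(cell, sep)]


rows = [
    ('John Doe, Jane Doe', "Two names (sep = ',')"),
    ('John E. Doe and Michael Johnson', 'Two names (without sep)'),
    ('Jane A. Doe, Michael M. Johnson and John Doe', "Three names (sep = ',')"),
]
for name, other in explode_rows(rows):
    print(f'{name:<20} {other}')
```

The other column values are copied onto every name split from the same row, which is why each input row appears two or three times in the output table.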

Download files

Source Distribution

similarnames-0.1.3.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

similarnames-0.1.3-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file similarnames-0.1.3.tar.gz.

File metadata

  • Download URL: similarnames-0.1.3.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.8.8 Windows/10

File hashes

Hashes for similarnames-0.1.3.tar.gz
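| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 9a005973bb65a03e9c660d5d71ea731c15fb1cc1751da97e3463afc1e9bc9380 |
| MD5         | 079b6438ca6802ffd3f762e1c76b2419                                 |
| BLAKE2b-256 | 6befaf95296b2cde6a525a0278bdc22927768a319d1b129244ddf91849b6c3ff |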
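To check a downloaded archive against the published SHA256 digest, the file's hash can be computed with Python's standard hashlib module. This is a minimal sketch; sha256_of is a hypothetical helper name, and the commented usage assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # iter() with a sentinel keeps reading until read() returns b''
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest of similarnames-0.1.3.tar.gz
EXPECTED_SHA256 = '9a005973bb65a03e9c660d5d71ea731c15fb1cc1751da97e3463afc1e9bc9380'

# Usage (assumes the archive was downloaded to the current directory):
# print(sha256_of('similarnames-0.1.3.tar.gz') == EXPECTED_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size; for installs, pip can also enforce a pinned digest through its hash-checking mode (`--hash=sha256:...` in a requirements file).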

File details

Details for the file similarnames-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: similarnames-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.14 CPython/3.8.8 Windows/10

File hashes

Hashes for similarnames-0.1.3-py3-none-any.whl
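| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 9366a080a1a7bc6b0dbf1daba18e1f03ba662b3dd1e25a93241a53cf02966ec5 |
| MD5         | b27b3ee25c9d1f91a6c8d12198382c2b                                 |
| BLAKE2b-256 | 9c5b347b977fe08e72d2be13db92d67fe36c11839ceef0445ac9fe730a012075 |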
