Library for Standardizing names from a Pandas dataframe
Similar Names
Description
Similar Names is a package for name manipulation. If you have a Pandas dataframe with names written in different ways (e.g., John Doe, John E. Doe and John Edson Doe), the "closeMatches" function looks for all similar names in that column and adds two columns: "CloseMatches" (the list of all close matches) and "StandardName" (the shortest name in that list).
Installation
Similar Names can be installed directly through pip:
pip install similarnames
How to use?
If you have a Pandas dataframe with names that you want to standardize, or in which you want to find close matches, call "closeMatches" with the dataframe and the name of the column to check.
'''
Input (df): df and the name of the column with the names to check
| Names          | ... |
|----------------|-----|
| John Doe       |     |
| John Edson Doe |     |
| John E. Doe    |     |
| John Edson D.  |     |
'''
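import pandas as pd
from similarnames import closeMatches

# Build the input dataframe shown above
df = pd.DataFrame({'Names': ['John Doe', 'John Edson Doe', 'John E. Doe', 'John Edson D.']})
# Returns the dataframe with the CloseMatches and StandardName columns added
df_standard = closeMatches(df, 'Names')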
'''
Output (df_standard)
| Names          | ... | CloseMatches                                                   | StandardName |
|----------------|-----|----------------------------------------------------------------|--------------|
| John Doe       |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John Edson Doe |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John E. Doe    |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
| John Edson D.  |     | ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.'] | John Doe     |
'''
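The "StandardName" column is described as the shortest name in the list of close matches. A minimal sketch of that selection rule in plain Python follows. The helper name `standard_name` is hypothetical and is not part of the similarnames API, and the tie-breaking (first shortest name wins) is an assumption, since the description does not say how ties are resolved.

```python
def standard_name(close_matches):
    """Return the shortest name in a list of close matches.

    Assumption: on a tie, the first shortest name wins (this is how
    Python's min() behaves); the library may resolve ties differently.
    """
    return min(close_matches, key=len)


matches = ['John Doe', 'John E. Doe', 'John Edson Doe', 'John Edson D.']
print(standard_name(matches))  # John Doe
```

This only illustrates how the standard name relates to the list; how the library decides which names are close matches is not covered here.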
If a single row contains multiple names, the "explode" is done for you automatically: just provide the "sep" parameter to indicate the separator on which to split the names. Note: if a row contains an "and" (e.g., Jon and Jane), it is automatically replaced by the "sep" value before the split.
'''
Input (df): df and the name of the column with the names to check
| Names                                        | Other columns           | ... |
|----------------------------------------------|-------------------------|-----|
| John Doe, Jane Doe                           | Two names (sep = ',')   |     |
| John E. Doe and Michael Johnson              | Two names (without sep) |     |
| Jane A. Doe, Michael M. Johnson and John Doe | Three names (sep = ',') |     |
'''
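import pandas as pd
from similarnames import closeMatches

# Build the input dataframe shown above
df = pd.DataFrame({
    'Names': ['John Doe, Jane Doe',
              'John E. Doe and Michael Johnson',
              'Jane A. Doe, Michael M. Johnson and John Doe'],
    'Other columns': ["Two names (sep = ',')",
                      'Two names (without sep)',
                      "Three names (sep = ',')"],
})
# Rows with several names are split on sep before matching
df_standard = closeMatches(df, 'Names', sep=',')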
'''
Output (df_standard)
| Names              | Other columns           | ... | CloseMatches                              | StandardName    |
|--------------------|-------------------------|-----|-------------------------------------------|-----------------|
| John Doe           | Two names (sep = ',')   |     | ['John Doe', 'John E. Doe']               | John Doe        |
| Jane Doe           | Two names (sep = ',')   |     | ['Jane Doe', 'Jane A. Doe']               | Jane Doe        |
| John E. Doe        | Two names (without sep) |     | ['John Doe', 'John E. Doe']               | John Doe        |
| Michael Johnson    | Two names (without sep) |     | ['Michael Johnson', 'Michael M. Johnson'] | Michael Johnson |
| Jane A. Doe        | Three names (sep = ',') |     | ['Jane Doe', 'Jane A. Doe']               | Jane Doe        |
| Michael M. Johnson | Three names (sep = ',') |     | ['Michael Johnson', 'Michael M. Johnson'] | Michael Johnson |
| John Doe           | Three names (sep = ',') |     | ['John Doe', 'John E. Doe']               | John Doe        |
'''
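To make the splitting step concrete, here is a rough sketch in plain Python of what happens to a row with several names: "and" is replaced by the separator, and the row is then split into one name per entry. The function `split_names` is hypothetical and only mirrors the behavior described above; it is not the library's actual implementation.

```python
import re


def split_names(cell, sep=','):
    """Split one cell holding several names into individual names.

    Assumption: an "and" surrounded by whitespace is replaced by `sep`
    before the split, as the note on the "sep" parameter describes.
    """
    # A function replacement keeps `sep` literal even if it contains backslashes
    normalized = re.sub(r'\s+and\s+', lambda _match: sep, cell)
    # Drop surrounding whitespace and any empty pieces
    return [name.strip() for name in normalized.split(sep) if name.strip()]


rows = [
    'John Doe, Jane Doe',
    'John E. Doe and Michael Johnson',
    'Jane A. Doe, Michael M. Johnson and John Doe',
]
# Flatten to one name per entry, like the exploded "Names" column
exploded = [name for row in rows for name in split_names(row)]
print(exploded)
```

The resulting list has seven entries, in the same order as the "Names" column of the output table. Because the pattern requires whitespace on both sides of "and", a name such as "Sandra" is left untouched.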
Download files
File details
Details for the file similarnames-0.1.3.tar.gz.
File metadata
- Download URL: similarnames-0.1.3.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.14 CPython/3.8.8 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9a005973bb65a03e9c660d5d71ea731c15fb1cc1751da97e3463afc1e9bc9380
MD5 | 079b6438ca6802ffd3f762e1c76b2419
BLAKE2b-256 | 6befaf95296b2cde6a525a0278bdc22927768a319d1b129244ddf91849b6c3ff
File details
Details for the file similarnames-0.1.3-py3-none-any.whl.
File metadata
- Download URL: similarnames-0.1.3-py3-none-any.whl
- Upload date:
- Size: 3.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.14 CPython/3.8.8 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9366a080a1a7bc6b0dbf1daba18e1f03ba662b3dd1e25a93241a53cf02966ec5
MD5 | b27b3ee25c9d1f91a6c8d12198382c2b
BLAKE2b-256 | 9c5b347b977fe08e72d2be13db92d67fe36c11839ceef0445ac9fe730a012075
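If you download one of these files manually, you may want to compare it against the SHA256 digest listed for it. A small sketch using only Python's standard `hashlib` module follows. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the path you pass in depends on where you saved the file.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash('similarnames-0.1.3.tar.gz', '<SHA256 digest from the table>')` should return True for an intact download, assuming the file sits in the current directory.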