Auxiliary functions to clean pandas data frames
PyWrangle
About
PyWrangle is an open-source Python library for data wrangling. Wikipedia defines data wrangling as follows:
"Data wrangling is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics."
Functions
PyWrangle currently supports:
- cleaning strings
- tracking dataframe changes
- identifying data entry errors
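The History section notes that identify_errors() combines Levenshtein distance with double metaphone (phonetic) matching to flag likely data entry errors in string columns. The following is a minimal, standard-library-only sketch of the Levenshtein half of that idea. It is not PyWrangle's implementation or API: the helper name flag_possible_typos, the lowercase/strip normalization, and the max_distance threshold are all hypothetical, and the phonetic matching step is omitted.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def flag_possible_typos(values, max_distance=1):
    """Hypothetical helper (not part of pywrangle): return pairs of distinct
    values whose edit distance is at most max_distance.

    Values are lowercased and stripped first, so "Chicago" and "chicago "
    are treated as the same entry rather than flagged as a typo pair.
    """
    unique = sorted({v.strip().lower() for v in values})
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if levenshtein(a, b) <= max_distance
    ]
```

For example, flag_possible_typos(["Boston", "Bostn", "Chicago", "chicago ", "Denver"]) returns [("bostn", "boston")]. Comparing every pair is quadratic in the number of unique values, which is acceptable for typical categorical columns but would need blocking or indexing for very large vocabularies.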
Documentation & Distribution
Documentation is available here
Distribution is available here
Install
Requirements
- Python >= 3.6
- numpy >= 1.14.4
- pandas >= 1.0.3
- fuzzywuzzy >= 0.18.0
- python-levenshtein >= 0.12.0
- metaphone >= 0.6
Pip Install
To install pywrangle, use pip:
pip install pywrangle
Import
Following the aliasing convention of Python data science libraries (e.g. import pandas as pd), import pywrangle as follows:
>>> import pywrangle as pw
Contributing
I love open source, and contributions are welcome. Please see the contributing guidelines here.
History
Version = "0.3.03"
- Removed the walrus operator for pre-3.8 compatibility; the library is now compatible with Python 3.6+.
Version = "0.3.0"
- Removed missing-data identification from the library, since it overlapped too much with the missingno library.
- Added the identify_errors() function, which uses Levenshtein distance and double metaphone string-matching algorithms to identify potential data entry errors in string columns.
- Refactored code into separate sub-libraries.
- Placed documentation on ReadTheDocs.
version = "0.2.40"
- Refactored code for clarity.
- Added display info to print_df_changes.
version = "0.2.1"
- Created __init__.py file for function imports.
- Added documentation on importing pywrangle.
- Added numpy as a required package.
- Changed package requirements to minimum versions (>=).
version = "0.0.1"
- Initial release.