Auxiliary functions to clean pandas data frames
Pywrangle
A Python data-wrangling library that streamlines cleaning strings, identifying missing data, and tracking dataframe changes. Available on PyPI.
Install
- Python 3.6+
- numpy
- pandas
To install pywrangle, use pip:
pip install pywrangle
Import
Following the convention for Python data analysis modules, import pywrangle as follows:
>>> import pywrangle as pw
String cleaning
def clean_str_columns(df: object, col_strcase_tuple: tuple) -> df:
Master function that cleans string columns according to the col_strcase_tuple key.
col_strcase_tuple is a tuple of tuples, each pairing the name of a column to be cleaned with an ordinal number that selects the pandas str method to apply. The ordinal determines the case:
- 0 : lower_case
- 1 : title_case
- 2 : upper_case
import pandas as pd
import pywrangle as pw

df_winereviews = pd.read_csv("../input/wine-reviews/winemag-data_first150k.csv")
col_strcase_tuple = (
    ("country", 2),      # upper case
    ("description", 0),  # lower case
    ("province", 1),     # title case
)
df_winereviews = pw.clean_str_columns(df_winereviews, col_strcase_tuple)
column name: str.clean_method
country upper
description lower
province title
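To make the ordinal key concrete, here is a minimal sketch of how such a mapping could be applied with plain pandas. It is not pywrangle's source: the names STR_CASE_METHODS and clean_str_columns_sketch are hypothetical, and the sketch only changes case, which may be less than the library does.

```python
import pandas as pd

# Hypothetical lookup from the ordinal key to a pandas .str method name.
# It mirrors the 0/1/2 key described above, not pywrangle's internals.
STR_CASE_METHODS = {0: "lower", 1: "title", 2: "upper"}


def clean_str_columns_sketch(df: pd.DataFrame, col_strcase_tuple: tuple) -> pd.DataFrame:
    """Return a copy of df with each listed column converted to the requested case."""
    cleaned = df.copy()
    for col, case in col_strcase_tuple:
        method = STR_CASE_METHODS[case]
        # Look up Series.str.lower / .title / .upper by name and call it.
        cleaned[col] = getattr(cleaned[col].str, method)()
        print(f"{col} {method}")
    return cleaned


demo_str = pd.DataFrame(
    {"country": ["italy", "France"], "province": ["tuscany", "BORDEAUX"]}
)
demo_str_clean = clean_str_columns_sketch(demo_str, (("country", 2), ("province", 1)))
```

Working on a copy leaves the input dataframe unchanged, which matches the reassignment pattern used in the example above.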
Missing Data
print_nulls_per_col(df) -> None:
Calculates number of null values in each column and prints result.
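For reference, the same count can be produced with plain pandas. The sketch below is an assumption about the behaviour, not the library's code; nulls_per_col_sketch is a hypothetical name, and unlike the documented function it also returns the counts so they can be inspected.

```python
import pandas as pd


def nulls_per_col_sketch(df: pd.DataFrame) -> pd.Series:
    """Print and return the number of null values in each column."""
    # isnull() marks missing cells as True; sum() counts them per column.
    null_counts = df.isnull().sum()
    for col, count in null_counts.items():
        print(f"{col}: {count}")
    return null_counts


demo_nulls = pd.DataFrame(
    {
        "country": ["Italy", None, "France"],
        "points": [90.0, 88.0, None],
        "price": [10, 20, 30],
    }
)
null_counts = nulls_per_col_sketch(demo_nulls)
```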
Dataframe changes
The dataframe change functions record_df_info and print_df_changes are used in conjunction.
>>> old_df = pw.df_info(df)
>>> ... # some change to df
>>> pw.print_df_changes(df, old_df)
record_df_info(df, _name: str = "before") -> None:
Records information about the dataframe.
Information includes:
- name (state of the dict, before or after)
- number of rows
- number of columns
- size of df
Recorded dataframe information is passed to compare_dfs() to check differences between dataframes.
print_df_changes( df, dict_recorded_info: dict, show_col_names: bool = False ) -> None:
Prints differences between dataframe and previously recorded information.
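The record-then-compare workflow can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not pywrangle's implementation: the function names are hypothetical, and "size" is taken here to mean DataFrame.size (the number of cells), which may differ from what the library records.

```python
import pandas as pd


def record_df_info_sketch(df: pd.DataFrame, name: str = "before") -> dict:
    """Snapshot basic shape information about df."""
    return {
        "name": name,
        "rows": df.shape[0],
        "columns": df.shape[1],
        "size": df.size,  # assumption: size means number of cells
    }


def df_changes_sketch(df: pd.DataFrame, recorded: dict) -> dict:
    """Print and return the change in each recorded metric."""
    current = record_df_info_sketch(df, "after")
    changes = {}
    for key in ("rows", "columns", "size"):
        changes[key] = current[key] - recorded[key]
        print(f"{key}: {recorded[key]} -> {current[key]} ({changes[key]:+d})")
    return changes


demo_df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})
info_before = record_df_info_sketch(demo_df)
demo_df = demo_df.dropna()  # some change to the dataframe
df_changes = df_changes_sketch(demo_df, info_before)
```

Storing only summary numbers, rather than a copy of the dataframe, keeps the snapshot cheap even for large inputs.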
History
version = "0.2.40"
- refactored code for clarity
- added display info to print_df_changes
version = "0.2.1"
- Created init file for function imports
- Documentation on importing pywrangle
- Added numpy as required package.
- Changed package requirements to greater than or equal to.
version = "0.0.1"
- Init