Project description
This module provides a handful of functions that simplify typical data processing operations and data verification procedures.
Dependencies
- numpy 1.17.1
- pandas 0.25.1
Installation Guide
pip install helper-funcs
Usage
Import class "HF" from module "helper_funcs":
from helper_funcs import HF
Then call any of the methods described below.
Methods
df_preview(df, n_samples)
Description
Creates a summary table of your DataFrame.
Parameters
- df: pandas.DataFrame. The DataFrame you want to preview.
- n_samples: int, optional (default = 2). Number of unique values to display from each column.
Returns
- pandas.DataFrame containing summary information about the passed DataFrame.
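The exact contents of the summary table are not documented here, so the following is only a minimal pandas sketch of what such a preview could look like. The function name `df_preview_sketch` and the chosen statistics (dtype, unique count, missing count, samples) are hypothetical assumptions, not the helper_funcs implementation.

```python
import pandas as pd


def df_preview_sketch(df: pd.DataFrame, n_samples: int = 2) -> pd.DataFrame:
    """Hypothetical stand-in for df_preview: one summary row per column.

    The statistics below are assumptions, not the fields the real
    helper_funcs output is guaranteed to contain.
    """
    rows = []
    for col in df.columns:
        series = df[col]
        rows.append({
            "column": col,
            "dtype": str(series.dtype),
            "n_unique": series.nunique(),            # NaN is not counted
            "n_missing": int(series.isna().sum()),
            # First n_samples distinct non-null values, in order of appearance.
            "samples": list(series.dropna().unique()[:n_samples]),
        })
    return pd.DataFrame(rows).set_index("column")
```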
rename_col(df, old_name, new_name)
Description
Renames the specified column.
Parameters
- df: pandas.DataFrame. The DataFrame whose column you want to rename.
- old_name: str. Name of the existing df column to be renamed.
- new_name: str. Name that will replace old_name.
Returns
- pandas.DataFrame with the renamed column.
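For comparison, the same operation can be expressed directly with pandas' own `DataFrame.rename`. This is a hedged sketch, not the library's code: the name `rename_col_sketch` is hypothetical, and raising on an unknown `old_name` is an assumption about the desired behavior.

```python
import pandas as pd


def rename_col_sketch(df: pd.DataFrame, old_name: str, new_name: str) -> pd.DataFrame:
    """Hypothetical equivalent of rename_col built on DataFrame.rename."""
    # errors="raise" makes a misspelled old_name fail with KeyError instead of
    # being silently ignored (pandas' default is errors="ignore").
    # rename returns a new DataFrame; df itself is left unchanged.
    return df.rename(columns={old_name: new_name}, errors="raise")
```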
columns_mismatch(col_1, col_2)
Description
Extracts values that are present in col_1 but not in col_2.
Parameters
- col_1: pandas.Series. The Series you want to subtract values from.
- col_2: pandas.Series. The Series that is subtracted from col_1.
Note: "subtract" is used here in the set-difference sense, not the arithmetic sense.
Returns
- Set of values that col_1 contains and col_2 does not contain.
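The set-difference semantics described above can be illustrated in a few lines. This is a sketch under the stated description only; `columns_mismatch_sketch` is a hypothetical name, not part of helper_funcs.

```python
import pandas as pd


def columns_mismatch_sketch(col_1: pd.Series, col_2: pd.Series) -> set:
    """Hypothetical equivalent of columns_mismatch: set difference col_1 - col_2."""
    # Converting to sets collapses duplicates and discards order.
    return set(col_1) - set(col_2)
```

If duplicates and original order matter, `col_1[~col_1.isin(col_2)]` returns the non-matching values as a Series instead of a set.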
df_difference(df_1, df_2)
Description
Extracts rows that are present in df_1 but not in df_2.
Note: df_1 and df_2 can have different column names, but they should have the same number of columns.
Parameters
- df_1: pandas.DataFrame. The DataFrame you want to subtract rows from.
- df_2: pandas.DataFrame. The DataFrame that is subtracted from df_1.
Note: "subtract" is used here in the set-difference sense, not the arithmetic sense.
Returns
- pandas.DataFrame with the rows that df_1 contains and df_2 does not contain.
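One way to get a row-wise set difference in pandas is a left merge with `indicator=True`, keeping the `left_only` rows. The sketch below assumes columns are matched by position (since the names may differ); the name `df_difference_sketch` is hypothetical and this is not necessarily how helper_funcs does it.

```python
import pandas as pd


def df_difference_sketch(df_1: pd.DataFrame, df_2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical equivalent of df_difference using merge(indicator=True)."""
    if df_1.shape[1] != df_2.shape[1]:
        raise ValueError("df_1 and df_2 must have the same number of columns")
    # Assumption: columns are paired by position, so give df_2 df_1's names.
    right = df_2.copy()
    right.columns = df_1.columns
    # Deduplicate so the left merge cannot multiply df_1's rows.
    right = right.drop_duplicates()
    cols = list(df_1.columns)
    merged = df_1.merge(right, how="left", on=cols, indicator=True)
    # "left_only" marks rows of df_1 with no exact match in df_2.
    return merged.loc[merged["_merge"] == "left_only", cols].reset_index(drop=True)
```

Note that this sketch resets the index, so the original row labels of df_1 are not preserved.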
verify_dates_integity(df, date_col)
Description
Checks whether any dates are missing between the earliest and latest dates in df[date_col].
Parameters
- df: pandas.DataFrame. The DataFrame whose date_col column will be checked for integrity.
- date_col: str. Name of the df column to check for integrity.
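The return value of this check is not documented above, so here is a hedged sketch of the underlying idea: build the full calendar between the minimum and maximum date and report which days are absent. Daily granularity and returning the missing dates are assumptions, and `find_missing_dates_sketch` is a hypothetical name.

```python
import pandas as pd


def find_missing_dates_sketch(df: pd.DataFrame, date_col: str) -> pd.DatetimeIndex:
    """Hypothetical integrity check: calendar days absent from df[date_col].

    Assumes daily granularity; the real verify_dates_integity may use a
    different frequency or return a different type (e.g. a bool).
    """
    dates = pd.to_datetime(df[date_col]).dt.normalize()  # drop time-of-day
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    # Days in the full range that never appear in the column (sorted).
    return full_range.difference(pd.DatetimeIndex(dates))
```

An empty result means the date column has no gaps.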
duplicate(df, how, n_times)
Description
Extends the specified DataFrame by repeating its rows.
Parameters
- df: pandas.DataFrame. The DataFrame whose rows you want to repeat.
- how: str. Repetition strategy: either 'whole' ([1,2] -> [1,2,1,2]) or 'element_wise' ([1,2] -> [1,1,2,2]).
- n_times: int. Number of repetitions of each row.
Returns
- Extended pandas.DataFrame with repeated rows.
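The two strategies map naturally onto `numpy.tile` ('whole') and `numpy.repeat` ('element_wise') applied to row positions. This sketch assumes `n_times` is the total number of occurrences of each row, which matches the [1,2] examples with n_times = 2; the name `duplicate_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def duplicate_sketch(df: pd.DataFrame, how: str, n_times: int) -> pd.DataFrame:
    """Hypothetical equivalent of duplicate, built from row positions."""
    positions = np.arange(len(df))
    if how == "whole":
        # [0, 1] -> [0, 1, 0, 1]: repeat the whole frame block by block.
        positions = np.tile(positions, n_times)
    elif how == "element_wise":
        # [0, 1] -> [0, 0, 1, 1]: repeat each row in place.
        positions = np.repeat(positions, n_times)
    else:
        raise ValueError("how must be 'whole' or 'element_wise'")
    # iloc works by position, so duplicate index labels in df are harmless.
    return df.iloc[positions].reset_index(drop=True)
```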
groupby_to_list(df, by_cols, col_to_list)
Description
Collects the values of the col_to_list column that share the same values in the by_cols column(s) into lists.
Parameters
- df: pandas.DataFrame. The DataFrame to group.
- by_cols: list of str. Names of the df columns used as grouping keys.
- col_to_list: str. Name of the column whose values will be collected into lists.
Returns
- pandas.DataFrame with columns [by_cols, col_to_list], where every value in the col_to_list column is a list.
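In plain pandas this is a `groupby` followed by aggregating with `list`. The sketch below is a hypothetical equivalent (`groupby_to_list_sketch` is not a helper_funcs name); keeping groups in order of first appearance via `sort=False` is an assumption.

```python
import pandas as pd


def groupby_to_list_sketch(df: pd.DataFrame, by_cols: list, col_to_list: str) -> pd.DataFrame:
    """Hypothetical equivalent of groupby_to_list."""
    # sort=False keeps groups in order of first appearance; rows whose keys
    # are NaN are dropped, as is groupby's default behavior.
    grouped = df.groupby(by_cols, sort=False)[col_to_list].agg(list)
    # reset_index turns the group keys back into ordinary columns.
    return grouped.reset_index()
```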
chunkenize(data_to_split, num_chunks, df_indices, copy)
Description
Splits data_to_split into a list of num_chunks chunks. Can be helpful when preparing data for parallel processing.
Parameters
- data_to_split: pandas.DataFrame or list. The data you want to split into chunks.
- num_chunks: int. Number of chunks to split the data into.
- df_indices: list of str, optional (default = []). Can be used when data_to_split is a pandas.DataFrame: these columns are set as the DataFrame index before splitting and reset afterwards.
- copy: bool, optional (default = True). Determines whether splitting is performed on a copy of data_to_split.
Returns
- List of num_chunks chunks of the same type as data_to_split.
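A common way to split into nearly equal chunks is to give the first `len % num_chunks` chunks one extra element. This minimal sketch handles both a list and a DataFrame with the same slicing logic; `chunkenize_sketch` is a hypothetical name, and the `df_indices` and `copy` options are deliberately left out.

```python
import pandas as pd


def chunkenize_sketch(data_to_split, num_chunks: int) -> list:
    """Hypothetical equivalent of chunkenize (no df_indices / copy support)."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be >= 1")
    base, extra = divmod(len(data_to_split), num_chunks)
    # DataFrames are sliced by position through .iloc; lists slice natively.
    sliceable = data_to_split.iloc if isinstance(data_to_split, pd.DataFrame) else data_to_split
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks each receive one additional element.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(sliceable[start:stop])
        start = stop
    return chunks
```

Each chunk keeps the type of the input (list slices are lists, `.iloc` slices are DataFrames), so the chunks can be handed directly to separate worker processes.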
filter_df(df, col_name, l_bound, r_bound, inclusive)
Description
Filters df so that it contains only the records whose df[col_name] values lie in the range between l_bound and r_bound.
Parameters
- df: pandas.DataFrame. The DataFrame you want to filter.
- col_name: str. Name of the df column whose values are used for filtering.
- l_bound: same type as the values of df[col_name]. Left bound of the range. Can be omitted if r_bound is specified.
- r_bound: same type as the values of df[col_name]. Right bound of the range. Can be omitted if l_bound is specified.
- inclusive: bool, optional (default = True). Determines whether the range is inclusive (True) or exclusive (False).
Returns
- Filtered pandas.DataFrame.
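The filtering logic can be sketched as a boolean mask built from whichever bounds are given. Using `None` to mean "bound omitted" is an assumption about how omission works, and `filter_df_sketch` is a hypothetical name rather than the library's function.

```python
import pandas as pd


def filter_df_sketch(df: pd.DataFrame, col_name: str, l_bound=None, r_bound=None,
                     inclusive: bool = True) -> pd.DataFrame:
    """Hypothetical equivalent of filter_df; None means 'bound omitted'."""
    if l_bound is None and r_bound is None:
        raise ValueError("specify at least one of l_bound or r_bound")
    values = df[col_name]
    mask = pd.Series(True, index=df.index)
    if l_bound is not None:
        mask &= (values >= l_bound) if inclusive else (values > l_bound)
    if r_bound is not None:
        mask &= (values <= r_bound) if inclusive else (values < r_bound)
    return df[mask]
```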
prepare_str_cols(df, make_uppercase)
Description
Strips leading and trailing spaces from the str columns of df and converts their values to either upper case or lower case.
Parameters
- df: pandas.DataFrame. The DataFrame whose str columns you want to prepare.
- make_uppercase: bool. Determines whether str values are upper-cased (True) or lower-cased (False).
Returns
- pandas.DataFrame in which all strings are either upper-cased or lower-cased, with leading and trailing spaces removed.
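A simple way to express this cleanup is an element-wise function that only touches `str` values. In this sketch (hypothetical name `prepare_str_cols_sketch`), every column is mapped and non-string values pass through unchanged; note that `str.strip()` removes all surrounding whitespace, not only spaces, which may differ from the library.

```python
import pandas as pd


def prepare_str_cols_sketch(df: pd.DataFrame, make_uppercase: bool) -> pd.DataFrame:
    """Hypothetical equivalent of prepare_str_cols; returns a new DataFrame."""

    def clean(value):
        # Only str values are changed; numbers, None and NaN are returned as-is.
        if isinstance(value, str):
            value = value.strip()  # removes surrounding whitespace of any kind
            return value.upper() if make_uppercase else value.lower()
        return value

    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        out[col] = out[col].map(clean)
    return out
```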
Download files
Source Distribution
File details
Details for the file helper_funcs-0.1.35.tar.gz.
File metadata
- Download URL: helper_funcs-0.1.35.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97dd1fb9d2189e30eb248947f0b47c6def7159706c21ed1d8bbc925ff23f1c44 |
| MD5 | 0b4e21ad861731706994faee99d0e016 |
| BLAKE2b-256 | d2ddca0024e52c183bb43d91c1fafef30323bd35d46026624fc57c0611b2fa2c |