This module provides a handful of functions that simplify typical data processing operations and data verification procedures.
Dependencies
numpy 1.17.1
pandas 0.25.1
Installation Guide
pip install helper-funcs
Usage
Import class "HF" from module "helper_funcs":
from helper_funcs import HF
And then call any of the methods described below.
Methods
- df_preview(df, n_samples)
  Description
  Creates a nice summary table of your DataFrame.
  Parameters
  - df : pandas.DataFrame
    The DataFrame you want to create a preview for.
  - n_samples : int, optional (default = 2)
    Number of unique values from each column to be displayed.
  Returns
  - pandas.DataFrame containing the summary information about the passed DataFrame.
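To give a feel for what a per-column summary with n_samples unique values might look like, here is a minimal sketch in plain pandas. It is not the library's implementation: the function name preview_sketch and the summary columns (dtype, n_missing, n_unique, samples) are assumptions chosen for illustration.

```python
import pandas as pd


def preview_sketch(df: pd.DataFrame, n_samples: int = 2) -> pd.DataFrame:
    """Hypothetical stand-in for a per-column summary (not the HF code)."""
    rows = []
    for col in df.columns:
        # Unique non-missing values, in order of first appearance
        uniques = df[col].dropna().unique()
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_missing": int(df[col].isna().sum()),
            "n_unique": len(uniques),
            "samples": list(uniques[:n_samples]),
        })
    return pd.DataFrame(rows).set_index("column")


df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "temp": [3.5, None, 18.0]})
summary = preview_sketch(df)
print(summary)
```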
- rename_col(df, old_name, new_name)
  Description
  Renames the specified column.
  Parameters
  - df : pandas.DataFrame
    The DataFrame that contains the column you want to rename.
  - old_name : str
    Name of the existing df column to be renamed.
  - new_name : str
    Name that will replace the old_name column name.
  Returns
  - pandas.DataFrame with the renamed column.
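For comparison, the same effect can be obtained in plain pandas with DataFrame.rename. This is only a sketch of the equivalent operation, not the library's code; the sample DataFrame and its column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "val": [10, 20]})

# rename returns a new DataFrame and leaves the original untouched
renamed = df.rename(columns={"val": "value"})
print(list(renamed.columns))
```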
- columns_mismatch(col_1, col_2)
  Description
  Extracts values that are present in col_1, but not in col_2.
  Parameters
  - col_1 : pandas.Series
    The Series you want to subtract values from.
  - col_2 : pandas.Series
    The Series which is subtracted from col_1.
    Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.
  Returns
  - Set with the values which col_1 contains and col_2 does not contain.
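The set-difference semantics described here can be illustrated with Python's built-in set type. This is a minimal sketch under the assumption that duplicates and ordering are irrelevant; columns_mismatch_sketch is a hypothetical name, not the library's function.

```python
import pandas as pd


def columns_mismatch_sketch(col_1: pd.Series, col_2: pd.Series) -> set:
    """Values that occur in col_1 but never in col_2 (illustrative only)."""
    return set(col_1) - set(col_2)


a = pd.Series([1, 2, 3, 3])
b = pd.Series([2, 4])
missing = columns_mismatch_sketch(a, b)
print(missing)
```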
- df_difference(df_1, df_2)
  Description
  Extracts rows that are present in df_1, but not in df_2.
  Note: df_1 and df_2 can have different column names, but the number of columns should match.
  Parameters
  - df_1 : pandas.DataFrame
    The DataFrame you want to subtract values from.
  - df_2 : pandas.DataFrame
    The DataFrame which is subtracted from df_1.
    Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.
  Returns
  - pandas.DataFrame with the rows which df_1 contains and df_2 does not contain.
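One way to express a row-wise set difference in plain pandas is a left merge with indicator=True, keeping only the rows marked left_only. The sketch below assumes columns are matched by position, as the note about differing column names suggests; df_difference_sketch is a hypothetical name and the library may work differently.

```python
import pandas as pd


def df_difference_sketch(df_1: pd.DataFrame, df_2: pd.DataFrame) -> pd.DataFrame:
    """Rows of df_1 with no identical row in df_2 (illustrative only)."""
    if df_1.shape[1] != df_2.shape[1]:
        raise ValueError("both DataFrames must have the same number of columns")
    cols = list(df_1.columns)
    # Give df_2 the column names of df_1, matching columns by position
    aligned = df_2.drop_duplicates().copy()
    aligned.columns = cols
    merged = df_1.merge(aligned, how="left", on=cols, indicator=True)
    keep = merged["_merge"] == "left_only"
    return merged.loc[keep, cols].reset_index(drop=True)


df_1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df_2 = pd.DataFrame({"k": [2], "v": ["y"]})
diff = df_difference_sketch(df_1, df_2)
print(diff)
```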
- verify_dates_integity(df, date_col)
  Description
  Checks whether any dates are missing between the earliest and latest dates in df[date_col].
  Parameters
  - df : pandas.DataFrame
    The DataFrame whose date_col values will be verified for integrity.
  - date_col : str
    Name of the df column that will be verified for integrity.
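A gap check of this kind can be sketched by comparing the dates that are present with a full calendar range between the earliest and latest date. The sketch assumes daily granularity and returns the missing dates as a list; neither the granularity nor the return value is stated in the description above, and missing_dates_sketch is a hypothetical name.

```python
import pandas as pd


def missing_dates_sketch(df: pd.DataFrame, date_col: str) -> list:
    """Calendar days absent between the min and max of df[date_col]."""
    # Reduce every timestamp to its calendar date, ignoring missing values
    present = set(pd.to_datetime(df[date_col]).dropna().dt.date)
    expected = pd.date_range(min(present), max(present), freq="D")
    return [ts.date() for ts in expected if ts.date() not in present]


df = pd.DataFrame({"day": ["2021-01-01", "2021-01-02", "2021-01-05"]})
gaps = missing_dates_sketch(df, "day")
print([str(d) for d in gaps])
```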
- duplicate(df, how, n_times)
  Description
  Extends the specified DataFrame by repeating its rows.
  Parameters
  - df : pandas.DataFrame
    The DataFrame whose rows you want to repeat.
  - how : str
    Strategy for repeating. Should be either 'whole' (then [1,2] -> [1,2,1,2]) or 'element_wise' (then [1,2] -> [1,1,2,2]).
  - n_times : int
    Number of repetitions of each row.
  Returns
  - Extended pandas.DataFrame with repeated rows.
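The two strategies map naturally onto numpy.tile and numpy.repeat applied to row positions. The following is a minimal sketch of that idea, not the library's implementation; duplicate_sketch is a hypothetical name, and resetting the index afterwards is an assumption.

```python
import numpy as np
import pandas as pd


def duplicate_sketch(df: pd.DataFrame, how: str, n_times: int) -> pd.DataFrame:
    """Repeat rows block-wise ('whole') or row-by-row ('element_wise')."""
    positions = np.arange(len(df))
    if how == "whole":
        positions = np.tile(positions, n_times)      # 0,1,0,1
    elif how == "element_wise":
        positions = np.repeat(positions, n_times)    # 0,0,1,1
    else:
        raise ValueError("how must be 'whole' or 'element_wise'")
    return df.iloc[positions].reset_index(drop=True)


df = pd.DataFrame({"x": [1, 2]})
whole = duplicate_sketch(df, "whole", 2)
element_wise = duplicate_sketch(df, "element_wise", 2)
```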
- groupby_to_list(df, by_cols, col_to_list)
  Description
  Collects the values of the col_to_list column that correspond to the same values in the by_cols column(s) and puts them into a list.
  Parameters
  - df : pandas.DataFrame
    The DataFrame you want to use.
  - by_cols : list of str
    Column names that will be used as keys in df.
  - col_to_list : str
    Name of the column whose values will be put into lists.
  Returns
  - pandas.DataFrame with columns [by_cols, col_to_list] so that all the values in the col_to_list column are lists.
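In plain pandas, the described result shape can be produced with groupby followed by agg(list). This sketch only illustrates the idea; the sample data is invented, and the sorted order of the group keys is pandas' default rather than something the description above promises.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "b", "a", "b", "a"],
    "item": [1, 2, 3, 4, 5],
})

# One row per distinct key; each cell in "item" becomes a Python list
grouped = df.groupby(["user"])["item"].agg(list).reset_index()
print(grouped)
```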
- chunkenize(data_to_split, num_chunks, df_indices, copy)
  Description
  Splits data_to_split into a list of num_chunks chunks. Can be helpful when preparing data for parallel processing.
  Parameters
  - data_to_split : pandas.DataFrame or list
    The DataFrame or list you want to split into chunks.
  - num_chunks : int
    Number of chunks that your data will be split into.
  - df_indices : list of str, optional (default = [])
    Can be used when data_to_split is a pandas.DataFrame. These columns will be used as the DataFrame index before splitting and will be reset afterwards.
  - copy : bool, optional (default = True)
    Determines whether you want to perform splitting on a copy of data_to_split.
  Returns
  - List of num_chunks chunks that have the same type as data_to_split.
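Splitting into a fixed number of nearly equal chunks can be sketched with simple slice arithmetic. The version below handles both lists and DataFrames but leaves out the df_indices and copy options; chunk_sketch is a hypothetical name, and putting the larger chunks first is an assumption about how remainders are distributed.

```python
import pandas as pd


def chunk_sketch(data, num_chunks: int) -> list:
    """Split a list or DataFrame into num_chunks consecutive pieces."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks receive one additional element
        stop = start + size + (1 if i < extra else 0)
        if isinstance(data, pd.DataFrame):
            chunks.append(data.iloc[start:stop])
        else:
            chunks.append(data[start:stop])
        start = stop
    return chunks


list_chunks = chunk_sketch(list(range(5)), 2)
df_chunks = chunk_sketch(pd.DataFrame({"x": range(5)}), 3)
```

Each chunk could then be handed to a separate worker, for example through multiprocessing.Pool.map.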
- filter_df(df, col_name, l_bound, r_bound, inclusive)
  Description
  Filters df so that it contains only the records whose df[col_name] values lie in the range between l_bound and r_bound.
  Parameters
  - df : pandas.DataFrame
    The DataFrame you want to filter.
  - col_name : str
    Name of the df column whose values you want to filter df on.
  - l_bound : same type as values of df[col_name]
    Left bound of the filtered values range. Can be omitted if r_bound is specified.
  - r_bound : same type as values of df[col_name]
    Right bound of the filtered values range. Can be omitted if l_bound is specified.
  - inclusive : bool, optional (default = True)
    Determines whether you want the range to be inclusive (True) or exclusive (False).
  Returns
  - Filtered pandas.DataFrame.
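Range filtering with optional bounds can be sketched as a boolean mask that is narrowed once per supplied bound. This is an illustration in plain pandas, not the library's code; filter_sketch is a hypothetical name, and using None to mean "bound omitted" is an assumption.

```python
import pandas as pd


def filter_sketch(df, col_name, l_bound=None, r_bound=None, inclusive=True):
    """Keep rows whose df[col_name] lies within the given bound(s)."""
    if l_bound is None and r_bound is None:
        raise ValueError("specify at least one of l_bound and r_bound")
    values = df[col_name]
    mask = pd.Series(True, index=df.index)
    if l_bound is not None:
        mask &= (values >= l_bound) if inclusive else (values > l_bound)
    if r_bound is not None:
        mask &= (values <= r_bound) if inclusive else (values < r_bound)
    return df[mask]


df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
both_bounds = filter_sketch(df, "x", 2, 4)
exclusive = filter_sketch(df, "x", 2, 4, inclusive=False)
right_only = filter_sketch(df, "x", r_bound=2)
```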
- prepare_str_cols(df, make_uppercase)
  Description
  Strips leading and trailing spaces in the str columns of df and converts those values to either upper case or lower case.
  Parameters
  - df : pandas.DataFrame
    The DataFrame you want to prepare str columns for.
  - make_uppercase : bool
    Determines whether you want str values to be upper-cased (True) or lower-cased (False).
  Returns
  - pandas.DataFrame where all strings are either upper-cased or lower-cased with all leading and trailing spaces removed.
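String normalisation of this kind can be sketched with the pandas .str accessor. The example below is an assumption about one reasonable approach, not the library's implementation: prepare_sketch is a hypothetical name, and string columns are detected with pandas.api.types.is_string_dtype, which may treat object columns containing non-string values differently across pandas versions.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def prepare_sketch(df: pd.DataFrame, make_uppercase: bool) -> pd.DataFrame:
    """Strip and case-normalise string columns on a copy of df."""
    out = df.copy()
    for col in out.columns:
        if is_string_dtype(out[col]):
            stripped = out[col].str.strip()
            out[col] = stripped.str.upper() if make_uppercase else stripped.str.lower()
    return out


df = pd.DataFrame({"name": ["  Ann ", "bob  "], "n": [1, 2]})
upper = prepare_sketch(df, make_uppercase=True)
lower = prepare_sketch(df, make_uppercase=False)
```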
Source Distribution
File details
Details for the file helper_funcs-0.1.35.tar.gz.
File metadata
- Download URL: helper_funcs-0.1.35.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 97dd1fb9d2189e30eb248947f0b47c6def7159706c21ed1d8bbc925ff23f1c44
MD5 | 0b4e21ad861731706994faee99d0e016
BLAKE2b-256 | d2ddca0024e52c183bb43d91c1fafef30323bd35d46026624fc57c0611b2fa2c