This module provides a handful of functions that simplify typical data processing operations and data verification procedures.
Functionality Guide
Dependencies
numpy 1.17.1
pandas 0.25.1
Methods
- df_preview(df, n_samples)
Description
Creates a summary table of your DataFrame, showing a few sample values from each column.
Parameters
- df : pandas.DataFrame. The DataFrame you want to create a preview for.
- n_samples : int, optional (default = 2). Number of unique values from each column to be displayed.
Returns
- pandas.DataFrame containing the summary information about the passed DataFrame.
- rename_col(df, old_name, new_name)
Description
Renames the specified column.
Parameters
- df : pandas.DataFrame. The DataFrame whose column you want to rename.
- old_name : str. Name of the existing df column to be renamed.
- new_name : str. Name that will replace the old_name column name.
Returns
- pandas.DataFrame with the renamed column.
- columns_mismatch(col_1, col_2)
Description
Extracts values that are present in col_1 but not in col_2.
Parameters
- col_1 : pandas.Series. The Series you want to subtract values from.
- col_2 : pandas.Series. The Series which is subtracted from col_1.
Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.
Returns
- Set with the values that col_1 contains and col_2 does not contain.
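The set difference described above can be sketched in plain pandas as follows. This is an illustration of the documented behavior, not the module's actual implementation; the name columns_mismatch_sketch and the sample Series are hypothetical.

```python
import pandas as pd


def columns_mismatch_sketch(col_1: pd.Series, col_2: pd.Series) -> set:
    """Return the values found in col_1 but not in col_2 (set difference)."""
    # Hypothetical helper: converting to sets drops duplicates and ordering,
    # which matches the documented "Set" return type.
    return set(col_1) - set(col_2)


left = pd.Series([1, 2, 3, 3, 4])
right = pd.Series([3, 4, 5])

# 1 and 2 occur only in `left`; 5 occurs only in `right` and is ignored.
missing = columns_mismatch_sketch(left, right)
```

Because the result is a set, repeated values in col_1 are reported once.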
- df_difference(df_1, df_2)
Description
Extracts rows that are present in df_1 but not in df_2.
Note: df_1 and df_2 can have different column names, but the number of columns should match.
Parameters
- df_1 : pandas.DataFrame. The DataFrame you want to subtract rows from.
- df_2 : pandas.DataFrame. The DataFrame which is subtracted from df_1.
Note: The word "subtract" is used here in the set-difference sense, not the arithmetic sense.
Returns
- pandas.DataFrame with the rows that df_1 contains and df_2 does not contain.
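One common way to express a row-wise set difference in pandas is a left merge with an indicator column (an anti-join). The sketch below assumes that columns are matched by position when their names differ, which the note above suggests but does not state outright. The name df_difference_sketch and the sample frames are hypothetical, and the module itself may work differently.

```python
import pandas as pd


def df_difference_sketch(df_1: pd.DataFrame, df_2: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df_1 that have no identical row in df_2."""
    if df_1.shape[1] != df_2.shape[1]:
        raise ValueError("df_1 and df_2 must have the same number of columns")
    # Assumption: columns correspond by position, so df_2 borrows df_1's names.
    aligned = df_2.copy()
    aligned.columns = df_1.columns
    # A left merge on every column, with indicator=True, marks each df_1 row
    # as "both" (also in df_2) or "left_only" (absent from df_2).
    merged = df_1.merge(
        aligned.drop_duplicates(),
        how="left",
        on=list(df_1.columns),
        indicator=True,
    )
    only_left = merged[merged["_merge"] == "left_only"]
    return only_left.drop(columns="_merge").reset_index(drop=True)


a = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
b = pd.DataFrame({"key": [2, 4], "label": ["b", "d"]})

# Only the row (2, "b") appears in both frames, so rows 1 and 3 remain.
diff = df_difference_sketch(a, b)
```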
- verify_dates_integity(df, date_col)
Description
Checks whether any dates are missing between the earliest and latest dates in df[date_col].
Parameters
- df : pandas.DataFrame. The DataFrame whose date_col values will be verified for integrity.
- date_col : str. Name of the df column that will be verified for integrity.
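A check of this kind can be sketched by comparing the column against a complete calendar range. The sketch assumes daily granularity and returns the missing days, since the guide does not document the function's return value or date frequency. The name find_missing_dates and the sample data are hypothetical.

```python
import pandas as pd


def find_missing_dates(df: pd.DataFrame, date_col: str) -> pd.DatetimeIndex:
    """Return the calendar days between min and max of df[date_col] that are absent."""
    # Assumption: daily granularity, so any time-of-day component is dropped.
    dates = pd.to_datetime(df[date_col]).dt.normalize()
    # Every calendar day between the earliest and latest observed dates.
    expected = pd.date_range(dates.min(), dates.max(), freq="D")
    # Days in the expected range that never occur in the column.
    return expected.difference(pd.DatetimeIndex(dates))


sales = pd.DataFrame({"day": ["2020-01-01", "2020-01-02", "2020-01-05"]})

# 2020-01-03 and 2020-01-04 are absent from the sample column.
missing_days = find_missing_dates(sales, "day")
is_complete = len(missing_days) == 0
```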
- duplicate(df, how, n_times)
Description
Extends the specified DataFrame by repeating its rows.
Parameters
-
df
: pandas.DataFrameThe DataFrame which rows you want to repeat
-
how
: strStrategy for repeating. Should be either 'whole' (then [1,2] -> [1,2,1,2]) or 'element_wise' (then [1,2] -> [1,1,2,2])
-
n_times
: intNumber of repetitions of each row
Returns
- Extended pandas.DataFrame with repeated rows
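The two strategies can be illustrated with standard pandas and NumPy calls. The sketch assumes that n_times is the total number of copies of each row in the result, so n_times = 2 yields the [1,2,1,2] and [1,1,2,2] outputs quoted above. The name duplicate_sketch is hypothetical and the module's implementation may differ.

```python
import numpy as np
import pandas as pd


def duplicate_sketch(df: pd.DataFrame, how: str, n_times: int) -> pd.DataFrame:
    """Repeat the rows of df, either as a whole block or row by row."""
    if n_times < 1:
        raise ValueError("n_times must be at least 1")
    if how == "whole":
        # Stack n_times copies of the entire frame one after another.
        return pd.concat([df] * n_times, ignore_index=True)
    if how == "element_wise":
        # Repeat each row position n_times before moving on to the next row.
        positions = np.repeat(np.arange(len(df)), n_times)
        return df.iloc[positions].reset_index(drop=True)
    raise ValueError("how must be 'whole' or 'element_wise'")


frame = pd.DataFrame({"x": [1, 2]})
whole = duplicate_sketch(frame, "whole", 2)                # x: 1, 2, 1, 2
element_wise = duplicate_sketch(frame, "element_wise", 2)  # x: 1, 1, 2, 2
```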
- groupby_to_list(df, by_cols, col_to_list)
Description
Collects the values of the col_to_list column that correspond to the same values in the by_cols column(s) and puts them into a list.
Parameters
- df : pandas.DataFrame. The DataFrame you want to use.
- by_cols : list of str. Names of the df columns that will be used as grouping keys.
- col_to_list : str. Name of the column whose values will be put into lists.
Returns
- pandas.DataFrame with columns [by_cols, col_to_list] in which all the values in the col_to_list column are lists.
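In plain pandas, this kind of aggregation is usually written as a groupby followed by a list conversion. The sketch below shows that pattern under the assumption that the result has one row per distinct key combination. The name groupby_to_list_sketch and the sample data are hypothetical.

```python
import pandas as pd


def groupby_to_list_sketch(df: pd.DataFrame, by_cols: list, col_to_list: str) -> pd.DataFrame:
    """Gather the col_to_list values of each by_cols group into one list per group."""
    # groupby sorts the keys by default; apply(list) turns each group's values
    # into a Python list, and reset_index restores by_cols as ordinary columns.
    return df.groupby(by_cols)[col_to_list].apply(list).reset_index()


orders = pd.DataFrame({"user": ["a", "b", "a"], "item": [1, 2, 3]})

# User "a" bought items 1 and 3, user "b" bought item 2.
grouped = groupby_to_list_sketch(orders, ["user"], "item")
```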
- chunkenize(data_to_split, num_chunks, df_indices, copy)
Description
Splits data_to_split into a list of num_chunks chunks. Can be helpful when preparing data for parallel processing.
Parameters
- data_to_split : pandas.DataFrame or list. The data you want to split into chunks.
- num_chunks : int. Number of chunks that your data will be split into.
- df_indices : list of str, optional (default = []). Can be used when data_to_split is a pandas.DataFrame. These columns will be set as the DataFrame index before splitting and reset afterwards.
- copy : bool, optional (default = True). Determines whether the splitting is performed on a copy of data_to_split.
Returns
- List of num_chunks chunks that have the same type as data_to_split.
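The core splitting step might look like the sketch below, which divides row positions with numpy.array_split and then slices the data. It deliberately leaves out the df_indices and copy options, whose exact behavior is not specified here, and it assumes that chunk sizes may differ by one when the length is not divisible by num_chunks. The name chunkenize_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def chunkenize_sketch(data_to_split, num_chunks: int) -> list:
    """Split a DataFrame or list into num_chunks consecutive chunks."""
    # array_split tolerates uneven division: the leading chunks get one extra item.
    position_groups = np.array_split(np.arange(len(data_to_split)), num_chunks)
    if isinstance(data_to_split, pd.DataFrame):
        # Positional slicing keeps each chunk a DataFrame.
        return [data_to_split.iloc[positions] for positions in position_groups]
    # Plain lists are rebuilt element by element so each chunk stays a list.
    return [[data_to_split[i] for i in positions] for positions in position_groups]


list_chunks = chunkenize_sketch([10, 20, 30, 40, 50], 2)
frame_chunks = chunkenize_sketch(pd.DataFrame({"x": range(5)}), 2)
chunk_sizes = [len(chunk) for chunk in frame_chunks]
```

Each chunk can then be handed to a separate worker, for example through multiprocessing.Pool.map.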
- filter_df(df, col_name, l_bound, r_bound, inclusive)
Description
Filters the df DataFrame so that it contains only the records whose df[col_name] values lie in the range between l_bound and r_bound.
Parameters
- df : pandas.DataFrame. The DataFrame you want to filter by its col_name column.
- col_name : str. Name of the df column whose values you want to filter df on.
- l_bound : same type as the values of df[col_name]. Left bound of the filtered value range. Can be omitted if r_bound is specified.
- r_bound : same type as the values of df[col_name]. Right bound of the filtered value range. Can be omitted if l_bound is specified.
- inclusive : bool, optional (default = True). Determines whether the range is inclusive (True) or exclusive (False).
Returns
- Filtered pandas.DataFrame
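A range filter with optional bounds can be sketched with ordinary comparison operators. The sketch assumes that an omitted bound is passed as None, which is not stated in the guide, and the name filter_df_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def filter_df_sketch(df, col_name, l_bound=None, r_bound=None, inclusive=True):
    """Keep the rows whose col_name value lies between l_bound and r_bound."""
    # Assumption: None stands for an omitted bound.
    if l_bound is None and r_bound is None:
        raise ValueError("specify at least one of l_bound and r_bound")
    values = df[col_name]
    mask = np.ones(len(df), dtype=bool)
    if l_bound is not None:
        lower_ok = values >= l_bound if inclusive else values > l_bound
        mask &= lower_ok.to_numpy()
    if r_bound is not None:
        upper_ok = values <= r_bound if inclusive else values < r_bound
        mask &= upper_ok.to_numpy()
    return df[mask]


data = pd.DataFrame({"v": [1, 2, 3, 4, 5]})
closed = filter_df_sketch(data, "v", l_bound=2, r_bound=4)                       # v: 2, 3, 4
open_range = filter_df_sketch(data, "v", l_bound=2, r_bound=4, inclusive=False)  # v: 3
upper_only = filter_df_sketch(data, "v", r_bound=2)                              # v: 1, 2
```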