roux
Convenience functions in Python.
Examples · Explore the API
Installation
pip install roux # with basic dependencies
pip install roux[all] # with all the additional dependencies (recommended).
With additional dependencies as required:
pip install roux[viz] # for visualizations e.g. seaborn etc.
pip install roux[data] # for data operations e.g. reading excel files etc.
pip install roux[stat] # for statistics e.g. statsmodels etc.
pip install roux[fast] # for faster processing e.g. parallelization etc.
pip install roux[workflow] # for workflow operations e.g. omegaconf etc.
pip install roux[interactive] # for interactive operations in jupyter notebook e.g. watermark, icecream etc.
Command-line usage
🗺️ Read configuration.
roux read-config path/to/file
🗺️ Read metadata.
roux read-metadata path/to/file
📁 Find the latest and the oldest file in a list.
roux read-ps list_of_paths
💾 Backup a directory with a timestamp (ISO).
roux backup path/to/directory
⭐ Remove *s (star imports) from a jupyter notebook.
roux removestar path/to/notebook
ℹ️ Available command line tools and their usage.
roux --help
How to cite?
- Using BibTeX:
@software{Dandage_roux,
title = {roux: Streamlined and Versatile Data Processing Toolkit},
author = {Dandage, Rohan},
year = {2023},
url = {https://zenodo.org/doi/10.5281/zenodo.2682670},
version = {v0.1.0},
note = {The URL is a DOI link to the permanent archive of the software.},
}
- Using the citation information from the CITATION.cff file.
Future directions, for which contributions are welcome
- Addition of visualization functions as attributes to rd dataframes.
- Refactoring of the workflow functions.
Similar projects
API
module roux.global_imports
For importing commonly used functions at the development phase.
Usage: in interactive sessions (e.g. in jupyter notebooks) to facilitate faster code development.
Note: Post-development, to remove *s from the code, use removestar (pip install removestar).
removestar file
Global Variables
- FONTSIZE
- PAD
module roux.lib.df
For processing individual pandas DataFrames/Series
function get_name
get_name(df1: DataFrame, cols: list = None, coff: float = 2, out=None)
Gets the name of the dataframe.
Especially useful within a groupby + pandarallel context.
Parameters:
- df1 (DataFrame): input dataframe.
- cols (list): list of groupby columns.
- coff (int): cutoff of unique values to infer the name.
- out (str): format of the output (list|not).
Returns:
- name (tuple|str|list): name of the dataframe.
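The behavior documented above can be sketched in plain pandas. This is a hypothetical re-implementation (`get_name_sketch` is not roux's code) illustrating the idea: within a groupby, each grouping column holds a single unique value, which serves as the group's name.

```python
import pandas as pd

# Hypothetical sketch of the documented behavior of get_name:
# infer a group's name from the constant value(s) of the groupby column(s).
def get_name_sketch(df1, cols):
    values = [df1[c].unique() for c in cols]
    # Within a single group, each groupby column has exactly one unique value.
    assert all(len(v) == 1 for v in values), "not a single group"
    name = tuple(v[0] for v in values)
    return name[0] if len(name) == 1 else name

df = pd.DataFrame({"g": ["a", "a"], "x": [1, 2]})
print(get_name_sketch(df, ["g"]))  # -> a
```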
function get_groupby_columns
get_groupby_columns(df_)
Get the columns supplied to groupby.
Parameters:
- df_ (DataFrame): input dataframe.
Returns:
- columns (list): list of columns.
function get_constants
get_constants(df1)
Get the columns with a single unique value.
Parameters:
- df1 (DataFrame): input dataframe.
Returns:
- columns (list): list of columns.
function drop_unnamedcol
drop_unnamedcol(df)
Deletes the columns with "Unnamed" prefix.
Parameters:
- df (DataFrame): input dataframe.
Returns:
- df (DataFrame): output dataframe.
function drop_levelcol
drop_levelcol(df)
Deletes the potentially temporary columns with the "level" prefix.
Parameters:
- df (DataFrame): input dataframe.
Returns:
- df (DataFrame): output dataframe.
function drop_constants
drop_constants(df)
Deletes columns with a single unique value.
Parameters:
- df (DataFrame): input dataframe.
Returns:
- df (DataFrame): output dataframe.
function dropby_patterns
dropby_patterns(df1, patterns=None, strict=False, test=False, verbose=True)
Deletes columns containing the given substrings (patterns).
Parameters:
- df1 (DataFrame): input dataframe.
- patterns (list): list of substrings.
- test (bool): verbose.
Returns:
- df1 (DataFrame): output dataframe.
function flatten_columns
flatten_columns(df: DataFrame, sep: str = ' ', **kws) → DataFrame
Multi-index columns to single-level.
Parameters:
- df (DataFrame): input dataframe.
- sep (str): separator within the joined tuples (' ').
Returns:
- df (DataFrame): output dataframe.
Keyword Arguments:
- kws (dict): parameters provided to the coltuples2str function.
function lower_columns
lower_columns(df)
Convert the column names of the dataframe to lower-case.
Parameters:
- df (DataFrame): input dataframe.
Returns:
- df (DataFrame): output dataframe.
function renameby_replace
renameby_replace(
df: DataFrame,
replaces: dict,
ignore: bool = True,
**kws
) → DataFrame
Rename columns by replacing sub-strings.
Parameters:
- df (DataFrame): input dataframe.
- replaces (dict|list): from->to mapping, or a list of substrings to remove.
- ignore (bool): if True, do not validate that the replacements succeeded.
Returns:
- df (DataFrame): output dataframe.
Keyword Arguments:
- kws (dict): parameters provided to the replacemany function.
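The substring-based renaming described above can be sketched as follows; `renameby_replace_sketch` is a hypothetical re-implementation, not roux's actual code.

```python
import pandas as pd

# Hypothetical sketch of renameby_replace: rename columns by replacing
# substrings given a from->to mapping.
def renameby_replace_sketch(df, replaces):
    columns = {}
    for c in df.columns:
        new = c
        for old, to in replaces.items():
            new = new.replace(old, to)
        columns[c] = new
    return df.rename(columns=columns)

df = pd.DataFrame({"gene id": [1], "gene name": ["x"]})
out = renameby_replace_sketch(df, {"gene ": ""})
print(list(out.columns))  # -> ['id', 'name']
```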
function clean_columns
clean_columns(df: DataFrame) → DataFrame
Standardise columns.
Steps: 1. Strip flanking white-spaces. 2. Lower-case letters.
Parameters:
- df (DataFrame): input dataframe.
Returns:
- df (DataFrame): output dataframe.
function clean
clean(
df: DataFrame,
cols: list = [],
drop_constants: bool = False,
drop_unnamed: bool = True,
verb: bool = False
) → DataFrame
Deletes potentially temporary columns.
Steps: 1. Strip flanking white-spaces. 2. Lower-case letters.
Parameters:
- df (DataFrame): input dataframe.
- drop_constants (bool): whether to delete the columns with a single unique value.
- drop_unnamed (bool): whether to delete the columns with the 'Unnamed' prefix.
- verb (bool): verbose.
Returns:
- df (DataFrame): output dataframe.
function compress
compress(df1, coff_categories=20, test=False)
Compress the dataframe by converting columns containing strings/objects to categorical.
Parameters:
- df1 (DataFrame): input dataframe.
- coff_categories (int): columns with fewer unique values than this cutoff are converted to categories.
- test (bool): verbose.
Returns:
- df1 (DataFrame): output dataframe.
function clean_compress
clean_compress(df, kws_compress={}, **kws_clean)
clean and compress the dataframe.
Parameters:
- df (DataFrame): input dataframe.
- kws_compress (dict): keyword arguments for the compress function.
- test (bool): verbose.
Keyword Arguments:
- kws_clean (dict): parameters provided to the clean function.
Returns:
- df1 (DataFrame): output dataframe.
See Also: clean, compress
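The compress step described above can be sketched in plain pandas; `compress_sketch` is a hypothetical re-implementation of the documented behavior, not roux's code.

```python
import pandas as pd

# Hypothetical sketch of compress: object columns with fewer unique values
# than the cutoff are converted to the memory-efficient 'category' dtype.
def compress_sketch(df, coff_categories=20):
    df = df.copy()
    for c in df.select_dtypes(include="object").columns:
        if df[c].nunique() < coff_categories:
            df[c] = df[c].astype("category")
    return df

df = pd.DataFrame({"s": ["a", "b", "a"] * 100})
out = compress_sketch(df)
print(out["s"].dtype)  # -> category
```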
function check_na
check_na(df, subset=None, out=True, perc=False, log=True)
Number of missing values in columns.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
- out (bool): whether to return the output; else, usable in chained operations.
Returns:
- ds (Series): output stats.
function validate_no_na
validate_no_na(df, subset=None)
Validate that there are no missing values in the columns.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
Returns:
- ds (Series): output stats.
function assert_no_na
assert_no_na(df, subset=None)
Assert that there are no missing values in the columns.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
Returns:
- ds (Series): output stats.
function to_str
to_str(data, log=False)
function check_nunique
check_nunique(
df: DataFrame,
subset: list = None,
groupby: str = None,
perc: bool = False,
auto=False,
out=True,
log=True
) → Series
Number/percentage of unique values in columns.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
- perc (bool): output percentages.
Returns:
- ds (Series): output stats.
function check_inflation
check_inflation(df1, subset=None)
Occurrences of values in columns.
Parameters:
- df1 (DataFrame): input dataframe.
- subset (list): list of columns.
Returns:
- ds (Series): output stats.
function check_dups
check_dups(df, subset=None, perc=False, out=True)
Check duplicates.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
- perc (bool): output percentages.
Returns:
- ds (Series): output stats.
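The missing-value and duplicate checks described above can be sketched in plain pandas; both functions below are hypothetical re-implementations of the documented behavior, not roux's code.

```python
import pandas as pd

# Hypothetical sketch of check_na: count missing values per column.
def check_na_sketch(df, subset=None):
    return df[subset if subset is not None else df.columns].isna().sum()

# Hypothetical sketch of check_dups: count duplicated rows,
# optionally over a subset of columns.
def check_dups_sketch(df, subset=None):
    return int(df.duplicated(subset=subset).sum())

df = pd.DataFrame({"a": [1, 1, None], "b": ["x", "x", "y"]})
print(check_na_sketch(df)["a"])            # -> 1
print(check_dups_sketch(df, subset=["b"]))  # -> 1
```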
function check_duplicated
check_duplicated(df, **kws)
Check duplicates (alias of check_dups).
function validate_no_dups
validate_no_dups(df, subset=None)
Validate that there are no duplicates.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
function validate_no_duplicates
validate_no_duplicates(df, subset=None)
Validate that there are no duplicates (alias of validate_no_dups).
function assert_no_dups
assert_no_dups(df, subset=None)
Assert that there are no duplicates.
function validate_dense
validate_dense(
df01: DataFrame,
subset: list = None,
duplicates: bool = True,
na: bool = True,
message=None
) → DataFrame
Validate no missing values and no duplicates in the dataframe.
Parameters:
- df01 (DataFrame): input dataframe.
- subset (list): list of columns.
- duplicates (bool): whether to check for duplicates.
- na (bool): whether to check for missing values.
- message (str): error message.
function assert_dense
assert_dense(
df01: DataFrame,
subset: list = None,
duplicates: bool = True,
na: bool = True,
message=None
) → DataFrame
Alias of validate_dense.
Notes:
To be deprecated in future releases.
function classify_mappings
classify_mappings(df1: DataFrame, subset, clean: bool = False) → DataFrame
Classify mappings between items in two columns.
Parameters:
- df1 (DataFrame): input dataframe.
- subset (list): pair of columns (column #1 and column #2).
- clean (bool): drop the columns with the counts.
Returns:
- (pd.DataFrame): output.
function check_mappings
check_mappings(df: DataFrame, subset: list = None, out=True) → DataFrame
Mapping between items in two columns.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
- out (str): format of the output.
Returns:
- ds (Series): output stats.
function assert_1_1_mappings
assert_1_1_mappings(df: DataFrame, subset: list = None) → DataFrame
Validate that the mapping between items in two columns is 1:1.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): list of columns.
function get_mappings
get_mappings(
df1: DataFrame,
subset=None,
keep='all',
clean=False,
cols=None
) → DataFrame
Classify the mapping between items in two columns.
Parameters:
- df1 (DataFrame): input dataframe.
- subset (list): list of columns.
- keep (str): type of mapping to keep (1:1|1:m|m:1).
- clean (bool): whether to remove temporary columns.
- cols (list): alias of subset.
Returns:
- df (DataFrame): output dataframe.
function to_map_binary
to_map_binary(df: DataFrame, colgroupby=None, colvalue=None) → DataFrame
Convert linear mappings to a binary map.
Parameters:
- df (DataFrame): input dataframe.
- colgroupby (str): name of the column to group by.
- colvalue (str): name of the column containing the values.
Returns:
- df1 (DataFrame): output dataframe.
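The linear-to-binary conversion described above can be sketched with pandas' crosstab; `to_map_binary_sketch` is a hypothetical re-implementation of the documented behavior, not roux's code.

```python
import pandas as pd

# Hypothetical sketch of to_map_binary: a linear (long-format) mapping
# table is pivoted into a boolean membership matrix.
def to_map_binary_sketch(df, colgroupby, colvalue):
    return pd.crosstab(df[colvalue], df[colgroupby]).astype(bool)

df = pd.DataFrame({"group": ["g1", "g1", "g2"], "item": ["a", "b", "a"]})
m = to_map_binary_sketch(df, colgroupby="group", colvalue="item")
print(m.loc["a", "g2"])  # -> True
```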
function check_intersections
check_intersections(
df: DataFrame,
colindex=None,
colgroupby=None,
plot=False,
**kws_plot
) → DataFrame
Check intersections. The linear dataframe is converted to a binary map and then to a series using groupby.
Parameters:
- df (DataFrame): input dataframe.
- colindex (str): name of the index column.
- colgroupby (str): name of the groupby column.
- plot (bool): whether to plot.
Returns:
- ds1 (Series): output Series.
Keyword Arguments:
- kws_plot (dict): parameters provided to the plotting function.
function get_totals
get_totals(ds1)
Get totals from the output of check_intersections.
Parameters:
- ds1 (Series): input Series.
Returns:
- d (dict): output dictionary.
function filter_rows
filter_rows(
df,
d,
sign='==',
logic='and',
drop_constants=False,
test=False,
verbose=True
)
Filter rows using a dictionary.
Parameters:
- df (DataFrame): input dataframe.
- d (dict): dictionary of conditions.
- sign (str): condition within mappings ('==').
- logic (str): condition between mappings ('and').
- drop_constants (bool): drop the columns with a single unique value (False).
- test (bool): testing (False).
- verbose (bool): more verbose (True).
Returns:
- df (DataFrame): output dataframe.
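The dictionary-based row filtering described above can be sketched as follows; `filter_rows_sketch` is a hypothetical re-implementation covering only the '==' sign, not roux's code.

```python
import pandas as pd

# Hypothetical sketch of filter_rows: keep rows matching a dictionary of
# column->value conditions, combined with 'and' or 'or' logic.
def filter_rows_sketch(df, d, logic="and"):
    masks = [df[k] == v for k, v in d.items()]
    mask = masks[0]
    for m in masks[1:]:
        mask = (mask & m) if logic == "and" else (mask | m)
    return df[mask]

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "x"]})
out = filter_rows_sketch(df, {"a": 1, "b": "x"})
print(len(out))  # -> 1
```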
function get_bools
get_bools(df, cols, drop=False)
Columns to booleans. One-hot encoder (get_dummies).
Parameters:
- df (DataFrame): input dataframe.
- cols (list): columns to encode.
- drop (bool): drop the cols (False).
Returns:
- df (DataFrame): output dataframe.
function agg_bools
agg_bools(df1, cols)
Booleans to columns. Reverse of the one-hot encoder (get_dummies).
Parameters:
- df1 (DataFrame): input dataframe.
- cols (list): columns.
Returns:
- ds (Series): output series.
function melt_paired
melt_paired(
df: DataFrame,
cols_index: list = None,
suffixes: list = None,
cols_value: list = None,
clean: bool = False
) → DataFrame
Melt a paired dataframe.
Parameters:
- df (DataFrame): input dataframe.
- cols_index (list): paired index columns (None).
- suffixes (list): paired suffixes (None).
- cols_value (list): names of the columns containing the values (None).
Notes:
A partial melt melts only the selected columns (cols_value).
Examples:
Paired parameters: cols_value=['value1','value2'], suffixes=['gene1','gene2']
function get_chunks
get_chunks(
df1: DataFrame,
colindex: str,
colvalue: str,
bins: int = None,
value: str = 'right'
) → DataFrame
Get chunks of a dataframe.
Parameters:
- df1 (DataFrame): input dataframe.
- colindex (str): name of the index column.
- colvalue (str): name of the column containing values [0-100].
- bins (int): number of bins.
- value (str): value to use as the name of the chunk ('right').
Returns:
- ds (Series): output series.
function sample_near_quantiles
sample_near_quantiles(data: DataFrame, col: str, n: int, clean: bool = False)
Get rows with values closest to the quantiles.
function get_group
get_group(groups, i: int = None, verbose: bool = True) → DataFrame
Get the dataframe for a single group out of the groupby object.
Parameters:
- groups (object): groupby object.
- i (int): index of the group; default None returns the largest group.
- verbose (bool): verbose (True).
Returns:
- df (DataFrame): output dataframe.
Notes:
Useful for testing groupby operations.
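The "largest group" behavior described above can be sketched in plain pandas; `get_group_sketch` is a hypothetical re-implementation, not roux's code.

```python
import pandas as pd

# Hypothetical sketch of get_group: with i=None, return the largest group
# from a groupby object; otherwise, the i-th group by descending size.
def get_group_sketch(groups, i=None):
    sizes = groups.size().sort_values(ascending=False)
    name = sizes.index[0] if i is None else sizes.index[i]
    return groups.get_group(name)

df = pd.DataFrame({"g": ["a", "b", "b"], "x": [1, 2, 3]})
out = get_group_sketch(df.groupby("g"))
print(len(out))  # -> 2
```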
function groupby_sample
groupby_sample(
df: DataFrame,
groupby: list,
i: int = None,
**kws_get_group
) → DataFrame
Samples a group (similar to .sample).
Parameters:
- df (pd.DataFrame): input dataframe.
- groupby (list): columns to group by.
- i (int): index of the group; default None returns the largest group.
Keyword arguments: keyword parameters provided to the get_group function.
Returns: pd.DataFrame
function groupby_agg_nested
groupby_agg_nested(
df1: DataFrame,
groupby: list,
subset: list,
func: dict = None,
cols_value: list = None,
verbose: bool = False,
**kws_agg
) → DataFrame
Aggregate serially from the lower-level subsets to the upper-level ones.
Parameters:
- df1 (pd.DataFrame): input dataframe.
- groupby (list): groupby columns, i.e. the columns to be used as ids in the output.
- subset (list): nested groups, i.e. subsets.
- func (dict): map between the columns with values to aggregate and the functions for aggregation.
- cols_value (list): columns with values to aggregate (optional).
- verbose (bool): verbose.
Keyword arguments:
- kws_agg: keyword arguments provided to pandas's .agg function.
Returns: output dataframe with the aggregated values.
function groupby_filter_fast
groupby_filter_fast(
df1: DataFrame,
col_groupby,
fun_agg,
expr,
col_agg: str = 'temporary',
**kws_query
) → DataFrame
Groupby and filter, fast.
Parameters:
- df1 (DataFrame): input dataframe.
- col_groupby (str|list): column name/s to group by.
- fun_agg (object): function to aggregate with.
- expr (str): expression (e.g. a cut-off condition) to filter with.
- col_agg (str): name of the temporary aggregation column ('temporary').
Returns:
- df1 (DataFrame): output dataframe.
Todo:
Deprecate if pandas.core.groupby.DataFrameGroupBy.filter is faster.
function infer_index
infer_index(
data: DataFrame,
cols_drop=[],
include=<class 'object'>,
exclude=None
) → list
Infer the index (id) of the table.
function to_multiindex_columns
to_multiindex_columns(df, suffixes, test=False)
Single-level columns to multiindex.
Parameters:
- df (DataFrame): input dataframe.
- suffixes (list): list of suffixes.
- test (bool): verbose (False).
Returns:
- df (DataFrame): output dataframe.
function to_ranges
to_ranges(df1, colindex, colbool, sort=True)
Ranges from boolean columns.
Parameters:
- df1 (DataFrame): input dataframe.
- colindex (str): column containing the index items.
- colbool (str): column containing the boolean values.
- sort (bool): sort the dataframe (True).
Returns:
- df1 (DataFrame): output dataframe.
TODO: compare with io_sets.bools2intervals.
function to_boolean
to_boolean(df1)
Booleans from ranges.
Parameters:
- df1 (DataFrame): input dataframe.
Returns:
- ds (Series): output series.
TODO: compare with io_sets.bools2intervals.
function to_cat
to_cat(ds1, cats, ordered=True)
To a series containing categories.
Parameters:
- ds1 (Series): input series.
- cats (list): categories.
- ordered (bool): whether the categories are ordered (True).
Returns:
- ds1 (Series): output series.
function astype_cat
astype_cat(df1: DataFrame, col: str, cats: list)
function sort_valuesby_list
sort_valuesby_list(
df1: DataFrame,
by: str,
cats: list,
by_more: list = [],
**kws
)
Sort dataframe by custom order of items in a column.
Parameters:
- df1 (DataFrame): input dataframe.
- by (str): column.
- cats (list): ordered list of items.
Keyword parameters:
- kws (dict): parameters provided to sort_values.
Returns:
- df (DataFrame): output dataframe.
function agg_by_order
agg_by_order(x, order)
Get the first item in the order.
Parameters:
- x (list): list.
- order (list): desired order of the items.
Returns:
- k: first item.
Notes:
Used for sorting strings, e.g. damaging > other non-conserving > other conserving.
TODO: Convert categories to numbers and take the min.
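The order-based aggregation described above can be sketched as follows; `agg_by_order_sketch` is a hypothetical re-implementation of the documented behavior, not roux's code.

```python
# Hypothetical sketch of agg_by_order: among the items present in x,
# return the one that comes first in the desired order.
def agg_by_order_sketch(x, order):
    items = list(x)
    for k in order:
        if k in items:
            return k

print(agg_by_order_sketch(["c", "a"], order=["b", "c", "a"]))  # -> c
```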
function agg_by_order_counts
agg_by_order_counts(x, order)
Get the aggregated counts, by order.
Parameters:
- x (list): list.
- order (list): desired order of the items.
Returns:
- df (DataFrame): output dataframe.
Examples:
df=pd.DataFrame({'a1':['a','b','c','a','b','c','d'],
                 'b1':['a1','a1','a1','b1','b1','b1','b1'],})
df.groupby('b1').apply(lambda df: agg_by_order_counts(x=df['a1'], order=['b','c','a']))
function groupby_sort_values
groupby_sort_values(
df: DataFrame,
col_groupby: list,
col_sortby: list,
subset: list = None,
col_subset: list = None,
func: str = 'mean',
ascending: bool = True
)
Sort groups.
Parameters:
- df (DataFrame): input dataframe.
- col_groupby (str|list): column/s to group by.
- col_sortby (str|list): column/s to sort the values by.
- subset (list): columns (None).
- col_subset (str): column containing the subset (None).
- func (str): aggregate function, provided to numpy ('mean').
- ascending (bool): sort values in ascending order (True).
Returns:
- df (DataFrame): output dataframe.
function swap_paired_cols
swap_paired_cols(df_, suffixes=['gene1', 'gene2'])
Swap the suffixes of paired columns.
Parameters:
- df_ (DataFrame): input dataframe.
- suffixes (list): suffixes.
Returns:
- df (DataFrame): output dataframe.
function sort_columns_by_values
sort_columns_by_values(
df: DataFrame,
subset: list,
suffixes: list = None,
order: list = None,
clean=False
) → DataFrame
Sort the values in columns in ascending order.
Parameters:
- df (DataFrame): input dataframe.
- subset (list): columns.
- suffixes (list): suffixes.
- order (list): ordered list.
Returns:
- df (DataFrame): output dataframe.
Notes:
In the output dataframe, 'sorted' means the values were sorted because gene1 > gene2.
function make_ids
make_ids(
df: DataFrame,
cols: list,
ids_have_equal_length: bool,
sep: str = '--',
sort: bool = False
) → Series
Make ids by joining string ids from more than one column.
Parameters:
- df (DataFrame): input dataframe.
- cols (list): columns.
- ids_have_equal_length (bool): whether the ids have equal length; if True, faster processing.
- sep (str): separator between the ids ('--').
- sort (bool): sort the ids before joining (False).
Returns:
- ds (Series): output series.
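The id-joining behavior described above can be sketched in plain pandas; `make_ids_sketch` is a hypothetical re-implementation (ignoring the equal-length fast path), not roux's code.

```python
import pandas as pd

# Hypothetical sketch of make_ids: join string ids from several columns
# with a separator, optionally sorting the ids within each row first.
def make_ids_sketch(df, cols, sep="--", sort=False):
    def join(row):
        items = sorted(row) if sort else list(row)
        return sep.join(items)
    return df[cols].astype(str).apply(join, axis=1)

df = pd.DataFrame({"gene1": ["b"], "gene2": ["a"]})
print(make_ids_sketch(df, ["gene1", "gene2"], sort=True).iloc[0])  # -> a--b
```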
function make_ids_sorted
make_ids_sorted(
df: DataFrame,
cols: list,
ids_have_equal_length: bool,
sep: str = '--',
sort: bool = False
) → Series
Make sorted ids by joining string ids from more than one column.
Parameters:
- df (DataFrame): input dataframe.
- cols (list): columns.
- ids_have_equal_length (bool): whether the ids have equal length; if True, faster processing.
- sep (str): separator between the ids ('--').
Returns:
- ds (Series): output series.
function get_alt_id
get_alt_id(s1: str, s2: str, sep: str = '--')
Get the alternate/partner id from a paired id.
Parameters:
- s1 (str): joined id.
- s2 (str): query id.
Returns:
- s (str): partner id.
function split_ids
split_ids(df1, col, sep='--', prefix=None)
Split joined ids into individual ones.
Parameters:
- df1 (DataFrame): input dataframe.
- col (str): column containing the joined ids.
- sep (str): separator within the joined ids ('--').
- prefix (str): prefix of the individual ids (None).
Returns:
- df1 (DataFrame): output dataframe.
function dict2df
dict2df(d, colkey='key', colvalue='value')
Dictionary to DataFrame.
Parameters:
- d (dict): dictionary.
- colkey (str): name of the column containing the keys.
- colvalue (str): name of the column containing the values.
Returns:
- df (DataFrame): output dataframe.
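The conversion described above can be sketched as follows; `dict2df_sketch` is a hypothetical re-implementation of the documented behavior (scalar values only), not roux's code.

```python
import pandas as pd

# Hypothetical sketch of dict2df: a dictionary becomes a two-column
# dataframe of keys and values.
def dict2df_sketch(d, colkey="key", colvalue="value"):
    return pd.DataFrame({colkey: list(d.keys()), colvalue: list(d.values())})

df = dict2df_sketch({"a": 1, "b": 2})
print(df.shape)  # -> (2, 2)
```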
function log_shape_change
log_shape_change(d1, fun='')
Report the changes in the shape of a DataFrame.
Parameters:
- d1 (dict): dictionary containing the shapes.
- fun (str): name of the function.
function log_apply
log_apply(
df,
fun,
validate_equal_length=False,
validate_equal_width=False,
validate_equal_shape=False,
validate_no_decrease_length=False,
validate_no_decrease_width=False,
validate_no_increase_length=False,
validate_no_increase_width=False,
*args,
**kwargs
)
Report (log) the changes in the shape of the dataframe before and after an operation/s.
Parameters:
- df (DataFrame): input dataframe.
- fun (object): function to apply on the dataframe.
- validate_equal_length (bool): validate that the number of rows (i.e. the length of the dataframe) remains the same before and after the operation.
- validate_equal_width (bool): validate that the number of columns (i.e. the width of the dataframe) remains the same before and after the operation.
- validate_equal_shape (bool): validate that the number of rows and columns (i.e. the shape of the dataframe) remains the same before and after the operation.
Keyword parameters:
- args (tuple): provided to fun.
- kwargs (dict): provided to fun.
Returns:
- df (DataFrame): output dataframe.
class log
Report (log) the changes in the shape of the dataframe before and after an operation/s.
TODO:
Create the attributes (attr) using strings, e.g. via setattr:
import inspect
fun = inspect.currentframe().f_code.co_name
method __init__
__init__(pandas_obj)
method check_dups
check_dups(**kws)
method check_na
check_na(**kws)
method clean
clean(**kws)
method drop
drop(**kws)
method drop_duplicates
drop_duplicates(**kws)
method dropna
dropna(**kws)
method explode
explode(**kws)
method filter_
filter_(**kws)
method filter_rows
filter_rows(**kws)
method groupby
groupby(**kws)
method join
join(**kws)
method melt
melt(**kws)
method melt_paired
melt_paired(**kws)
method merge
merge(**kws)
method pivot
pivot(**kws)
method pivot_table
pivot_table(**kws)
method query
query(**kws)
method stack
stack(**kws)
method unstack
unstack(**kws)
module roux.lib.dfs
For processing multiple pandas DataFrames/Series
function filter_dfs
filter_dfs(dfs: list, cols: list, how: str = 'inner') → DataFrame
Filter dataframes based on the items in the common columns.
Parameters:
- dfs (list): list of dataframes.
- cols (list): list of columns.
- how (str): how to filter ('inner').
Returns:
- dfs (list): list of dataframes.
function merge_with_many_columns
merge_with_many_columns(
df1: DataFrame,
right: str,
left_on: str,
right_ons: list,
right_id: str,
how: str = 'inner',
validate: str = '1:1',
test: bool = False,
verbose: bool = False,
**kws_merge
) → DataFrame
Merge with many columns, e.g. when the ids in the left table map to ids located in multiple columns of the right table.
Parameters:
- df1 (pd.DataFrame): left table.
- right (pd.DataFrame): right table.
- left_on (str): column in the left table to merge on.
- right_ons (list): columns in the right table to merge on.
- right_id (str): column in the right table containing, for example, the ids to be merged.
Keyword parameters:
- kws_merge: to be supplied to pandas.DataFrame.merge.
Returns: merged table.
function merge_paired
merge_paired(
df1: DataFrame,
df2: DataFrame,
left_ons: list,
right_on: list,
common: list = [],
right_ons_common: list = [],
how: str = 'inner',
validates: list = ['1:1', '1:1'],
suffixes: list = None,
test: bool = False,
verb: bool = True,
**kws
) → DataFrame
Merge unpaired dataframes to a paired dataframe.
Parameters:
- df1 (DataFrame): paired dataframe.
- df2 (DataFrame): unpaired dataframe.
- left_ons (list): columns of df1 (suffixed).
- right_on (str|list): column/s of df2 (to be suffixed).
- common (str|list): common column/s between df1 and df2 (not suffixed).
- right_ons_common (str|list): common column/s of df2 to be used for merging (not to be suffixed).
- how (str): method of merging ('inner').
- validates (list): validate the mappings for the 1st merge between df1 and df2, and the 2nd merge between df1+df2 and df2 (['1:1','1:1']).
- suffixes (list): suffixes to be used (None).
- test (bool): testing (False).
- verb (bool): verbose (True).
Keyword Parameters:
- kws (dict): parameters provided to merge.
Returns:
- df (DataFrame): output dataframe.
Examples:
Parameters:
    how='inner',
    left_ons=['gene id gene1','gene id gene2'], # suffixed
    common='sample id', # not suffixed
    right_on='gene id', # to be suffixed
    right_ons_common=[], # not to be suffixed
function merge_dfs
merge_dfs(dfs: list, **kws) → DataFrame
Merge dataframes from left to right.
Parameters:
- dfs (list): list of dataframes.
Keyword Parameters:
- kws (dict): parameters provided to merge.
Returns:
- df (DataFrame): output dataframe.
Notes:
For example, reduce(lambda x, y: x.merge(y), [1, 2, 3, 4, 5]) merges ((((1.merge(2)).merge(3)).merge(4)).merge(5)).
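The left-to-right merging noted above can be sketched with functools.reduce; `merge_dfs_sketch` is a hypothetical re-implementation illustrating the note, not roux's code.

```python
from functools import reduce
import pandas as pd

# Hypothetical sketch of merge_dfs: merge a list of dataframes from left
# to right, passing keyword arguments through to each merge.
def merge_dfs_sketch(dfs, **kws):
    return reduce(lambda x, y: x.merge(y, **kws), dfs)

df1 = pd.DataFrame({"id": [1, 2], "a": ["x", "y"]})
df2 = pd.DataFrame({"id": [1, 2], "b": [10, 20]})
out = merge_dfs_sketch([df1, df2], on="id")
print(list(out.columns))  # -> ['id', 'a', 'b']
```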
function compare_rows
compare_rows(df1, df2, test=False, **kws)
module roux.lib.dict
For processing dictionaries.
function head_dict
head_dict(d, lines=5)
function sort_dict
sort_dict(d1, by=1, ascending=True)
Sort a dictionary by its values.
Parameters:
- d1 (dict): input dictionary.
- by (int): index of the value among the values.
- ascending (bool): ascending order.
Returns:
- d1 (dict): output dictionary.
function merge_dicts
merge_dicts(l: list) → dict
Merge dictionaries.
Parameters:
- l (list): list containing the dictionaries.
Returns:
- d (dict): output dictionary.
TODOs: 1. In python>=3.9, merged = d1 | d2?
function merge_dicts_deep
merge_dicts_deep(left: dict, right: dict) → dict
Merge nested dictionaries. Overwrites left with right.
Parameters:
- left (dict): dictionary #1.
- right (dict): dictionary #2.
TODOs: 1. In python>=3.9, merged = d1 | d2?
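The deep merge described above, and the shallow `d1 | d2` operator mentioned in the TODO, can be contrasted in a short sketch; `merge_dicts_deep_sketch` is a hypothetical re-implementation (assumes Python >= 3.9 for the `|` operator), not roux's code.

```python
# Hypothetical sketch of merge_dicts_deep: recursively merge nested
# dictionaries, with values from the right overwriting the left.
def merge_dicts_deep_sketch(left, right):
    out = dict(left)
    for k, v in right.items():
        if isinstance(v, dict) and isinstance(out.get(k), dict):
            out[k] = merge_dicts_deep_sketch(out[k], v)  # recurse into nests
        else:
            out[k] = v  # right overwrites left
    return out

d1 = {"a": {"x": 1}, "b": 1}
d2 = {"a": {"y": 2}, "b": 2}
print(merge_dicts_deep_sketch(d1, d2))  # -> {'a': {'x': 1, 'y': 2}, 'b': 2}
print(d1 | d2)  # shallow merge: {'a': {'y': 2}, 'b': 2}
```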
function merge_dict_values
merge_dict_values(l, test=False)
Merge dictionary values.
Parameters:
- l (list): list containing the dictionaries.
- test (bool): verbose.
Returns:
- d (dict): output dictionary.
function flip_dict
flip_dict(d)
Switch the values with the keys and vice versa.
Parameters:
- d (dict): input dictionary.
Returns:
- d (dict): output dictionary.
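The key-value flip described above is a one-liner in plain Python; `flip_dict_sketch` is a hypothetical equivalent (assumes the values are hashable and unique), not roux's code.

```python
# Hypothetical sketch of flip_dict: swap keys and values.
def flip_dict_sketch(d):
    return {v: k for k, v in d.items()}

print(flip_dict_sketch({"a": 1, "b": 2}))  # -> {1: 'a', 2: 'b'}
```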
module roux.lib.google
Processing files from Google Cloud services.
function get_service
get_service(service_name='drive', access_limit=True, client_config=None)
Creates a google service object.
Parameters:
- service_name: name of the service, e.g. drive.
- access_limit: True if access is limited, else False.
- client_config: custom client config.
Returns: google service object.
Ref: https://developers.google.com/drive/api/v3/about-auth
function list_files_in_folder
list_files_in_folder(service, folderid, filetype=None, fileext=None, test=False)
Lists files in a google drive folder.
Parameters:
- service: service object, e.g. drive.
- folderid: folder id from google drive.
- filetype: specify the file type.
- fileext: specify the file extension.
- test: True if verbose, else False.
Returns: list of files in the folder.
function get_file_id
get_file_id(p)
function download_file
download_file(
p=None,
file_id=None,
service=None,
outd=None,
outp=None,
convert=False,
force=False,
test=False
)
Downloads a specified file.
Parameters:
- service: google service object.
- file_id: file id as on google drive.
- filetypes: specify the file type.
- outp: path to the output file.
- test: True if verbose, else False.
Ref: https://developers.google.com/drive/api/v3/ref-export-formats
function upload_file
upload_file(service, filep, folder_id, test=False)
Uploads a local file onto google drive.
Parameters:
- service: google service object.
- filep: path of the file.
- folder_id: id of the folder on google drive where the file will be uploaded.
- test: True if verbose, else False.
Returns: id of the uploaded file.
function upload_files
upload_files(service, ps, folder_id, **kws)
function download_drawings
download_drawings(folderid, outd, service=None, test=False)
Download specific files: drawings
TODOs: 1. use download_file
function get_comments
get_comments(
fileid,
fields='comments/quotedFileContent/value,comments/content,comments/id',
service=None
)
Get comments.
fields:
    comments/
        kind:
        id:
        createdTime:
        modifiedTime:
        author:
            kind:
            displayName:
            photoLink:
            me: True
        htmlContent:
        content:
        deleted:
        quotedFileContent:
            mimeType:
            value:
        anchor:
        replies: []
function search
search(query, results=1, service=None, **kws_search)
Google search.
Parameters:
- query: exact terms.
Returns: dict
function get_search_strings
get_search_strings(text, num=5, test=False)
Google search.
Parameters:
- text: string.
- num: number of results.
- test: True if verbose, else False.
Returns:
- lines: list.
function get_metadata_of_paper
get_metadata_of_paper(
file_id,
service_drive,
service_search,
metadata=None,
force=False,
test=False
)
Get the metadata of a pdf document.
function share
share(
drive_service,
content_id,
share=False,
unshare=False,
user_permission=None,
permissionId='anyoneWithLink'
)
Parameters:
- user_permission: e.g. user_permission = {'type': 'anyone', 'role': 'reader', 'email': '@'}
Ref: https://developers.google.com/drive/api/v3/manage-sharing
class slides
method create_image
create_image(service, presentation_id, page_id, image_id)
The image must be less than 1.5 MB.
method get_page_ids
get_page_ids(service, presentation_id)
module roux.lib.io
For input/output of data files.
function read_zip
read_zip(p: str, file_open: str = None, fun_read=None, test: bool = False)
Read the contents of a zip file.
Parameters:
- p (str): path of the file.
- file_open (str): path of the file within the zip file to open.
- fun_read (object): function to read the file.
Examples:
- Setting the fun_read parameter for reading a tab-separated table from a zip file:
from io import StringIO ... fun_read=lambda x: pd.read_csv(StringIO(x.decode('utf-8')),sep='\t',header=None),
or
from io import BytesIO ... fun_read=lambda x: pd.read_table(BytesIO(x)),
function to_zip_dir
to_zip_dir(source, destination=None, fmt='zip')
Zip a folder. Ref: https://stackoverflow.com/a/50381250/3521099
function to_zip
to_zip(
p: str,
outp: str = None,
func_rename=None,
fmt: str = 'zip',
test: bool = False
)
Compress a file/directory.
Parameters:
- p (str): path to the file/directory.
- outp (str): path to the output compressed file.
- fmt (str): format of the compressed file.
Returns:
- outp (str): path of the compressed file.
function to_dir
to_dir(
paths: dict,
output_dir_path: str,
rename_basename=None,
force=False,
test=False
)
function get_version
get_version(suffix: str = '') → str
Get the time-based version string.
Parameters:
- suffix (str): suffix.
Returns:
- version (str): version.
function to_version
to_version(
p: str,
outd: str = None,
test: bool = False,
name: str = None,
**kws: dict
) → str
Rename a file/directory to a version.
Parameters:
p
(str): path.outd
(str): output directory.
Keyword parameters:
kws
(dict): provided toget_version
.
Returns:
version
(string): version.
TODOs: 1. Use to_dir
.
function backup
backup(
p: str,
outd: str = None,
versioned: bool = False,
suffix: str = '',
zipped: bool = False,
move_only: bool = False,
test: bool = True,
verbose: bool = False,
no_test: bool = False
)
Backup a directory
Steps: 0. create version dir in outd 1. move ps to version (time) dir with common parents till the level of the version dir 2. zip or not
Parameters:
p
(str): input path.outd
(str): output directory path.versioned
(bool): custom version for the backup (False).suffix
(str): custom suffix for the backup ('').zipped
(bool): whether to zip the backup (False).test
(bool): testing (True).no_test
(bool): no testing. Usage in command line (False).
TODOs: 1. Use to_dir
. 2. Option to remove dirs find and move/zip "find -regex ./_." "find -regex ./test."
function read_url
read_url(url)
Read text from an URL.
Parameters:
url
(str): URL link.
Returns:
s
(string): text content of the URL.
function download
download(
url: str,
path: str = None,
outd: str = None,
force: bool = False,
verbose: bool = True
) → str
Download a file.
Parameters:
url
(str): URL.path
(str): custom output path (None)outd
(str): output directory ('data/database').force
(bool): overwrite output (False).verbose
(bool): verbose (True).
Returns:
path
(str): output path (None)
function read_text
read_text(p)
Read a file. To be called by other functions
Args:
p
(str): path.
Returns:
s
(str): contents.
function to_list
to_list(l1, p)
Save list.
Parameters:
l1
(list): input list.p
(str): path.
Returns:
p
(str): path.
function read_list
read_list(p)
Read the lines in the file.
Args:
p
(str): path.
Returns:
l
(list): list.
function read_list
read_list(p)
Read the lines in the file.
Args:
p
(str): path.
Returns:
l
(list): list.
function is_dict
is_dict(p)
function read_dict
read_dict(p, fmt: str = '', apply_on_keys=None, **kws) → dict
Read dictionary file.
Parameters:
p
(str): path.fmt
(str): format of the file.
Keyword Arguments:
kws
(d): parameters provided to reader function.
Returns:
d
(dict): output dictionary.
function to_dict
to_dict(d, p, **kws)
Save dictionary file.
Parameters:
d
(dict): input dictionary.p
(str): path.
Keyword Arguments:
kws
(d): parameters provided to export function.
Returns:
p
(str): path.
function post_read_table
post_read_table(
df1: DataFrame,
clean: bool,
tables: list,
verbose: bool = True,
**kws_clean: dict
)
Post-reading a table.
Parameters:
df1
(DataFrame): input dataframe.clean
(bool): whether to applyclean
function. tables ()verbose
(bool): verbose.
Keyword parameters:
kws_clean
(dict): paramters provided to theclean
function.
Returns:
df
(DataFrame): output dataframe.
function read_table
read_table(
p: str,
ext: str = None,
clean: bool = True,
filterby_time=None,
params: dict = {},
kws_clean: dict = {},
kws_cloud: dict = {},
check_paths: bool = True,
tables: int = 1,
test: bool = False,
verbose: bool = True,
engine: str = 'fastparquet',
**kws_read_tables: dict
)
Table/s reader.
Parameters:
- <b>`p`</b> (str): path of the file. It could be an input for `read_ps`, which would include strings with wildcards, list etc.
- <b>`ext`</b> (str): extension of the file (default: None meaning infered from the path).
- <b>`clean=(default`</b>: True). filterby_time=None).
- <b>`check_paths`</b> (bool): read files in the path column (default:True).
- <b>`test`</b> (bool): testing (default:False).
- <b>`params`</b>: parameters provided to the 'pd.read_csv' (default:{}). For example
- <b>`params['columns']`</b>: columns to read.
- <b>`kws_clean`</b>: parameters provided to 'rd.clean' (default:{}).
- <b>`kws_cloud`</b>: parameters for reading files from google-drive (default:{}).
- <b>`tables`</b>: how many tables to be read (default:1).
- <b>`verbose`</b>: verbose (default:True).
Keyword parameters:
- kws_read_tables
(dict): parameters provided to read_tables
function. For example:
- to_col={colindex
: replaces_index}
Returns:
- <b>`df`</b> (DataFrame): output dataframe.
Examples:
-
For reading specific columns only set
params=dict(columns=list)
. -
For reading many files, convert paths to a column with corresponding values:
to_col={colindex: replaces_index}
- Reading a vcf file. p='*.vcf|vcf.gz' read_table(p, params_read_csv=dict( #compression='gzip', sep=' ',comment='#',header=None, names=replace_many(get_header(path,comment='#',lineno=-1),['#',' '],'').split(' ')) )
function get_logp
get_logp(ps: list) → str
Infer the path of the log file.
Parameters:
ps
(list): list of paths.
Returns:
p
(str): path of the output file.
function apply_on_paths
apply_on_paths(
ps: list,
func,
replaces_outp: str = None,
to_col: dict = None,
replaces_index=None,
drop_index: bool = True,
colindex: str = 'path',
filter_rows: dict = None,
fast: bool = False,
progress_bar: bool = True,
params: dict = {},
dbug: bool = False,
test1: bool = False,
verbose: bool = True,
kws_read_table: dict = {},
**kws: dict
)
Apply a function on list of files.
Parameters:
ps
(str|list): paths or string to infer paths usingread_ps
.to_col
(dict): convert the paths to a column e.g. {colindex: replaces_index}func
(function): function to be applied on each of the paths.replaces_outp
(dict|function): infer the output path (outp
) by replacing substrings in the input paths (p
).filter_rows
(dict): filter the rows based on dict, usingrd.filter_rows
.fast
(bool): parallel processing (default:False).progress_bar
(bool): show progress bar(default:True).params
(dict): parameters provided to thepd.read_csv
function.dbug
(bool): debug mode on (default:False).test1
(bool): test on one path (default:False).kws_read_table
(dict): parameters provided to theread_table
function (default:{}).replaces_index
(object|dict|list|str): for example, 'basenamenoext' if path to basename.drop_index
(bool): whether to drop the index column e.g.path
(default: True).colindex
(str): the name of the column containing the paths (default: 'path')
Keyword parameters:
kws
(dict): parameters provided to the function.
Example:
- Function: def apply_(p,outd='data/data_analysed',force=False): outp=f"{outd}/{basenamenoext(p)}.pqt' if exists(outp) and not force: return df01=read_table(p) apply_on_paths( ps=glob("data/data_analysed/*"), func=apply_, outd="data/data_analysed/", force=True, fast=False, read_path=True, )
TODOs: Move out of io.
function read_tables
read_tables(
ps: list,
fast: bool = False,
filterby_time=None,
to_dict: bool = False,
params: dict = {},
tables: int = None,
**kws_apply_on_paths: dict
)
Read multiple tables.
Parameters:
ps
(list): list of paths.fast
(bool): parallel processing (default:False)filterby_time
(str): filter by time (default:None)drop_index
(bool): drop index (default:True)to_dict
(bool): output dictionary (default:False)params
(dict): parameters provided to thepd.read_csv
function (default:{})tables
: number of tables (default:None).
Keyword parameters:
kws_apply_on_paths
(dict): parameters provided toapply_on_paths
.
Returns:
df
(DataFrame): output dataframe.
TODOs: Parameter to report the creation dates of the newest and the oldest files.
function to_table
to_table(
df: DataFrame,
p: str,
colgroupby: str = None,
test: bool = False,
**kws
)
Save table.
Parameters:
df
(DataFrame): the input dataframe.p
(str): output path.colgroupby
(str|list): columns to groupby with to save the subsets of the data as separate files.test
(bool): testing on (default:False).
Keyword parameters:
kws
(dict): parameters provided to theto_manytables
function.
Returns:
p
(str): path of the output.
function to_manytables
to_manytables(
df: DataFrame,
p: str,
colgroupby: str,
fmt: str = '',
ignore: bool = False,
kws_get_chunks={},
**kws_to_table
)
Save many table.
Parameters:
df
(DataFrame): the input dataframe.p
(str): output path.colgroupby
(str|list): columns to groupby with to save the subsets of the data as separate files.fmt
(str): if '=' column names in the folder name e.g. col1=True.ignore
(bool): ignore the warnings (default:False).
Keyword parameters:
kws_get_chunks
(dict): parameters provided to theget_chunks
function.
Returns:
p
(str): path of the output.
TODOs:
1. Change in default parameter
:fmt='='
.
function to_table_pqt
to_table_pqt(
df: DataFrame,
p: str,
engine: str = 'fastparquet',
compression: str = 'gzip',
**kws_pqt: dict
) → str
Save a parquet file.
Parameters:
df
(pd.DataFrame): table.p
(str): path.
Keyword parameters: Parameters provided to pd.DataFrame.to_parquet
.
Returns:
function tsv2pqt
tsv2pqt(p: str) → str
Convert tab-separated file to Apache parquet.
Parameters:
p
(str): path of the input.
Returns:
p
(str): path of the output.
function pqt2tsv
pqt2tsv(p: str) → str
Convert Apache parquet file to tab-separated.
Parameters:
p
(str): path of the input.
Returns:
p
(str): path of the output.
function read_excel
read_excel(
p: str,
sheet_name: str = None,
kws_cloud: dict = {},
test: bool = False,
**kws
)
Read excel file
Parameters:
p
(str): path of the file.sheet_name
(str|None): read 1st sheet if None (default:None)kws_cloud
(dict): parameters provided to read the file from the google drive (default:{})test
(bool): if False and sheet_name not provided, return all sheets as a dictionary, else if True, print list of sheets.
Keyword parameters:
kws
: parameters provided to the excel reader.
function to_excel_commented
to_excel_commented(p: str, comments: dict, outp: str = None, author: str = None)
Add comments to the columns of excel file and save.
Args:
p
(str): input path of excel file.comments
(dict): map between column names and comment e.g. description of the column.outp
(str): output path of excel file. Defaults to None.author
(str): author of the comments. Defaults to 'Author'.
TODOs: 1. Increase the limit on comments can be added to number of columns. Currently it is 26 i.e. upto Z1.
function to_excel
to_excel(
sheetname2df: dict,
outp: str,
comments: dict = None,
save_input: bool = False,
author: str = None,
append: bool = False,
adjust_column_width: bool = True,
**kws
)
Save excel file.
Parameters:
sheetname2df
(dict): dictionary mapping the sheetname to the dataframe.outp
(str): output path.append
(bool): append the dataframes (default:False).comments
(dict): map between column names and comment e.g. description of the column.save_input
(bool): additionally save the input tables in text format.
Keyword parameters:
kws
: parameters provided to the excel writer.
function check_chunks
check_chunks(outd, col, plot=True)
Create chunks of the tables.
Parameters:
outd
(str): output directory.col
(str): the column with values that are used for getting the chunks.plot
(bool): plot the chunk sizes (default:True).
Returns:
df3
(DataFrame): output dataframe.
module roux.lib
Global Variables
- df
- set
- str
- sys
- dfs
- text
- io
- dict
function to_class
to_class(cls)
Get the decorator to attach functions.
Parameters:
cls
(class): class object.
Returns:
decorator
(decorator): decorator object.
References:
https
: //gist.github.com/mgarod/09aa9c3d8a52a980bd4d738e52e5b97a
function decorator
decorator(func)
class rd
roux-dataframe
(.rd
) extension.
method __init__
__init__(pandas_obj)
module roux.lib.set
For processing list-like sets.
function union
union(l)
Union of lists.
Parameters:
l
(list): list of lists.
Returns:
l
(list): list.
function union
union(l)
Union of lists.
Parameters:
l
(list): list of lists.
Returns:
l
(list): list.
function intersection
intersection(l)
Intersections of lists.
Parameters:
l
(list): list of lists.
Returns:
l
(list): list.
function intersection
intersection(l)
Intersections of lists.
Parameters:
l
(list): list of lists.
Returns:
l
(list): list.
function nunion
nunion(l)
Count the items in union.
Parameters:
l
(list): list of lists.
Returns:
i
(int): count.
function nintersection
nintersection(l)
Count the items in intersetion.
Parameters:
l
(list): list of lists.
Returns:
i
(int): count.
function check_non_overlaps_with
check_non_overlaps_with(l1: list, l2: list, out_count: bool = False, log=False)
function validate_overlaps_with
validate_overlaps_with(l1, l2)
function assert_overlaps_with
assert_overlaps_with(l1, l2, out_count=False)
function jaccard_index
jaccard_index(l1, l2)
function dropna
dropna(x)
Drop np.nan
items from a list.
Parameters:
x
(list): list.
Returns:
x
(list): list.
function unique
unique(l)
Unique items in a list.
Parameters:
l
(list): input list.
Returns:
l
(list): list.
Notes:
The function can return list of lists if used in
pandas.core.groupby.DataFrameGroupBy.agg
context.
function list2str
list2str(x, ignore=False)
Returns string if single item in a list.
Parameters:
x
(list): list
Returns:
s
(str): string.
function unique_str
unique_str(l, **kws)
Unique single item from a list.
Parameters:
l
(list): input list.
Returns:
l
(list): list.
function nunique
nunique(l, **kws)
Count unique items in a list
Parameters:
l
(list): list
Returns:
i
(int): count.
function flatten
flatten(l)
List of lists to list.
Parameters:
l
(list): input list.
Returns:
l
(list): output list.
function get_alt
get_alt(l1, s)
Get alternate item between two.
Parameters:
l1
(list): list.s
(str): item.
Returns:
s
(str): alternate item.
function intersections
intersections(dn2list, jaccard=False, count=True, fast=False, test=False)
Get intersections between lists.
Parameters:
dn2list
(dist): dictionary mapping to lists.jaccard
(bool): return jaccard indices.count
(bool): return counts.fast
(bool): fast.test
(bool): verbose.
Returns:
df
(DataFrame): output dataframe.
TODOs: 1. feed as an estimator to df.corr()
. 2. faster processing by filling up the symetric half of the adjacency matrix.
function range_overlap
range_overlap(l1, l2)
Overlap between ranges.
Parameters:
l1
(list): start and end integers of one range.l2
(list): start and end integers of other range.
Returns:
l
(list): overlapped range.
function get_windows
get_windows(
a,
size=None,
overlap=None,
windows=None,
overlap_fraction=None,
stretch_last=False,
out_ranges=True
)
Windows/segments from a range.
Parameters:
a
(list): range.size
(int): size of the windows.windows
(int): number of windows.overlap_fraction
(float): overlap fraction.overlap
(int): overlap length.stretch_last
(bool): stretch last window.out_ranges
(bool): whether to output ranges.
Returns:
df1
(DataFrame): output dataframe.
Notes:
- For development, use of
int
providesnp.floor
.
function bools2intervals
bools2intervals(v)
Convert bools to intervals.
Parameters:
v
(list): list of bools.
Returns:
l
(list): intervals.
function list2ranges
list2ranges(l)
function get_pairs
get_pairs(
items: list,
items_with: list = None,
size: int = 2,
with_self: bool = False
) → DataFrame
Creates a dataframe with the paired items.
Parameters:
items
: the list of items to pair. items_with: list of items to pair with. size: size of the combinations. with_self: pair with self or not.
Returns: table with pairs of items.
Notes:
- the ids of the items are sorted e.g. 'a'-'b' not 'b'-'a'. 2. itertools.combinations does not pair self.
module roux.lib.str
For processing strings.
function substitution
substitution(s, i, replaceby)
Substitute character in a string.
Parameters:
s
(string): string.i
(int): location.replaceby
(string): character to substitute with.
Returns:
s
(string): output string.
function substitution
substitution(s, i, replaceby)
Substitute character in a string.
Parameters:
s
(string): string.i
(int): location.replaceby
(string): character to substitute with.
Returns:
s
(string): output string.
function replace_many
replace_many(
s: str,
replaces: dict,
replacewith: str = '',
ignore: bool = False
)
Rename by replacing sub-strings.
Parameters:
s
(str): input string.replaces
(dict|list): from->to format or list containing substrings to remove.replacewith
(str): replace to in casereplaces
is a list.ignore
(bool): if True, not validate the successful replacements.
Returns:
s
(DataFrame): output dataframe.
function replace_many
replace_many(
s: str,
replaces: dict,
replacewith: str = '',
ignore: bool = False
)
Rename by replacing sub-strings.
Parameters:
s
(str): input string.replaces
(dict|list): from->to format or list containing substrings to remove.replacewith
(str): replace to in casereplaces
is a list.ignore
(bool): if True, not validate the successful replacements.
Returns:
s
(DataFrame): output dataframe.
function filter_list
filter_list(l: list, patterns: list, kind='out') → list
Filter a list of strings.
Args:
l
(list): list of strings.patterns
(list): list of regex patterns. patterns are applied after stripping the whitespaces.
Returns: (list) list of filtered strings.
function tuple2str
tuple2str(tup, sep=' ')
Join tuple items.
Parameters:
tup
(tuple|list): input tuple/list.sep
(str): separator between the items.
Returns:
s
(str): output string.
function linebreaker
linebreaker(text, width=None, break_pt=None, sep='\n', **kws)
Insert newline
s within a string.
Parameters:
text
(str): string.width
(int): insertnewline
at this interval.sep
(string): separator to split the sub-strings.
Returns:
s
(string): output string.
References:
1.
textwrap``: https://docs.python.org/3/library/textwrap.html
function findall
findall(s, ss, outends=False, outstrs=False, suffixlen=0)
Find the substrings or their locations in a string.
Parameters:
s
(string): input string.ss
(string): substring.outends
(bool): output end positions.outstrs
(bool): output strings.suffixlen
(int): length of the suffix.
Returns:
l
(list): output list.
function get_marked_substrings
get_marked_substrings(
s,
leftmarker='{',
rightmarker='}',
leftoff=0,
rightoff=0
) → list
Get the substrings flanked with markers from a string.
Parameters:
s
(str): input string.leftmarker
(str): marker on the left.rightmarker
(str): marker on the right.leftoff
(int): offset on the left.rightoff
(int): offset on the right.
Returns:
l
(list): list of substrings.
function get_marked_substrings
get_marked_substrings(
s,
leftmarker='{',
rightmarker='}',
leftoff=0,
rightoff=0
) → list
Get the substrings flanked with markers from a string.
Parameters:
s
(str): input string.leftmarker
(str): marker on the left.rightmarker
(str): marker on the right.leftoff
(int): offset on the left.rightoff
(int): offset on the right.
Returns:
l
(list): list of substrings.
function mark_substrings
mark_substrings(s, ss, leftmarker='(', rightmarker=')') → str
Mark sub-string/s in a string.
Parameters:
s
(str): input string.ss
(str): substring.leftmarker
(str): marker on the left.rightmarker
(str): marker on the right.
Returns:
s
(str): string.
function get_bracket
get_bracket(s, leftmarker='(', righttmarker=')') → str
Get bracketed substrings.
Parameters:
s
(string): string.leftmarker
(str): marker on the left.rightmarker
(str): marker on the right.
Returns:
s
(str): string.
TODOs: 1. Use get_marked_substrings
.
function align
align(
s1: str,
s2: str,
prefix: bool = False,
suffix: bool = False,
common: bool = True
) → list
Align strings.
Parameters:
s1
(str): string #1.s2
(str): string #2.prefix
(str): prefix.suffix
(str): suffix.common
(str): common substring.
Returns:
l
(list): output list.
Notes:
- Code to test: [ get_prefix(source,target,common=False), get_prefix(source,target,common=True), get_suffix(source,target,common=False), get_suffix(source,target,common=True),]
function get_prefix
get_prefix(s1, s2: str = None, common: bool = True, clean: bool = True) → str
Get the prefix of the strings
Parameters:
s1
(str|list): 1st string.s2
(str): 2nd string (default:None).common
(bool): get the common prefix (default:True).clean
(bool): clean the leading and trailing whitespaces (default:True).
Returns:
s
(str): prefix.
function get_suffix
get_suffix(s1, s2: str = None, common: bool = True, clean: bool = True) → str
Get the suffix of the strings
Parameters:
s1
(str|list): 1st string.s2
(str): 2nd string (default:None).common
(bool): get the common prefix (default:True).clean
(bool): clean the leading and trailing whitespaces (default:True).
Returns:
s
(str): prefix.
function get_fix
get_fix(s1: str, s2: str, **kws: dict) → str
Infer common prefix or suffix.
Parameters:
s1
(str): 1st string.s2
(str): 2nd string.
Keyword parameters:
kws
: parameters provided to theget_prefix
andget_suffix
functions.
Returns:
s
(str): prefix or suffix.
function removesuffix
removesuffix(s1: str, suffix: str) → str
Remove suffix.
Paramters: s1 (str): input string. suffix (str): suffix.
Returns:
s1
(str): string without the suffix.
TODOs: 1. Deprecate in py>39 use .removesuffix() instead.
function str2dict
str2dict(
s: str,
reversible: bool = True,
sep: str = ';',
sep_equal: str = '='
) → dict
String to dictionary.
Parameters:
s
(str): string.sep
(str): separator between entries (default:';').sep_equal
(str): separator between the keys and the values (default:'=').
Returns:
d
(dict): dictionary.
References:
1. https
: //stackoverflow.com/a/186873/3521099
function dict2str
dict2str(
d1: dict,
reversible: bool = True,
sep: str = ';',
sep_equal: str = '='
) → str
Dictionary to string.
Parameters:
d
(dict): dictionary.sep
(str): separator between entries (default:';').sep_equal
(str): separator between the keys and the values (default:'=').reversible
(str): use json
Returns:
s
(str): string.
function str2num
str2num(s: str) → float
String to number.
Parameters:
s
(str): string.
Returns:
i
(int): number.
function num2str
num2str(
num: float,
magnitude: bool = False,
coff: float = 10000,
decimals: int = 0
) → str
Number to string.
Parameters:
num
(int): number.magnitude
(bool): use magnitudes (default:False).coff
(int): cutoff (default:10000).decimals
(int): decimal points (default:0).
Returns:
s
(str): string.
TODOs 1. ~ if magnitude else not
function encode
encode(data, short: bool = False, method_short: str = 'sha256', **kws) → str
Encode the data as a string.
Parameters:
data
(str|dict|Series): input data.short
(bool): Outputs short string, compatible with paths but non-reversible. Defaults to False.method_short
(str): method used for encoding when short=True.
Keyword parameters:
kws
: parameters provided to encoding function.
Returns:
s
(string): output string.
function decode
decode(s, out=None, **kws_out)
Decode data from a string.
Parameters:
s
(string): encoded string.out
(str): output format (dict|df).
Keyword parameters:
kws_out
: parameters provided todict2df
.
Returns:
d
(dict|DataFrame): output data.
function to_formula
to_formula(
replaces={' ': 'SPACE', '(': 'LEFTBRACKET', ')': 'RIGHTTBRACKET', '.': 'DOT', ',': 'COMMA', '%': 'PERCENT', "'": 'INVCOMMA', '+': 'PLUS', '-': 'MINUS'},
reverse=False
) → dict
Converts strings to the formula format, compatible with patsy
for example.
module roux.lib.sys
For processing file paths for example.
function basenamenoext
basenamenoext(p)
Basename without the extension.
Args:
p
(str): path.
Returns:
s
(str): output.
function remove_exts
remove_exts(p: str, exts: tuple = None)
Filename without the extension.
Args:
p
(str): path.exts
(tuple): extensions.
Returns:
s
(str): output.
function read_ps
read_ps(ps, test: bool = True, verbose: bool = True) → list
Read a list of paths.
Parameters:
ps
(list|str): list of paths or a string with wildcard/s.test
(bool): testing.verbose
(bool): verbose.
Returns:
ps
(list): list of paths.
function to_path
to_path(s, replacewith='_', verbose=False, coff_len_escape_replacement=100)
Normalise a string to be used as a path of file.
Parameters:
s
(string): input string.replacewith
(str): replace the whitespaces or incompatible characters with.
Returns:
s
(string): output string.
function to_path
to_path(s, replacewith='_', verbose=False, coff_len_escape_replacement=100)
Normalise a string to be used as a path of file.
Parameters:
s
(string): input string.replacewith
(str): replace the whitespaces or incompatible characters with.
Returns:
s
(string): output string.
function makedirs
makedirs(p: str, exist_ok=True, **kws)
Make directories recursively.
Args:
p
(str): path.exist_ok
(bool, optional): no error if the directory exists. Defaults to True.
Returns:
p_
(str): the path of the directory.
function to_output_path
to_output_path(ps, outd=None, outp=None, suffix='')
Infer a single output path for a list of paths.
Parameters:
ps
(list): list of paths.outd
(str): path of the output directory.outp
(str): path of the output file.suffix
(str): suffix of the filename.
Returns:
outp
(str): path of the output file.
function to_output_paths
to_output_paths(
input_paths: list = None,
inputs: list = None,
output_path_base: str = None,
encode_short: bool = True,
replaces_output_path=None,
key_output_path: str = None,
force: bool = False,
verbose: bool = False
) → dict
Infer a output path for each of the paths or inputs.
Parameters:
input_paths (list)
: list of input paths. Defaults to None.inputs (list)
: list of inputs e.g. dictionaries. Defaults to None.output_path_base (str)
: output path with a placeholder '{KEY}' to be replaced. Defaults to None.encode_short
: (bool) : short encoded string, else long encoded string (reversible) is used. Defaults to True.replaces_output_path
: list, dictionary or function to replace the input paths. Defaults to None.key_output_path (str)
: key to be used to incorporate output_path variable among the inputs. Defaults to None.force
(bool): overwrite the outputs. Defaults to False.verbose (bool)
: show verbose. Defaults to False.
Returns: dictionary with the output path mapped to input paths or inputs.
TODOs: 1. Placeholders other than {KEY}.
function get_encoding
get_encoding(p)
Get encoding of a file.
Parameters:
p
(str): file path
Returns:
s
(string): encoding.
function get_all_subpaths
get_all_subpaths(d='.', include_directories=False)
Get all the subpaths.
Args:
d
(str, optional): description. Defaults to '.'.include_directories
(bool, optional): to include the directories. Defaults to False.
Returns:
paths
(list): sub-paths.
function get_env
get_env(env_name: str, return_path: bool = False)
Get the virtual environment as a dictionary.
Args:
env_name
(str): name of the environment.
Returns:
d
(dict): parameters of the virtual environment.
function runbash
runbash(s1, env=None, test=False, **kws)
Run a bash command.
Args:
s1
(str): command.env
(str): environment name.test
(bool, optional): testing. Defaults to False.
Returns:
output
: output of thesubprocess.call
function.
TODOs: 1. logp 2. error ignoring
function runbash_tmp
runbash_tmp(
s1: str,
env: str,
df1=None,
inp='INPUT',
input_type='df',
output_type='path',
tmp_infn='in.txt',
tmp_outfn='out.txt',
outp=None,
force=False,
test=False,
**kws
)
Run a bash command in /tmp
directory.
Args:
s1
(str): command.env
(str): environment name.df1
(DataFrame, optional): input dataframe. Defaults to None.inp
(str, optional): input path. Defaults to 'INPUT'.input_type
(str, optional): input type. Defaults to 'df'.output_type
(str, optional): output type. Defaults to 'path'.tmp_infn
(str, optional): temporary input file. Defaults to 'in.txt'.tmp_outfn
(str, optional): temporary output file.. Defaults to 'out.txt'.outp
(type, optional): output path. Defaults to None.force
(bool, optional): force. Defaults to False.test
(bool, optional): test. Defaults to False.
Returns:
output
: output of thesubprocess.call
function.
function create_symlink
create_symlink(p: str, outp: str, test=False, force=False)
Create symbolic links.
Args:
p
(str): input path.outp
(str): output path.test
(bool, optional): test. Defaults to False.
Returns:
outp
(str): output path.
TODOs:
Use
pathlib``:Path(p).symlink_to(Path(outp))
function input_binary
input_binary(q: str)
Get input in binary format.
Args:
q
(str): question.
Returns:
b
(bool): response.
function is_interactive
is_interactive()
Check if the UI is interactive e.g. jupyter or command line.
function is_interactive_notebook
is_interactive_notebook()
Check if the UI is interactive e.g. jupyter or command line.
Notes:
Reference:
function get_excecution_location
get_excecution_location(depth=1)
Get the location of the function being executed.
Args:
depth
(int, optional): Depth of the location. Defaults to 1.
Returns:
tuple
(tuple): filename and line number.
function get_datetime
get_datetime(outstr: bool = True, fmt='%G%m%dT%H%M%S')
Get the date and time.
Args:
outstr
(bool, optional): string output. Defaults to True.fmt
(str): format of the string.
Returns:
s
: date and time.
function p2time
p2time(filename: str, time_type='m')
Get the creation/modification dates of files.
Args:
filename
(str): filename.time_type
(str, optional): description. Defaults to 'm'.
Returns:
time
(str): time.
function ps2time
ps2time(ps: list, **kws_p2time)
Get the times for a list of files.
Args:
ps
(list): list of paths.
Returns:
ds
(Series): paths mapped to corresponding times.
function get_logger
get_logger(program='program', argv=None, level=None, dp=None)
Get the logging object.
Args:
program
(str, optional): name of the program. Defaults to 'program'.argv
(type, optional): arguments. Defaults to None.level
(type, optional): level of logging. Defaults to None.dp
(type, optional): description. Defaults to None.
function tree
tree(folder_path: str, log=True)
module roux.lib.text
For processing text files.
function get_header
get_header(path: str, comment='#', lineno=None)
Get the header of a file.
Args:
path
(str): path.comment
(str): comment identifier.lineno
(int): line numbers upto.
Returns:
lines
(list): header.
function cat
cat(ps, outp)
Concatenate text files.
Args:
ps
(list): list of paths.outp
(str): output path.
Returns:
outp
(str): output path.
module roux.stat.binary
For processing binary data.
function compare_bools_jaccard
compare_bools_jaccard(x, y)
Compare bools in terms of the jaccard index.
Args:
x
(list): list of bools.y
(list): list of bools.
Returns:
float
: jaccard index.
function compare_bools_jaccard_df
compare_bools_jaccard_df(df: DataFrame) → DataFrame
Pairwise compare bools in terms of the jaccard index.
Args:
df
(DataFrame): dataframe with boolean columns.
Returns:
DataFrame
: matrix with comparisons between the columns.
function classify_bools
classify_bools(l: list) → str
Classify bools.
Args:
l
(list): list of bools
Returns:
str
: classification.
function frac
frac(x: list) → float
Fraction.
Args:
x
(list): list of bools.
Returns:
float
: fraction of True values.
function perc
perc(x: list) → float
Percentage.
Args:
x
(list): list of bools.
Returns:
float
: Percentage of the True values
function get_stats_confusion_matrix
get_stats_confusion_matrix(df_: DataFrame) → DataFrame
Get stats confusion matrix.
Args:
df_
(DataFrame): Confusion matrix.
Returns:
DataFrame
: stats.
function get_cutoff
get_cutoff(
y_true,
y_score,
method,
show_diagonal=True,
show_area=True,
kws_area: dict = {},
show_cutoff=True,
plot_pr=True,
color='k',
returns=['ax'],
ax=None
)
Obtain threshold based on ROC or PR curve.
Returns: Table:
columns
: valuesmethod
: ROC, PRvariable
: threshold (index), TPR, FPR, TP counts, precision, recall values: Plots: AUC ROC, TPR vs TP counts PR Specificity vs TP counts Dictionary: Thresholds from AUC, PR
TODOs: 1. Separate the plotting functions.
module roux.stat.cluster
For clustering data.
function check_clusters
check_clusters(df: DataFrame)
Check clusters.
Args:
df
(DataFrame): dataframe.
function get_clusters
get_clusters(
X: <built-in function array>,
n_clusters: int,
random_state=88,
params={},
test=False
) → dict
Get clusters.
Args:
X
(np.array): vectorn_clusters
(int): intrandom_state
(int, optional): random state. Defaults to 88.params
(dict, optional): parameters for theMiniBatchKMeans
function. Defaults to {}.test
(bool, optional): test. Defaults to False.
Returns: dict:
function get_n_clusters_optimum
get_n_clusters_optimum(df5: DataFrame, test=False) → int
Get n clusters optimum.
Args:
df5
(DataFrame): input dataframe.
test
(bool, optional): test. Defaults to False.
Returns:
int
: knee point.
function plot_silhouette
plot_silhouette(df: DataFrame, n_clusters_optimum=None, ax=None)
Plot silhouette
Args:
df
(DataFrame): input dataframe.
n_clusters_optimum
(int, optional): number of clusters. Defaults to None.
ax
(axes, optional): axes object. Defaults to None.
Returns:
ax
(axes): axes object.
function get_clusters_optimum
get_clusters_optimum(
X: np.array,
n_clusters=range(2, 11),
params_clustering={},
test=False
) → dict
Get optimum clusters.
Args:
X
(np.array): samples to cluster in indexed format.
n_clusters
(range, optional): range of cluster numbers to try. Defaults to range(2,11).
params_clustering
(dict, optional): parameters provided to get_clusters. Defaults to {}.
test
(bool, optional): test. Defaults to False.
Returns:
dict
: optimum clusters.
function get_gmm_params
get_gmm_params(g, x, n_clusters=2, test=False)
Intersection point of the two peak Gaussian mixture Models (GMMs).
Args:
out
(str): 'coff' only, or 'params' for all the parameters.
function get_gmm_intersection
get_gmm_intersection(x, two_pdfs, means, weights, test=False)
function cluster_1d
cluster_1d(
ds: Series,
n_clusters: int,
clf_type='gmm',
random_state=1,
test=False,
returns=['coff'],
**kws_clf
) → dict
Cluster 1D data.
Args:
ds
(Series): input series.
n_clusters
(int): number of clusters.
clf_type
(str, optional): type of classifier. Defaults to 'gmm'.
random_state
(int, optional): random state. Defaults to 1.
test
(bool, optional): test. Defaults to False.
returns
(list, optional): return format. Defaults to ['coff'].
ax
(axes, optional): axes object. Defaults to None.
Raises:
ValueError
: clf_type
Returns:
dict
: description
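For intuition, the cutoff between two 1D Gaussian components sits where their weighted densities cross. A scipy-based sketch with hypothetical component parameters (not roux's implementation, which fits the mixture first):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 2-component GMM parameters: weights, means, standard deviations
w, mu, sd = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]
grid = np.linspace(mu[0], mu[1], 2001)
p0 = w[0] * norm.pdf(grid, mu[0], sd[0])
p1 = w[1] * norm.pdf(grid, mu[1], sd[1])
# The cutoff is the grid point where the two weighted densities are closest
cutoff = grid[np.argmin(np.abs(p0 - p1))]
print(cutoff)  # 3.0: symmetric components cross midway
```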
function get_pos_umap
get_pos_umap(df1, spread=100, test=False, k='', **kws) → DataFrame
Get positions of the umap points.
Args:
df1
(DataFrame): input dataframe.
spread
(int, optional): spread extent. Defaults to 100.
test
(bool, optional): test. Defaults to False.
k
(str, optional): number of clusters. Defaults to ''.
Returns:
DataFrame
: output dataframe.
module roux.stat.compare
For comparison related stats.
function get_comparison
get_comparison(
df1: DataFrame,
d1: dict = None,
coff_p: float = 0.05,
between_ys: bool = False,
verbose: bool = False,
**kws
)
Compare the x and y columns.
Parameters:
df1
(pd.DataFrame): input table.
d1
(dict): columns dict, output of get_cols_x_for_comparison.
between_ys
(bool): compare y's.
Notes:
Column information: d1={'cols_index': ['id'], 'cols_x': {'cont': [], 'desc': []}, 'cols_y': {'cont': [], 'desc': []}}
Comparison types:
1. continuous vs continuous -> correlation
2. discrete vs continuous -> difference
3. discrete vs discrete -> Fisher's exact or chi-square test
function compare_strings
compare_strings(l0: list, l1: list, cutoff: float = 0.5) → DataFrame
Compare two lists of strings.
Parameters:
l0
(list): list of strings.
l1
(list): list of strings to compare with.
cutoff
(float): threshold to filter the comparisons.
Returns: table with the similarity scores.
TODOs: 1. Add option for semantic similarity.
module roux.stat.corr
For correlation stats.
function resampled
resampled(
x: np.array,
y: np.array,
method_fun: object,
method_kws: dict = {},
ci_type: str = 'max',
cv: int = 5,
random_state: int = 1,
verbose: bool = False
) → tuple
Get correlations after resampling.
Args:
x
(np.array): x vector.
y
(np.array): y vector.
method_fun
(object): method function.
method_kws
(dict, optional): parameters provided to the method function. Defaults to {}.
ci_type
(str, optional): confidence interval type. Defaults to 'max'.
cv
(int, optional): number of resamples. Defaults to 5.
random_state
(int, optional): random state. Defaults to 1.
verbose
(bool): verbose.
Returns:
tuple
: mean correlation coefficient, CI and CI type.
function get_corr
get_corr(
x: str,
y: str,
method: str,
df: DataFrame = None,
method_kws: dict = {},
pval: bool = True,
preprocess: bool = True,
n_min=10,
preprocess_kws: dict = {},
resample: bool = False,
cv=5,
resample_kws: dict = {},
verbose: bool = False,
test: bool = False
) → dict
Correlation between vectors. A unifying wrapper around scipy's functions to calculate correlations and distances. Allows application of resampling on those functions.
Usage: 1. Linear table with paired values. For a matrix, use pd.DataFrame.corr instead.
Args:
x
(str): x column name or a vector.
y
(str): y column name or a vector.
method
(str): method name.
df
(pd.DataFrame): input table.
pval
(bool): calculate p-value.
preprocess
(bool): preprocess the input.
preprocess_kws
(dict): parameters provided to the pre-processing function i.e. _pre.
resample
(bool, optional): resampling. Defaults to False.
resample_kws
(dict): parameters provided to the resampling function i.e. resample.
verbose
(bool): verbose.
Returns:
res
(dict): a dictionary containing results.
Notes:
The res dictionary contains the following values:
method: method name
r: correlation coefficient or distance
p: p-value of the correlation
n: sample size
rr: resampled average 'r'
ci: CI
ci_type: CI type
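Underneath, such a wrapper reduces to a scipy call. A minimal sketch of the core computation (the dict keys mirror the documented res, simplified):

```python
import numpy as np
from scipy.stats import spearmanr

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
# Spearman rank correlation and its p-value
r, p = spearmanr(x, y)
res = {'method': 'spearman', 'r': r, 'p': p, 'n': len(x)}
print(res['r'])  # 0.8
```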
function get_corrs
get_corrs(
data: DataFrame,
method: str,
cols: list = None,
cols_with: list = None,
coff_inflation_min: float = None,
get_pairs_kws={},
fast: bool = False,
test: bool = False,
verbose: bool = False,
**kws_get_corr
) → DataFrame
Correlate many columns of a dataframe.
Parameters:
data
(DataFrame): input dataframe.
method
(str): method of correlation, spearman or pearson.
cols
(list): columns.
cols_with
(list): columns to correlate with i.e. variable2.
fast
(bool): use parallel-processing if True.
Keyword arguments:
kws_get_corr
: parameters provided to the get_corr function.
Returns:
DataFrame
: output dataframe.
Notes:
In the fast mode (fast=True), to set the number of processes, run the following before executing the get_corrs command:
from pandarallel import pandarallel
pandarallel.initialize(nb_workers={}, progress_bar=True, use_memory_fs=False)
function check_collinearity
check_collinearity(
df1: DataFrame,
threshold: float = 0.7,
colvalue: str = 'r',
cols_variable: list = ['variable1', 'variable2'],
coff_pval: float = 0.05,
method: str = 'spearman',
coff_inflation_min: int = 50
) → Series
Check collinearity.
Args:
df1
(DataFrame): input dataframe.
threshold
(float): minimum threshold for the collinearity.
Returns:
Series
: minimum correlation among each correlated subnetwork of columns.
function pairwise_chi2
pairwise_chi2(df1: DataFrame, cols_values: list) → DataFrame
Pairwise chi2 test.
Args:
df1
(DataFrame): input dataframe.
cols_values
(list): list of columns.
Returns:
DataFrame
: output dataframe.
TODOs: 0. Use lib.set.get_pairs to get the combinations.
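A pairwise chi-square test ultimately rests on scipy's contingency-table test. A minimal sketch for one pair of categorical columns, with toy counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# A 2x2 cross-tabulation of two categorical columns (toy counts)
table = np.array([[10, 20],
                  [20, 10]])
stat, p, dof, expected = chi2_contingency(table)
print(dof)  # 1
```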
module roux.stat.diff
For difference related stats.
function compare_classes
compare_classes(x, y, method=None)
Compare classes
function compare_classes_many
compare_classes_many(df1: DataFrame, cols_y: list, cols_x: list) → DataFrame
function get_pval
get_pval(
df: DataFrame,
colvalue='value',
colsubset='subset',
colvalue_bool=False,
colindex=None,
subsets=None,
test=False,
fun=None
) → tuple
Get p-value.
Args:
df
(DataFrame): input dataframe.
colvalue
(str, optional): column with values. Defaults to 'value'.
colsubset
(str, optional): column with subsets. Defaults to 'subset'.
colvalue_bool
(bool, optional): whether the values are boolean. Defaults to False.
colindex
(str, optional): column with the index. Defaults to None.
subsets
(list, optional): subset types. Defaults to None.
test
(bool, optional): test. Defaults to False.
fun
(function, optional): function. Defaults to None.
Raises:
ArgumentError
: colvalue or colsubset not found in df.
ValueError
: need only 2 subsets.
Returns:
tuple
: stat,p-value
function get_stat
get_stat(
df1: DataFrame,
colsubset: str,
colvalue: str,
colindex: str,
subsets=None,
cols_subsets=['subset1', 'subset2'],
df2=None,
stats=[np.mean, np.median, np.var, len],
coff_samples_min=None,
verb=False,
**kws
) → DataFrame
Get statistics.
Args:
df1
(DataFrame): input dataframe.
colvalue
(str, optional): column with values. Defaults to 'value'.
colsubset
(str, optional): column with subsets. Defaults to 'subset'.
colindex
(str, optional): column with the index. Defaults to None.
subsets
(list, optional): subset types. Defaults to None.
cols_subsets
(list, optional): columns with subsets. Defaults to ['subset1', 'subset2'].
df2
(DataFrame, optional): second dataframe. Defaults to None.
stats
(list, optional): summary statistics. Defaults to [np.mean,np.median,np.var]+[len].
coff_samples_min
(int, optional): minimum sample size required. Defaults to None.
verb
(bool, optional): verbose. Defaults to False.
Keyword Arguments:
kws
: parameters provided to the get_pval function.
Raises:
ArgumentError
: colvalue or colsubset not found in df.
ValueError
: len(subsets)<2
Returns:
DataFrame
: output dataframe.
TODOs: 1. Rename to the more specific get_diff; also other get_stat*/get_pval* functions.
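The summary statistics that get_stat computes per subset reduce to a pandas groupby-aggregate. A minimal sketch on toy data (using 'count' in place of len):

```python
import pandas as pd

df1 = pd.DataFrame({'subset': ['a'] * 3 + ['b'] * 3,
                    'value': [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]})
# Mirror the default stats list [np.mean, np.median, np.var] + [len]
stats = df1.groupby('subset')['value'].agg(['mean', 'median', 'var', 'count'])
print(stats.loc['a', 'mean'], stats.loc['b', 'mean'])  # 2.0 8.0
```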
function get_stats
get_stats(
df1: DataFrame,
colsubset: str,
cols_value: list,
colindex: str,
subsets=None,
df2=None,
cols_subsets=['subset1', 'subset2'],
stats=[np.mean, np.median, np.var, len],
axis=0,
test=False,
**kws
) → DataFrame
Get statistics by iterating over columns with values.
Args:
df1
(DataFrame): input dataframe.
colsubset
(str, optional): column with subsets.
cols_value
(list): list of columns with values.
colindex
(str, optional): column with the index.
subsets
(list, optional): subset types. Defaults to None.
df2
(DataFrame, optional): second dataframe, e.g. pd.DataFrame({"subset1":['test'],"subset2":['reference']}). Defaults to None.
cols_subsets
(list, optional): columns with subsets. Defaults to ['subset1', 'subset2'].
stats
(list, optional): summary statistics. Defaults to [np.mean,np.median,np.var]+[len].
axis
(int, optional): 1 if different tests else use 0. Defaults to 0.
Keyword Arguments:
kws
: parameters provided to the get_pval function.
Raises:
ArgumentError
: colvalue or colsubset not found in df.
ValueError
: len(subsets)<2
Returns:
DataFrame
: output dataframe.
TODOs: 1. No column prefix if len(cols_value)==1.
function get_significant_changes
get_significant_changes(
df1: DataFrame,
coff_p=0.025,
coff_q=0.1,
alpha=None,
changeby='mean',
value_aggs=['mean', 'median']
) → DataFrame
Get significant changes.
Args:
df1
(DataFrame): input dataframe.
coff_p
(float, optional): cutoff on p-value. Defaults to 0.025.
coff_q
(float, optional): cutoff on q-value. Defaults to 0.1.
alpha
(float, optional): alias for coff_p. Defaults to None.
changeby
(str, optional): "" to check for change by both mean and median. Defaults to 'mean'.
value_aggs
(list, optional): values to aggregate. Defaults to ['mean','median'].
Returns:
DataFrame
: output dataframe.
function apply_get_significant_changes
apply_get_significant_changes(
df1: DataFrame,
cols_value: list,
cols_groupby: list,
cols_grouped: list,
fast=False,
**kws
) → DataFrame
Apply on dataframe to get significant changes.
Args:
df1
(DataFrame): input dataframe.
cols_value
(list): columns with values.
cols_groupby
(list): columns with groups.
Returns:
DataFrame
: output dataframe.
function get_stats_groupby
get_stats_groupby(
df1: DataFrame,
cols_group: list,
coff_p: float = 0.05,
coff_q: float = 0.1,
alpha=None,
fast=False,
**kws
) → DataFrame
Iterate over groups, to get the differences.
Args:
df1
(DataFrame): input dataframe.
cols_group
(list): columns to iterate over.
coff_p
(float, optional): cutoff on p-value. Defaults to 0.05.
coff_q
(float, optional): cutoff on q-value. Defaults to 0.1.
alpha
(float, optional): alias for coff_p. Defaults to None.
fast
(bool, optional): parallel processing. Defaults to False.
Returns:
DataFrame
: output dataframe.
function get_diff
get_diff(
df1: DataFrame,
cols_x: list,
cols_y: list,
cols_index: list,
cols_group: list,
coff_p: float = None,
test: bool = False,
**kws
) → DataFrame
Wrapper around get_stats_groupby.
Keyword parameters: cols=['variable x','variable y'], coff_p=0.05, coff_q=0.01, colindex=['id'].
function binby_pvalue_coffs
binby_pvalue_coffs(
df1: DataFrame,
coffs=[0.01, 0.05, 0.1],
color=False,
testn='MWU test, FDR corrected',
colindex='genes id',
colgroup='tissue',
preffix='',
colns=None,
palette=None
) → tuple
Bin data by pvalue cutoffs.
Args:
df1
(DataFrame): input dataframe.
coffs
(list, optional): cut-offs. Defaults to [0.01,0.05,0.1].
color
(bool, optional): color assignment. Defaults to False.
testn
(str, optional): test name. Defaults to 'MWU test, FDR corrected'.
colindex
(str, optional): column with the index. Defaults to 'genes id'.
colgroup
(str, optional): column with the groups. Defaults to 'tissue'.
preffix
(str, optional): prefix. Defaults to ''.
colns
(list, optional): columns not counted. Defaults to None.
palette
(list, optional): color palette. Defaults to None.
Returns:
tuple
: output.
Notes:
- To be deprecated in the favor of the functions used for enrichment analysis for example.
module roux.stat.io
For input/output of stats.
function perc_label
perc_label(a, b=None, bracket=True)
function pval2annot
pval2annot(
pval: float,
alternative: str = None,
alpha: float = 0.05,
fmt: str = '*',
power: bool = True,
linebreak: bool = False,
replace_prefix: str = None
)
P/Q-value to annotation.
Parameters:
fmt
(str): *|<|'num'
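The star format maps a p-value to significance annotations. A sketch with the conventional cutoffs (a hypothetical helper, not roux's pval2annot itself, whose thresholds may differ):

```python
def pval_to_stars(pval, alpha=0.05):
    # Map a p-value to star annotations; 'ns' = not significant
    if pval is None or pval > alpha:
        return 'ns'
    for stars, cutoff in [('***', 1e-3), ('**', 1e-2), ('*', alpha)]:
        if pval <= cutoff:
            return stars

print(pval_to_stars(0.0005), pval_to_stars(0.02), pval_to_stars(0.2))  # *** * ns
```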
module roux.stat
Global Variables
- binary
- io
module roux.stat.network
For network related stats.
function get_subgraphs
get_subgraphs(df1: DataFrame, source: str, target: str) → DataFrame
Subgraphs from the edge list.
Args:
df1
(pd.DataFrame): input dataframe containing the edge-list.
source
(str): source node.
target
(str): target node.
Returns:
pd.DataFrame
: output.
module roux.stat.norm
For normalisation.
function norm_by_quantile
norm_by_quantile(X: np.array) → np.array
Quantile normalize the columns of X.
Parameters:
X
: 2D array of float, shape (M, N). The input data, with M rows (genes/features) and N columns (samples).
Returns:
Xn
: 2D array of float, shape (M, N). The normalized data.
Notes:
Faster processing (~5 times compared to other functions tested) because of the use of numpy arrays.
TODOs: Use sklearn.preprocessing.QuantileTransformer with the output_distribution parameter, allowing rescaling back to the same distribution kind.
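The numpy-array approach can be sketched directly: rank each column, then replace each value by the mean of its rank across columns. This is an illustration of the technique, not roux's implementation (ties here are broken by order, not averaged):

```python
import numpy as np

def quantile_normalize(X):
    # Rank each column, then assign the mean of each rank across columns
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)
    rank_means = np.sort(X, axis=0).mean(axis=1)
    return rank_means[ranks]

X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
Xn = quantile_normalize(X)
```

After normalization every column shares the same value distribution (the rank means).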
function norm_by_gaussian_kde
norm_by_gaussian_kde(
values: np.array
) → np.array
Normalise matrix by gaussian KDE.
Args:
values
(np.array): input matrix.
Returns:
np.array
: output matrix.
References:
https://github.com/saezlab/protein_attenuation/blob/6c1e81af37d72ef09835ee287f63b000c7c6663c/src/protein_attenuation/utils.py
function zscore
zscore(df: DataFrame, cols: list = None) → DataFrame
Z-score.
Args:
df
(pd.DataFrame): input table.
Returns:
pd.DataFrame
: output table.
TODOs: 1. Use scipy or sklearn's zscore because of its additional options: from scipy.stats import zscore; df.apply(zscore)
function zscore_robust
zscore_robust(a: np.array) → np.array
Robust Z-score.
Args:
a
(np.array): input data.
Returns:
np.array
: output.
Example:
t = sc.stats.norm.rvs(size=100, scale=1, random_state=123456)
plt.hist(t, bins=40)
plt.hist(apply_zscore_robust(t), bins=40)
print(np.median(t), np.median(apply_zscore_robust(t)))
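A common robust z-score centers by the median and scales by the MAD; a sketch under that assumed definition (roux's exact scaling may differ):

```python
import numpy as np

def zscore_robust_sketch(a):
    # Median-centered, MAD-scaled z-score; 1.4826 makes the MAD
    # consistent with sigma for normally distributed data
    a = np.asarray(a, dtype=float)
    med = np.median(a)
    mad = np.median(np.abs(a - med))
    return (a - med) / (1.4826 * mad)

z = zscore_robust_sketch([1.0, 2.0, 3.0, 4.0, 5.0])
print(z[2])  # 0.0: the median maps to zero
```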
function norm_covariance_PCA
norm_covariance_PCA(
X: np.array,
use_svd: bool = True,
use_sklearn: bool = True,
rescale_centered: bool = True,
random_state: int = 0,
test: bool = False,
verbose: bool = False
) → np.array
Covariance normalization by PCA whitening.
Args:
X
(np.array): input array.
use_svd
(bool, optional): use the SVD method. Defaults to True.
use_sklearn
(bool, optional): use sklearn for the SVD method. Defaults to True.
rescale_centered
(bool, optional): rescale to centered input. Defaults to True.
random_state
(int, optional): random state. Defaults to 0.
test
(bool, optional): test mode. Defaults to False.
verbose
(bool, optional): verbose. Defaults to False.
Returns:
np.array
: transformed data.
module roux.stat.paired
For paired stats.
function get_ratio_sorted
get_ratio_sorted(a: float, b: float, increase=True) → float
Get ratio sorted.
Args:
a
(float): value #1.
b
(float): value #2.
increase
(bool, optional): check for increase. Defaults to True.
Returns:
float
: output.
function diff
diff(a: float, b: float, absolute=True) → float
Get difference
Args:
a
(float): value #1.
b
(float): value #2.
absolute
(bool, optional): get absolute difference. Defaults to True.
Returns:
float
: output.
function get_diff_sorted
get_diff_sorted(a: float, b: float) → float
Difference sorted/absolute.
Args:
a
(float): value #1.
b
(float): value #2.
Returns:
float
: output.
function balance
balance(a: float, b: float, absolute=True) → float
Balance.
Args:
a
(float): value #1.
b
(float): value #2.
absolute
(bool, optional): absolute difference. Defaults to True.
Returns:
float
: output.
function get_paired_sets_stats
get_paired_sets_stats(l1: list, l2: list, test: bool = False) → list
Paired stats comparing two sets.
Args:
l1
(list): set #1.
l2
(list): set #2.
test
(bool): test mode. Defaults to False.
Returns:
list
: tuple (overlap, intersection, union, ratio).
function get_stats_paired
get_stats_paired(
df1: DataFrame,
cols: list,
input_logscale: bool,
prefix: str = None,
drop_cols: bool = False,
unidirectional_stats: list = ['min', 'max'],
fast: bool = False
) → DataFrame
Paired stats, row-wise.
Args:
df1
(pd.DataFrame): input data.
cols
(list): columns.
input_logscale
(bool): if the input data is log-scaled.
prefix
(str, optional): prefix of the output column/s. Defaults to None.
drop_cols
(bool, optional): drop these columns. Defaults to False.
unidirectional_stats
(list, optional): column-wise stats. Defaults to ['min','max'].
fast
(bool, optional): parallel processing. Defaults to False.
Returns:
pd.DataFrame
: output dataframe.
function get_stats_paired_agg
get_stats_paired_agg(
x: np.array,
y: np.array,
ignore: bool = False,
verb: bool = True
) → Series
Paired stats aggregated, for example, to classify 2D distributions.
Args:
x
(np.array): x vector.
y
(np.array): y vector.
ignore
(bool, optional): suppress warnings. Defaults to False.
verb
(bool, optional): verbose. Defaults to True.
Returns:
pd.Series
: output.
function classify_sharing
classify_sharing(
df1: DataFrame,
column_value: str,
bins: list = [0, 25, 75, 100],
labels: list = ['low', 'medium', 'high'],
prefix: str = '',
verbose: bool = False
) → DataFrame
Classify sharing % calculated from Jaccard index.
Parameters:
df1
(pd.DataFrame): input table.
column_value
(str): column with values.
bins
(list): bins. Defaults to [0,25,75,100].
labels
(list): bin labels. Defaults to ['low','medium','high'].
prefix
(str): prefix of the columns.
verbose
(bool): verbose. Defaults to False.
module roux.stat.preprocess
For classification.
function dropna_matrix
dropna_matrix(
df1,
coff_cols_min_perc_na=5,
coff_rows_min_perc_na=5,
test=False,
verbose=False
)
function drop_low_complexity
drop_low_complexity(
df1: DataFrame,
min_nunique: int,
max_inflation: int,
max_nunique: int = None,
cols: list = None,
cols_keep: list = [],
test: bool = False,
verbose: bool = False
) → DataFrame
Remove low-complexity columns from the data.
Args:
df1
(pd.DataFrame): input data.
min_nunique
(int): minimum unique values.
max_inflation
(int): maximum over-representation of the values.
cols
(list, optional): columns. Defaults to None.
cols_keep
(list, optional): columns to keep. Defaults to [].
test
(bool, optional): test mode. Defaults to False.
Returns:
pd.DataFrame
: output data.
function get_cols_x_for_comparison
get_cols_x_for_comparison(
df1: DataFrame,
cols_y: list,
cols_index: list,
cols_drop: list = [],
cols_dropby_patterns: list = [],
dropby_low_complexity: bool = True,
min_nunique: int = 5,
max_inflation: int = 50,
dropby_collinearity: bool = True,
coff_rs: float = 0.7,
dropby_variance_inflation: bool = True,
verbose: bool = False,
test: bool = False
) → dict
Identify X columns.
Parameters:
df1
(pd.DataFrame): input table.
cols_y
(list): y columns.
function to_preprocessed_data
to_preprocessed_data(
df1: DataFrame,
columns: dict,
fill_missing_desc_value: bool = False,
fill_missing_cont_value: bool = False,
normby_zscore: bool = False,
verbose: bool = False,
test: bool = False
) → DataFrame
Preprocess data.
function to_filteredby_samples
to_filteredby_samples(
df1: DataFrame,
colindex: str,
colsample: str,
coff_samples_min: int,
colsubset: str,
coff_subsets_min: int = 2
) → DataFrame
Filter table before calculating differences. (1) Retain minimum number of samples per item representing a subset and (2) Retain minimum number of subsets per item.
Parameters:
df1
(pd.DataFrame): input table.
colindex
(str): column containing items.
colsample
(str): column containing samples.
coff_samples_min
(int): minimum number of samples.
colsubset
(str): column containing subsets.
coff_subsets_min
(int): minimum number of subsets. Defaults to 2.
Returns: pd.DataFrame
Examples:
Parameters: colindex='genes id', colsample='sample id', coff_samples_min=3, colsubset='pLOF or WT', coff_subsets_min=2
function get_cvsplits
get_cvsplits(
X: np.array,
y: np.array,
cv: int = 5,
random_state: int = None,
outtest: bool = True
) → dict
Get cross-validation splits. A friendly wrapper around sklearn.model_selection.KFold.
Args:
X
(np.array): X matrix.
y
(np.array): y vector.
cv
(int, optional): number of cross-validation folds. Defaults to 5.
random_state
(int, optional): random state. Defaults to None.
outtest
(bool, optional): output test data. Defaults to True.
Returns:
dict
: output.
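The mechanics of such splits can be sketched in plain numpy, shuffling indices and holding out each fold once. An illustration of the idea, not roux's KFold-backed implementation (key names are assumptions):

```python
import numpy as np

def cvsplits_sketch(X, y, cv=5, random_state=0):
    # Shuffle indices, split into cv folds, use each fold once as test
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, cv)
    out = {}
    for i, test in enumerate(folds):
        train = np.setdiff1d(idx, test)
        out[i] = {'X train': X[train], 'y train': y[train],
                  'X test': X[test], 'y test': y[test]}
    return out

splits = cvsplits_sketch(np.arange(10), np.arange(10))
print(len(splits))  # 5
```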
module roux.stat.sets
For set related stats.
function get_overlap
get_overlap(
items_set: list,
items_test: list,
output_format: str = 'list'
) → list
Get overlapping items as a string.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
output_format
(str, optional): format of the output. Defaults to 'list'.
Raises:
ValueError
: output_format can be list or str.
function get_overlap_size
get_overlap_size(
items_set: list,
items_test: list,
fraction: bool = False,
perc: bool = False,
by: str = None
) → float
Percentage Jaccard index.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
fraction
(bool, optional): output fraction. Defaults to False.
perc
(bool, optional): output percentage. Defaults to False.
by
(str, optional): fraction by. Defaults to None.
Returns:
float
: overlap size.
function get_item_set_size_by_background
get_item_set_size_by_background(items_set: list, background: int) → float
Item set size by background
Args:
items_set
(list): items in the reference set.
background
(int): background size.
Returns:
float
: Item set size by background
Notes:
Denominator of the fold change.
function get_fold_change
get_fold_change(items_set: list, items_test: list, background: int) → float
Get fold change.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
background
(int): background size.
Returns:
float
: fold change
Notes:
fc = (intersection/(test items))/((items in the item set)/background)
function get_hypergeom_pval
get_hypergeom_pval(items_set: list, items_test: list, background: int) → float
Calculate hypergeometric P-value.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
background
(int): background size.
Returns:
float
: hypergeometric P-value
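The hypergeometric P-value asks how surprising the observed overlap is when drawing the test items from the background without replacement; scipy's survival function gives the upper tail. A sketch with toy sets:

```python
from scipy.stats import hypergeom

items_set = set(range(30))        # 30 reference items
items_test = set(range(20, 40))   # 20 test items; 10 overlap
background = 100
k = len(items_set & items_test)
# P(overlap >= k) when drawing len(items_test) items from the background
pval = hypergeom.sf(k - 1, background, len(items_set), len(items_test))
```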
function get_contigency_table
get_contigency_table(items_set: list, items_test: list, background: int) → list
Get a contingency table required for the Fisher's test.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
background
(int): background size.
Returns:
list
: contingency table
Notes:
Contingency table layout (rows: within the test items; columns: within the item (/reference) set):
[[intersection, test-only], [set-only, background - size of union]]
function get_odds_ratio
get_odds_ratio(items_set: list, items_test: list, background: int) → float
Calculate Odds ratio and P-values using Fisher's exact test.
Args:
items_set
(list): items in the reference set.
items_test
(list): items to test.
background
(int): background size.
Returns:
float
: Odds ratio
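Given the contingency table, scipy's Fisher's exact test yields both the odds ratio and its P-value. A sketch with toy counts (the table layout mirrors the notes above for get_contigency_table):

```python
from scipy.stats import fisher_exact

# Contingency table: [[in set & in test, in test only],
#                     [in set only, in neither]] (toy counts)
table = [[10, 10],
         [20, 60]]
oddsratio, pval = fisher_exact(table)
print(oddsratio)  # 3.0 = (10*60)/(10*20)
```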
function get_enrichment
get_enrichment(
df1: DataFrame,
df2: DataFrame,
colid: str,
colset: str,
background: int,
coltest: str = None,
verbose: bool = False
) → DataFrame
Calculate the enrichments.
Args:
df1
(pd.DataFrame): table containing items to test.
df2
(pd.DataFrame): table containing reference sets and items.
colid
(str): column with IDs of the items.
colset
(str): column with the sets.
coltest
(str): column with the tests.
background
(int): background size.
verbose
(bool): verbose.
Returns:
pd.DataFrame
: output table
module roux.stat.solve
For solving equations.
function get_intersection_locations
get_intersection_locations(
y1: np.array,
y2: np.array,
test: bool = False,
x: np.array = None
) → list
Get co-ordinates of the intersection (x[idx]).
Args:
y1
(np.array): vector.
y2
(np.array): vector.
test
(bool, optional): test mode. Defaults to False.
x
(np.array, optional): vector. Defaults to None.
Returns:
list
: output.
module roux.stat.transform
For transformations.
function plog
plog(x, p: float, base: int)
Pseudo-log.
Args:
x
(float|np.array): input.
p
(float): pseudo-count.
base
(int): base of the log.
Returns: output.
function anti_plog
anti_plog(x, p: float, base: int)
Anti-pseudo-log.
Args:
x
(float|np.array): input.
p
(float): pseudo-count.
base
(int): base of the log.
Returns: output.
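These two transformations are inverses of each other: add the pseudo-count, take the log, and reverse. A minimal sketch:

```python
import numpy as np

def plog(x, p, base):
    # Pseudo-log: add pseudo-count p, then take the log to the given base
    return np.log(x + p) / np.log(base)

def anti_plog(x, p, base):
    # Inverse of plog: exponentiate, then subtract the pseudo-count
    return base ** x - p

print(anti_plog(plog(5.0, 1.0, 2), 1.0, 2))  # 5.0: round-trips exactly
```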
function log_pval
log_pval(
x,
errors: str = 'raise',
replace_zero_with: float = None,
p_min: float = None
)
Transform p-values to Log10.
Parameters:
x: input.
errors (str): Defaults to 'raise', else replace (in case of visualization only).
p_min (float): replace zeros with this value. Note: to be used for visualization only.
Returns: output.
function get_q
get_q(ds1: Series, col: str = None, verb: bool = True, test_coff: float = 0.1)
To FDR corrected P-value.
function glog
glog(x: float, l=2)
Generalised logarithm.
Args:
x
(float): input.
l
(int, optional): pseudo-count. Defaults to 2.
Returns:
float
: output.
function rescale
rescale(
a: np.array,
range1: tuple = None,
range2: tuple = [0, 1]
) → np.array
Rescale within a new range.
Args:
a
(np.array): input vector.
range1
(tuple, optional): existing range. Defaults to None.
range2
(tuple, optional): new range. Defaults to [0,1].
Returns:
np.array
: output.
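Rescaling is a linear map from one range onto another. A minimal sketch of the idea (not roux's implementation):

```python
import numpy as np

def rescale_sketch(a, range1=None, range2=(0.0, 1.0)):
    # Map values linearly from range1 (default: observed min/max) to range2
    a = np.asarray(a, dtype=float)
    lo, hi = range1 if range1 is not None else (a.min(), a.max())
    frac = (a - lo) / (hi - lo)
    return range2[0] + frac * (range2[1] - range2[0])

print(rescale_sketch([0.0, 5.0, 10.0]))  # [0.  0.5 1. ]
```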
function rescale_divergent
rescale_divergent(df1: DataFrame, col: str) → DataFrame
Rescale divergently i.e. two-sided.
Args:
df1
(pd.DataFrame): input dataframe.
col
(str): column.
Returns:
pd.DataFrame
: column.
Notes:
Under development.
module roux.stat.variance
For variance related stats.
function confidence_interval_95
confidence_interval_95(x: np.array) → float
95% confidence interval.
Args:
x
(np.array): input vector.
Returns:
float
: output.
function get_ci
get_ci(rs, ci_type, outstr=False)
function get_variance_inflation
get_variance_inflation(data, coly: str, cols_x: list = None)
Variance Inflation Factor (VIF). A wrapper around statsmodels's variance_inflation_factor function.
Parameters:
data
(pd.DataFrame): input data.
coly
(str): dependent variable.
cols_x
(list): independent variables.
Returns: pd.Series
module roux.viz.annot
For annotations.
function annot_side
annot_side(
ax: Axes,
df1: DataFrame,
colx: str,
coly: str,
cols: str = None,
hue: str = None,
loc: str = 'right',
scatter=False,
scatter_marker='|',
scatter_alpha=0.75,
lines=True,
offx3: float = 0.15,
offymin: float = 0.1,
offymax: float = 0.9,
length_axhline: float = 3,
text=True,
text_offx: float = 0,
text_offy: float = 0,
invert_xaxis: bool = False,
break_pt: int = 25,
va: str = 'bottom',
zorder: int = 2,
color: str = 'gray',
kws_line: dict = {},
kws_scatter: dict = {},
**kws_text
) → Axes
Annotate elements of the plot on the side.
Args:
df1
(pd.DataFrame): input data.
colx
(str): column with x values.
coly
(str): column with y values.
cols
(str): column with labels.
hue
(str): column with colors of the labels.
ax
(plt.Axes, optional): plt.Axes object. Defaults to None.
loc
(str, optional): location. Defaults to 'right'.
invert_xaxis
(bool, optional): invert xaxis. Defaults to False.
offx3
(float, optional): x-offset for the bend position of the arrow. Defaults to 0.15.
offymin
(float, optional): y-offset minimum. Defaults to 0.1.
offymax
(float, optional): y-offset maximum. Defaults to 0.9.
break_pt
(int, optional): break point of the labels. Defaults to 25.
length_axhline
(float, optional): length of the horizontal line i.e. the "underline". Defaults to 3.
zorder
(int, optional): z-order. Defaults to 2.
color
(str, optional): color of the line. Defaults to 'gray'.
kws_line
(dict, optional): parameters for formatting the line. Defaults to {}.
Keyword Args:
kws
: parameters provided to the ax.text function.
Returns:
plt.Axes
: plt.Axes object.
function show_outlines
show_outlines(
data: DataFrame,
colx: str,
coly: str,
column_outlines: str,
outline_colors: dict,
style=None,
legend: bool = True,
kws_legend: dict = {},
zorder: int = 3,
ax: Axes = None,
**kws_scatter
) → Axes
Outline points on the scatter plot by categories.
function show_confidence_ellipse
show_confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs)
Create a plot of the covariance confidence ellipse of x and y.
Parameters:
x, y (array-like, shape (n,)): input data.
ax (matplotlib.axes.Axes): the axes object to draw the ellipse into.
n_std (float): the number of standard deviations to determine the ellipse's radiuses.
**kwargs: forwarded to matplotlib.patches.Ellipse.
Returns: matplotlib.patches.Ellipse
References: https://matplotlib.org/3.5.0/gallery/statistics/confidence_ellipse.html
function show_box
show_box(
ax: Axes,
xy: tuple,
width: float,
height: float,
fill: str = None,
alpha: float = 1,
lw: float = 1.1,
edgecolor: str = 'k',
clip_on: bool = False,
scale_width: float = 1,
scale_height: float = 1,
xoff: float = 0,
yoff: float = 0,
**kws
) → Axes
Highlight sections of a plot e.g. heatmap by drawing boxes.
Args:
xy
(tuple): position of the left, bottom corner of the box.
width
(float): width.
height
(float): height.
ax
(plt.Axes, optional): plt.Axes object. Defaults to None.
fill
(str, optional): fill the box with color. Defaults to None.
alpha
(float, optional): alpha of the color. Defaults to 1.
lw
(float, optional): line width. Defaults to 1.1.
edgecolor
(str, optional): edge color. Defaults to 'k'.
clip_on
(bool, optional): clip the boxes by the axis limit. Defaults to False.
scale_width
(float, optional): scale width. Defaults to 1.
scale_height
(float, optional): scale height. Defaults to 1.
xoff
(float, optional): x-offset. Defaults to 0.
yoff
(float, optional): y-offset. Defaults to 0.
Keyword Args:
kws
: parameters provided to the Rectangle function.
Returns:
plt.Axes
: plt.Axes object.
function color_ax
color_ax(ax: Axes, c: str, linewidth: float = None) → Axes
Color border of plt.Axes.
Args:
ax
(plt.Axes): plt.Axes object.
c
(str): color.
linewidth
(float, optional): line width. Defaults to None.
Returns:
plt.Axes
: plt.Axes object.
function show_n_legend
show_n_legend(ax, df1: DataFrame, colid: str, colgroup: str, **kws)
function show_scatter_stats
show_scatter_stats(
ax: Axes,
data: DataFrame,
x,
y,
z,
method: str,
resample: bool = False,
show_n: bool = True,
show_n_prefix: str = '',
prefix: str = '',
loc=None,
zorder: int = 5,
verbose: bool = True,
**kws_set_label
)
resample (bool, optional): resample data. Defaults to False.
function show_crosstab_stats
show_crosstab_stats(
data: DataFrame,
cols: list,
method: str = None,
alpha: float = 0.05,
loc: str = None,
xoff: float = 0,
yoff: float = 0,
linebreak: bool = False,
ax: Axes = None,
**kws_set_label
) → Axes
Annotate a confusion matrix.
Args:
data
(pd.DataFrame): input data.
cols
(list): list of columns with the categories.
method
(str, optional): method used to calculate the statistical significance.
alpha
(float, optional): alpha for the stats. Defaults to 0.05.
loc
(str, optional): location. Over-rides kws_set_label. Defaults to None.
xoff
(float, optional): x offset. Defaults to 0.
yoff
(float, optional): y offset. Defaults to 0.
ax
(plt.Axes, optional): plt.Axes object. Defaults to None.
Keyword Args:
kws_set_label
: keyword parameters provided toset_label
.
Returns:
plt.Axes
:plt.Axes
object.
function show_confusion_matrix_stats
show_confusion_matrix_stats(
df_: DataFrame,
ax: Axes = None,
off: float = 0.5
) → Axes
Annotate a confusion matrix.
Args:
- df_ (pd.DataFrame): input data.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.
- off (float, optional): offset. Defaults to 0.5.

Returns:
plt.Axes: plt.Axes object.
function get_logo_ax
get_logo_ax(
ax: Axes,
size: float = 0.5,
bbox_to_anchor: list = None,
loc: str = 1,
axes_kwargs: dict = {'zorder': -1}
) → Axes
Get a plt.Axes for placing the logo.

Args:
- ax (plt.Axes): plt.Axes object.
- size (float, optional): size of the subplot. Defaults to 0.5.
- bbox_to_anchor (list, optional): location. Defaults to None.
- loc (str, optional): location. Defaults to 1.
- axes_kwargs (dict, optional): parameters provided to inset_axes. Defaults to {'zorder': -1}.

Returns:
plt.Axes: plt.Axes object.
function set_logo
set_logo(
imp: str,
ax: Axes,
size: float = 0.5,
bbox_to_anchor: list = None,
loc: str = 1,
axes_kwargs: dict = {'zorder': -1},
params_imshow: dict = {'aspect': 'auto', 'alpha': 1, 'interpolation': 'catrom'},
test: bool = False,
force: bool = False
) → Axes
Set logo.
Args:
- imp (str): path to the logo file.
- ax (plt.Axes): plt.Axes object.
- size (float, optional): size of the subplot. Defaults to 0.5.
- bbox_to_anchor (list, optional): location. Defaults to None.
- loc (str, optional): location. Defaults to 1.
- axes_kwargs (dict, optional): parameters provided to inset_axes. Defaults to {'zorder': -1}.
- params_imshow (dict, optional): parameters provided to the imshow function. Defaults to {'aspect': 'auto', 'alpha': 1, 'interpolation': 'catrom'}.
- test (bool, optional): test mode. Defaults to False.
- force (bool, optional): overwrite file. Defaults to False.

Returns:
plt.Axes: plt.Axes object.
function set_suptitle
set_suptitle(axs, title, offy=0, **kws_text)
Combined title for a list of subplots.
module roux.viz.ax_
For setting up subplots.
function set_axes_minimal
set_axes_minimal(ax, xlabel=None, ylabel=None, off_axes_pad=0) → Axes
Set minimal axes labels, at the lower left corner.
function set_label
set_label(
s: str,
ax: Axes,
x: float = 0,
y: float = 0,
ha: str = 'left',
va: str = 'top',
loc=None,
off_loc=0.01,
title: bool = False,
**kws
) → Axes
Set label on a plot.
Args:
- x (float): x position.
- y (float): y position.
- s (str): label.
- ax (plt.Axes): plt.Axes object.
- ha (str, optional): horizontal alignment. Defaults to 'left'.
- va (str, optional): vertical alignment. Defaults to 'top'.
- loc (int, optional): location of the label. 1: 'upper right', 2: 'upper left', 3: 'lower left', 4: 'lower right'.
- off_loc (float, optional): x and y location offset. Defaults to 0.01.
- title (bool, optional): set as the title. Defaults to False.

Returns:
plt.Axes: plt.Axes object.
function set_ylabel
set_ylabel(
ax: Axes,
s: str = None,
x: float = -0.1,
y: float = 1.02,
xoff: float = 0,
yoff: float = 0
) → Axes
Set a horizontal ylabel.

Args:
- ax (plt.Axes): plt.Axes object.
- s (str, optional): ylabel. Defaults to None.
- x (float, optional): x position. Defaults to -0.1.
- y (float, optional): y position. Defaults to 1.02.
- xoff (float, optional): x offset. Defaults to 0.
- yoff (float, optional): y offset. Defaults to 0.

Returns:
plt.Axes: plt.Axes object.
function get_ax_labels
get_ax_labels(ax: Axes)
function format_labels
format_labels(ax, fmt='cap1', title_fontsize=15, rename_labels=None, test=False)
function rename_ticklabels
rename_ticklabels(
ax: Axes,
axis: str,
rename: dict = None,
replace: dict = None,
ignore: bool = False
) → Axes
Rename the ticklabels.
Args:
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.
- axis (str): axis (x|y).
- rename (dict, optional): replace strings. Defaults to None.
- replace (dict, optional): replace sub-strings. Defaults to None.
- ignore (bool, optional): ignore warnings. Defaults to False.

Raises:
ValueError: either rename or replace should be provided.

Returns:
plt.Axes: plt.Axes object.
function get_ticklabel_position
get_ticklabel_position(ax: Axes, axis: str) → Axes
Get positions of the ticklabels.
Args:
- ax (plt.Axes): plt.Axes object.
- axis (str): axis (x|y).

Returns:
plt.Axes: plt.Axes object.
function set_ticklabels_color
set_ticklabels_color(ax: Axes, ticklabel2color: dict, axis: str = 'y') → Axes
Set colors to ticklabels.
Args:
- ax (plt.Axes): plt.Axes object.
- ticklabel2color (dict): colors of the ticklabels.
- axis (str): axis (x|y). Defaults to 'y'.

Returns:
plt.Axes: plt.Axes object.
function format_ticklabels
format_ticklabels(
ax: Axes,
axes: tuple = ['x', 'y'],
interval: float = None,
n: int = None,
fmt: str = None,
font: str = None
) → Axes
Format the ticklabels.

Args:
- ax (plt.Axes): plt.Axes object.
- axes (tuple, optional): axes. Defaults to ['x', 'y'].
- n (int, optional): number of ticks. Defaults to None.
- fmt (str, optional): format, e.g. '.0f'. Defaults to None.
- font (str, optional): font. Defaults to 'DejaVu Sans Mono'.

Returns:
plt.Axes: plt.Axes object.
TODOs: 1. include color_ticklabels
function split_ticklabels
split_ticklabels(
ax: Axes,
fmt: str,
axis='x',
group_x=-0.45,
group_y=-0.25,
group_prefix=None,
group_suffix=False,
group_loc='center',
group_colors=None,
group_alpha=0.2,
show_group_line=True,
group_line_off_x=0.15,
group_line_off_y=0.1,
show_group_span=False,
group_span_kws={},
sep: str = '-',
pad_major=6,
off: float = 0.2,
test: bool = False,
**kws
) → Axes
Split ticklabels into major and minor. Two minor ticks are created per major tick.
Args:
- ax (plt.Axes): plt.Axes object.
- fmt (str): 'group'-wise or 'pair'-wise splitting of the ticklabels.
- axis (str): name of the axis: x or y.
- sep (str, optional): separator within the tick labels. Defaults to '-'.
- test (bool, optional): test mode. Defaults to False.

Returns:
plt.Axes: plt.Axes object.
function get_axlimsby_data
get_axlimsby_data(
X: Series,
Y: Series,
off: float = 0.2,
equal: bool = False
) → Axes
Infer axis limits from data.
Args:
- X (pd.Series): x values.
- Y (pd.Series): y values.
- off (float, optional): offset. Defaults to 0.2.
- equal (bool, optional): equal limits. Defaults to False.

Returns:
plt.Axes: plt.Axes object.
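The padding logic of such a helper can be sketched in plain Python. This is an illustrative approximation only; `axlims_from_data` is a hypothetical name, not roux's implementation:

```python
def axlims_from_data(values, off=0.2):
    """Pad the data range by a fraction `off` on each side (sketch)."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * off
    return (lo - pad, hi + pad)

axlims_from_data([0, 10], off=0.2)  # (-2.0, 12.0)
```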
function get_axlims
get_axlims(ax: Axes) → Axes
Get axis limits.
Args:
- ax (plt.Axes): plt.Axes object.

Returns:
plt.Axes: plt.Axes object.
function set_equallim
set_equallim(
ax: Axes,
diagonal: bool = False,
difference: float = None,
format_ticks: bool = True,
**kws_format_ticklabels
) → Axes
Set equal axis limits.
Args:
- ax (plt.Axes): plt.Axes object.
- diagonal (bool, optional): show the diagonal. Defaults to False.
- difference (float, optional): difference. Defaults to None.
- format_ticks (bool, optional): format the ticklabels. Defaults to True.

Returns:
plt.Axes: plt.Axes object.
function set_axlims
set_axlims(
ax: Axes,
off: float,
axes: list = ['x', 'y'],
equal=False,
**kws_set_equallim
) → Axes
Set axis limits.
Args:
- ax (plt.Axes): plt.Axes object.
- off (float): offset.
- axes (list, optional): axis name/s. Defaults to ['x', 'y'].

Returns:
plt.Axes: plt.Axes object.
function set_grids
set_grids(ax: Axes, axis: str = None) → Axes
Show grids based on the shape (aspect ratio) of the plot.
Args:
- ax (plt.Axes): plt.Axes object.
- axis (str, optional): axis name. Defaults to None.

Returns:
plt.Axes: plt.Axes object.
function rename_legends
rename_legends(ax: Axes, replaces: dict, **kws_legend) → Axes
Rename legends.
Args:
- ax (plt.Axes): plt.Axes object.
- replaces (dict): replacements for the legend labels.

Returns:
plt.Axes: plt.Axes object.
function append_legends
append_legends(ax: Axes, labels: list, handles: list, **kws) → Axes
Append to legends.
Args:
- ax (plt.Axes): plt.Axes object.
- labels (list): labels.
- handles (list): handles.

Returns:
plt.Axes: plt.Axes object.
function sort_legends
sort_legends(ax: Axes, sort_order: list = None, **kws) → Axes
Sort or filter legends.
Args:
- ax (plt.Axes): plt.Axes object.
- sort_order (list, optional): order of the legends. Defaults to None.

Returns:
plt.Axes: plt.Axes object.
Notes:
- Filter the legends by providing the indices of the legends to keep.
function drop_duplicate_legend
drop_duplicate_legend(ax, **kws)
function reset_legend_colors
reset_legend_colors(ax)
Reset legend colors.
Args:
- ax (plt.Axes): plt.Axes object.

Returns:
plt.Axes: plt.Axes object.
function set_legends_merged
set_legends_merged(axs)
Merge the legends of multiple subplots.

Args:
- axs (list): list of plt.Axes objects.

Returns:
plt.Axes: the first plt.Axes object in the list.
function set_legend_custom
set_legend_custom(
ax: Axes,
legend2param: dict,
param: str = 'color',
lw: float = 1,
marker: str = 'o',
markerfacecolor: bool = True,
size: float = 10,
color: str = 'k',
linestyle: str = '',
title_ha: str = 'center',
frameon: bool = True,
**kws
) → Axes
Set custom legends.
Args:
- ax (plt.Axes): plt.Axes object.
- legend2param (dict): legend name to the parameter to change, e.g. the name of the color.
- param (str, optional): parameter to change. Defaults to 'color'.
- lw (float, optional): line width. Defaults to 1.
- marker (str, optional): marker type. Defaults to 'o'.
- markerfacecolor (bool, optional): marker face color. Defaults to True.
- size (float, optional): size of the markers. Defaults to 10.
- color (str, optional): color of the markers. Defaults to 'k'.
- linestyle (str, optional): line style. Defaults to ''.
- title_ha (str, optional): horizontal alignment of the title. Defaults to 'center'.
- frameon (bool, optional): show the frame. Defaults to True.

Returns:
plt.Axes: plt.Axes object.
TODOs:
1. Different number of points for each entry:
from matplotlib.legend_handler import HandlerTuple
l1, = plt.plot(-1, -1, lw=0, marker="o", markerfacecolor='k', markeredgecolor='k')
l2, = plt.plot(-0.5, -1, lw=0, marker="o", markerfacecolor="none", markeredgecolor='k')
plt.legend([(l1,), (l1, l2)], ["test 1", "test 2"], handler_map={tuple: HandlerTuple(2)})
References:
- https://matplotlib.org/stable/api/markers_api.html
- http://www.cis.jhu.edu/~shanest/mpt/js/mathjax/mathjax-dev/fonts/Tables/STIX/STIX/All/All.html
function get_line_cap_length
get_line_cap_length(ax: Axes, linewidth: float) → Axes
Get the line cap length.
Args:
- ax (plt.Axes): plt.Axes object.
- linewidth (float): width of the line.

Returns:
plt.Axes: plt.Axes object.
function set_colorbar
set_colorbar(
fig: object,
ax: Axes,
ax_pc: Axes,
label: str,
bbox_to_anchor: tuple = (0.05, 0.5, 1, 0.45),
orientation: str = 'vertical'
)
Set colorbar.
Args:
- fig (object): figure object.
- ax (plt.Axes): plt.Axes object.
- ax_pc (plt.Axes): plt.Axes object for the colorbar.
- label (str): label.
- bbox_to_anchor (tuple, optional): location. Defaults to (0.05, 0.5, 1, 0.45).
- orientation (str, optional): orientation. Defaults to "vertical".

Returns: figure object.
function set_colorbar_label
set_colorbar_label(ax: Axes, label: str) → Axes
Find colorbar and set label for it.
Args:
- ax (plt.Axes): plt.Axes object.
- label (str): label.

Returns:
plt.Axes: plt.Axes object.
module roux.viz.bar
For bar plots.
function plot_barh
plot_barh(
df1: DataFrame,
colx: str,
coly: str,
colannnotside: str = None,
x1: float = None,
offx: float = 0,
ax: Axes = None,
**kws
) → Axes
Plot a horizontal bar plot with text on the bars.

Args:
- df1 (pd.DataFrame): input data.
- colx (str): x column.
- coly (str): y column.
- colannnotside (str): column with the annotations to show on the right side of the plot.
- x1 (float): x position of the text.
- offx (float): x-offset of x1, multiplier.
- color (str): color of the bars.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Keyword Args:
- kws: parameters provided to the barh function.

Returns:
plt.Axes: plt.Axes object.
function plot_value_counts
plot_value_counts(
df: DataFrame,
col: str,
logx: bool = False,
kws_hist: dict = {'bins': 10},
kws_bar: dict = {},
grid: bool = False,
axes: list = None,
fig: object = None,
hist: bool = True
)
Plot pandas' value_counts.

Args:
- df (pd.DataFrame): input data, i.e. value_counts.
- col (str): column with the counts.
- logx (bool, optional): x-axis on log-scale. Defaults to False.
- kws_hist (dict, optional): parameters provided to the hist function. Defaults to {'bins': 10}.
- kws_bar (dict, optional): parameters provided to the bar function. Defaults to {}.
- grid (bool, optional): show grids or not. Defaults to False.
- axes (list, optional): list of plt.Axes. Defaults to None.
- fig (object, optional): figure object. Defaults to None.
- hist (bool, optional): show the histogram. Defaults to True.
function plot_barh_stacked_percentage
plot_barh_stacked_percentage(
df1: DataFrame,
coly: str,
colannot: str,
color: str = None,
yoff: float = 0,
ax: Axes = None
) → Axes
Plot a horizontal stacked bar plot with percentages.

Args:
- df1 (pd.DataFrame): input data. Values in rows sum to 100%.
- coly (str): y column, i.e. the yticklabels, e.g. retained and dropped.
- colannot (str): column with the annotations.
- color (str, optional): color. Defaults to None.
- yoff (float, optional): y-offset. Defaults to 0.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Returns:
plt.Axes: plt.Axes object.
function plot_bar_serial
plot_bar_serial(
d1: dict,
polygon: bool = False,
polygon_x2i: float = 0,
labelis: list = [],
y: float = 0,
ylabel: str = None,
off_arrowy: float = 0.15,
kws_rectangle={'height': 0.5, 'linewidth': 1},
ax: Axes = None
) → Axes
Barplots with serial increase in resolution.
Args:
- d1 (dict): dictionary with the data.
- polygon (bool, optional): show the polygon. Defaults to False.
- polygon_x2i (float, optional): connect the polygon to this subset. Defaults to 0.
- labelis (list, optional): label these subsets. Defaults to [].
- y (float, optional): y position. Defaults to 0.
- ylabel (str, optional): y label. Defaults to None.
- off_arrowy (float, optional): offset for the arrow. Defaults to 0.15.
- kws_rectangle (dict, optional): parameters provided to the rectangle function. Defaults to dict(height=0.5, linewidth=1).
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Returns:
plt.Axes: plt.Axes object.
function plot_barh_stacked_percentage_intersections
plot_barh_stacked_percentage_intersections(
df0: DataFrame,
colxbool: str,
colybool: str,
colvalue: str,
colid: str,
colalt: str,
colgroupby: str,
coffgroup: float = 0.95,
ax: Axes = None
) → Axes
Plot a horizontal stacked bar plot with percentages and intersections.

Args:
- df0 (pd.DataFrame): input data.
- colxbool (str): x column.
- colybool (str): y column.
- colvalue (str): column with the values.
- colid (str): column with the ids.
- colalt (str): column with the alternative subset.
- colgroupby (str): column with the groups.
- coffgroup (float, optional): cut-off between the groups. Defaults to 0.95.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Returns:
plt.Axes: plt.Axes object.

Examples:
Parameters: colxbool='paralog', colybool='essential', colvalue='value', colid='gene id', colalt='singleton', coffgroup=0.95, colgroupby='tissue'
function to_input_data_sankey
to_input_data_sankey(
df0,
colid,
cols_groupby=None,
colall='all',
remove_all=False
)
function plot_sankey
plot_sankey(
df1,
cols_groupby=None,
hues=None,
node_color=None,
link_color=None,
info=None,
x=None,
y=None,
colors=None,
hovertemplate=None,
text_width=20,
convert=True,
width=400,
height=400,
outp=None,
validate=True,
test=False,
**kws
)
module roux.viz.colors
For setting up colors.
function rgbfloat2int
rgbfloat2int(rgb_float)
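Judging by its name, this converts matplotlib-style float RGB tuples (channels in [0, 1]) to integer channels in [0, 255]. A minimal sketch of that conversion (illustrative; not roux's exact code):

```python
def rgb_float_to_int(rgb_float):
    """Convert float RGB channels in [0, 1] to ints in [0, 255]."""
    return tuple(int(round(c * 255)) for c in rgb_float)

rgb_float_to_int((1.0, 0.0, 0.5))  # (255, 0, 128)
```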
function get_colors_default
get_colors_default() → list
Get the default colors.

Returns:
list: colors.
function get_ncolors
get_ncolors(
n: int,
cmap: str = 'Spectral',
ceil: bool = False,
test: bool = False,
N: int = 20,
out: str = 'hex',
**kws_get_cmap_section
) → list
Get colors.
Args:
- n (int): number of colors to get.
- cmap (str, optional): colormap. Defaults to 'Spectral'.
- ceil (bool, optional): ceil. Defaults to False.
- test (bool, optional): test mode. Defaults to False.
- N (int, optional): number of colors in the colormap. Defaults to 20.
- out (str, optional): output. Defaults to 'hex'.

Returns:
list: colors.
function get_val2color
get_val2color(
ds: Series,
vmin: float = None,
vmax: float = None,
cmap: str = 'Reds'
) → dict
Get color for a value.
Args:
- ds (pd.Series): values.
- vmin (float, optional): minimum value. Defaults to None.
- vmax (float, optional): maximum value. Defaults to None.
- cmap (str, optional): colormap. Defaults to 'Reds'.

Returns:
dict: output.
function saturate_color
saturate_color(color, alpha: float) → object
Saturate a color.
Args:
- color: color.
- alpha (float): alpha level.

Returns:
object: output.

References:
- https://stackoverflow.com/a/60562502/3521099
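The linked answer saturates or desaturates by scaling the color's lightness in HLS space; a self-contained sketch of that approach using only the standard library (illustrative, not roux's exact implementation):

```python
import colorsys

def scale_lightness(rgb, alpha):
    """Scale the HLS lightness of an RGB color; alpha < 1 darkens, alpha > 1 lightens."""
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb(h, max(0.0, min(1.0, l * alpha)), s)

scale_lightness((1.0, 0.0, 0.0), 0.5)  # (0.5, 0.0, 0.0): a darker red
```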
function mix_colors
mix_colors(d: dict) → str
Mix colors.
Args:
- d (dict): colors-to-alpha map.

Returns:
str: hex color.

References:
- https://stackoverflow.com/a/61488997/3521099
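A common way to mix colors, as in the linked answer, is a weighted average of their RGB channels. A hedged sketch of that idea (roux's exact method may differ; `mix_hex_colors` is a hypothetical name):

```python
def mix_hex_colors(color2weight):
    """Mix '#rrggbb' colors by a weighted average of the RGB channels."""
    total = sum(color2weight.values())
    mixed = [0.0, 0.0, 0.0]
    for hexcolor, weight in color2weight.items():
        for i, pos in enumerate((1, 3, 5)):  # r, g, b hex pairs
            mixed[i] += int(hexcolor[pos:pos + 2], 16) * weight / total
    return "#" + "".join(f"{round(c):02x}" for c in mixed)

mix_hex_colors({"#000000": 1, "#ffffff": 1})  # '#808080'
```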
function make_cmap
make_cmap(cs: list, N: int = 20, **kws)
Create a colormap.
Args:
- cs (list): colors.
- N (int, optional): resolution, i.e. the number of colors. Defaults to 20.

Returns: cmap.
function get_cmap_section
get_cmap_section(
cmap,
vmin: float = 0.0,
vmax: float = 1.0,
n: int = 100
) → object
Get section of a colormap.
Args:
- cmap (object|str): colormap.
- vmin (float, optional): minimum value. Defaults to 0.0.
- vmax (float, optional): maximum value. Defaults to 1.0.
- n (int, optional): resolution, i.e. the number of colors. Defaults to 100.

Returns:
object: cmap.
function append_cmap
append_cmap(
cmap: str = 'Reds',
color: str = '#D3DDDC',
cmap_min: float = 0.2,
cmap_max: float = 0.8,
ncolors: int = 100,
ncolors_min: int = 1,
ncolors_max: int = 0
)
Append a color to colormap.
Args:
- cmap (str, optional): colormap. Defaults to 'Reds'.
- color (str, optional): color. Defaults to '#D3DDDC'.
- cmap_min (float, optional): cmap_min. Defaults to 0.2.
- cmap_max (float, optional): cmap_max. Defaults to 0.8.
- ncolors (int, optional): number of colors. Defaults to 100.
- ncolors_min (int, optional): minimum number of colors. Defaults to 1.
- ncolors_max (int, optional): maximum number of colors. Defaults to 0.

Returns: cmap.

References:
- https://matplotlib.org/stable/tutorials/colors/colormap-manipulation.html
module roux.viz.compare
For comparative plots.
function plot_comparisons
plot_comparisons(
plot_data,
x,
ax=None,
output_dir_path=None,
force=False,
return_path=False
)
Parameters:
- plot_data: output of .stat.compare.get_comparison.

Notes:
- sample type: a different sample of the same data.
module roux.viz.diagram
For diagrams e.g. flowcharts
function diagram_nb
diagram_nb(graph: str, out: bool = False)
Show a diagram in jupyter notebook using mermaid.js.
Parameters:
- graph (str): markdown-formatted graph. Please see https://mermaid.js.org/intro/n00b-syntaxReference.html
- out (bool): output the URL. Defaults to False.

References:
1. https://mermaid.js.org/config/Tutorials.html#jupyter-integration-with-mermaid-js

Examples:
graph LR;
i1(["input1"]) & d1[("data1")] --> p1[["process1"]] --> o1(["output1"])
p1 --> o2["output2"]:::ends
classDef ends fill:#fff,stroke:#fff
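The jupyter integration in the referenced tutorial renders a graph by base64-encoding it into a mermaid.ink image URL; a minimal sketch of that mechanism (an assumption based on that tutorial, not necessarily roux's exact code):

```python
import base64

def mermaid_url(graph: str) -> str:
    """Build a mermaid.ink image URL from a mermaid graph definition."""
    encoded = base64.urlsafe_b64encode(graph.encode("utf8")).decode("ascii")
    return "https://mermaid.ink/img/" + encoded

url = mermaid_url('graph LR; i1(["input1"]) --> p1[["process1"]]')
```

Displaying `url` via `IPython.display.Image` is then enough to show the diagram in a notebook.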
module roux.viz.dist
For distribution plots.
function hist_annot
hist_annot(
dplot: DataFrame,
colx: str,
colssubsets: list = [],
bins: int = 100,
subset_unclassified: bool = True,
cmap: str = 'hsv',
ymin=None,
ymax=None,
ylimoff: float = 1,
ywithinoff: float = 1.2,
annotaslegend: bool = True,
annotn: bool = True,
params_scatter: dict = {'zorder': 2, 'alpha': 0.1, 'marker': '|'},
xlim: tuple = None,
ax: Axes = None,
**kws
) → Axes
Annotated histogram.

Args:
- dplot (pd.DataFrame): input dataframe.
- colx (str): x column.
- colssubsets (list, optional): columns indicating the subsets. Defaults to [].
- bins (int, optional): bins. Defaults to 100.
- subset_unclassified (bool, optional): label the non-annotated subset as 'unclassified'. Defaults to True.
- cmap (str, optional): colormap. Defaults to 'hsv'.
- ylimoff (float, optional): y-offset for the y-axis limit. Defaults to 1.
- ywithinoff (float, optional): y-offset for the distance within the labels. Defaults to 1.2.
- annotaslegend (bool, optional): convert labels to legends. Defaults to True.
- annotn (bool, optional): annotate the sample sizes. Defaults to True.
- params_scatter (dict, optional): parameters of the scatter plot. Defaults to {'zorder': 2, 'alpha': 0.1, 'marker': '|'}.
- xlim (tuple, optional): x-axis limits. Defaults to None.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Keyword Args:
- kws: parameters provided to the hist function.

Returns:
plt.Axes: plt.Axes object.

TODOs: For the scatter, use annot_side with loc='top'.
function plot_gmm
plot_gmm(
x: Series,
coff: float = None,
mix_pdf: object = None,
two_pdfs: tuple = None,
weights: tuple = None,
n_clusters: int = 2,
bins: int = 20,
show_cutoff: bool = True,
show_cutoff_line: bool = True,
colors: list = ['gray', 'gray', 'lightgray'],
out_coff: bool = False,
hist: bool = True,
test: bool = False,
ax: Axes = None,
kws_axvline={'color': 'k'},
**kws
) → Axes
Plot Gaussian Mixture Models (GMMs).

Args:
- x (pd.Series): input vector.
- coff (float, optional): intersection between the two fitted distributions. Defaults to None.
- mix_pdf (object, optional): probability density function of the mixed distribution. Defaults to None.
- two_pdfs (tuple, optional): probability density functions of the separate distributions. Defaults to None.
- weights (tuple, optional): weights of the individual distributions. Defaults to None.
- n_clusters (int, optional): number of distributions. Defaults to 2.
- bins (int, optional): bins. Defaults to 20.
- colors (list, optional): colors of the individual distributions and of the mixed one. Defaults to ['gray', 'gray', 'lightgray'].
- out_coff (bool, optional): return the cutoff. Defaults to False.
- hist (bool, optional): show the histogram. Defaults to True.
- test (bool, optional): test mode. Defaults to False.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Keyword Args:
- kws: parameters provided to the hist function.
- kws_axvline: parameters provided to the axvline function.

Returns:
plt.Axes: plt.Axes object.
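The `coff` argument is described as the intersection of the two fitted component distributions. How such an intersection can be found numerically is sketched below; this is illustrative only (`find_cutoff` is a hypothetical helper, not roux's API):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def find_cutoff(mu1, s1, w1, mu2, s2, w2, lo, hi, steps=10_000):
    """Scan [lo, hi] for the x where the two weighted component pdfs intersect."""
    best_x, best_gap = lo, float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = abs(w1 * normal_pdf(x, mu1, s1) - w2 * normal_pdf(x, mu2, s2))
        if gap < best_gap:
            best_x, best_gap = x, gap
    return best_x

find_cutoff(0, 1, 0.5, 4, 1, 0.5, 0, 4)  # ~2.0: the midpoint of two symmetric components
```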
function plot_normal
plot_normal(x: Series, ax: Axes = None) → Axes
Plot normal distribution.
Args:
- x (pd.Series): input vector.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Returns:
plt.Axes: plt.Axes object.
function get_jitter_positions
get_jitter_positions(ax, df1, order, column_category, column_position)
function plot_dists
plot_dists(
df1: DataFrame,
x: str,
y: str,
colindex: str,
hue: str = None,
order: list = None,
hue_order: list = None,
kind: str = 'box',
show_p: bool = True,
show_n: bool = True,
show_n_prefix: str = '',
show_n_ha=None,
show_n_ticklabels: bool = True,
show_outlines: bool = False,
kws_outlines: dict = {},
alternative: str = 'two-sided',
offx_n: float = 0,
axis_cont_lim: tuple = None,
axis_cont_scale: str = 'linear',
offs_pval: dict = None,
alpha: float = 0.5,
ax: Axes = None,
test: bool = False,
kws_stats: dict = {},
**kws
) → Axes
Plot distributions.
Args:
- df1 (pd.DataFrame): input data.
- x (str): x column.
- y (str): y column.
- colindex (str): index column.
- hue (str, optional): column with the values to be encoded as hues. Defaults to None.
- order (list, optional): order of the categorical values. Defaults to None.
- hue_order (list, optional): order of the values to be encoded as hues. Defaults to None.
- kind (str, optional): kind of distribution plot. Defaults to 'box'.
- show_p (bool, optional): show p-values. Defaults to True.
- show_n (bool, optional): show sample sizes. Defaults to True.
- show_n_prefix (str, optional): prefix of the sample size label, i.e. n=. Defaults to ''.
- offx_n (float, optional): x-offset for the sample size label. Defaults to 0.
- axis_cont_lim (tuple, optional): x-axis limits. Defaults to None.
- offs_pval (dict, optional): x and y offsets for the p-value labels.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.
- test (bool, optional): test mode. Defaults to False.
- kws_stats (dict, optional): parameters provided to the stat function. Defaults to {}.

Keyword Args:
- kws: parameters provided to the seaborn function.

Returns:
plt.Axes: plt.Axes object.
TODOs: 1. Sort categories. 2. Change alpha of the boxplot rather than changing saturation of the swarmplot.
function pointplot_groupbyedgecolor
pointplot_groupbyedgecolor(data: DataFrame, ax: Axes = None, **kws) → Axes
Plot seaborn's pointplot grouped by the edgecolor of the points.

Args:
- data (pd.DataFrame): input data.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Keyword Args:
- kws: parameters provided to seaborn's pointplot function.

Returns:
plt.Axes: plt.Axes object.
module roux.viz.figure
For setting up figures.
function get_children
get_children(fig)
Get all the individual objects included in the figure.
function get_child_text
get_child_text(search_name, all_children=None, fig=None)
Get text object.
function align_texts
align_texts(fig, texts: list, align: str, test=False)
Align text objects.
function labelplots
labelplots(
axes: list = None,
fig=None,
labels: list = None,
xoff: float = 0,
yoff: float = 0,
auto: bool = False,
xoffs: dict = {},
yoffs: dict = {},
va: str = 'center',
ha: str = 'left',
verbose: bool = True,
test: bool = False,
**kws_text
)
Label (sub)plots.
Args:
- fig: plt.figure object.
- axes (list): list of plt.Axes objects.
- xoff (float, optional): x offset. Defaults to 0.
- yoff (float, optional): y offset. Defaults to 0.
- params_alignment (dict, optional): alignment parameters. Defaults to {}.
- params_text (dict, optional): parameters provided to plt.text. Defaults to {'size': 20, 'va': 'bottom', 'ha': 'right'}.
- test (bool, optional): test mode. Defaults to False.

TODOs:
1. Get the x coordinate of the ylabel.
module roux.viz.heatmap
For heatmaps.
function plot_table
plot_table(
df1: DataFrame,
xlabel: str = None,
ylabel: str = None,
annot: bool = True,
cbar: bool = False,
linecolor: str = 'k',
linewidths: float = 1,
cmap: str = None,
sorty: bool = False,
linebreaky: bool = False,
scales: tuple = [1, 1],
ax: Axes = None,
**kws
) → Axes
Plot to show a table.
Args:
- df1 (pd.DataFrame): input data.
- xlabel (str, optional): x label. Defaults to None.
- ylabel (str, optional): y label. Defaults to None.
- annot (bool, optional): show the numbers. Defaults to True.
- cbar (bool, optional): show the colorbar. Defaults to False.
- linecolor (str, optional): line color. Defaults to 'k'.
- linewidths (float, optional): line widths. Defaults to 1.
- cmap (str, optional): colormap. Defaults to None.
- sorty (bool, optional): sort rows. Defaults to False.
- linebreaky (bool, optional): linebreak for the y labels. Defaults to False.
- scales (tuple, optional): scale of the table. Defaults to [1, 1].
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.

Keyword Args:
- kws: parameters provided to the sns.heatmap function.

Returns:
plt.Axes: plt.Axes object.
module roux.viz.image
For visualization of images.
function plot_image
plot_image(
imp: str,
ax: Axes = None,
force=False,
margin=0,
axes=False,
test=False,
**kwarg
) → Axes
Plot image e.g. schematic.
Args:
- imp (str): path of the image.
- ax (plt.Axes, optional): plt.Axes object. Defaults to None.
- force (bool, optional): overwrite the output. Defaults to False.
- margin (int, optional): margins. Defaults to 0.
- test (bool, optional): test mode. Defaults to False.

Keyword Args:
- kwarg: cairosvg: {'dpi': 500, 'scale': 2}; imagemagick: {'trim': False, 'alpha': False}.

Returns:
plt.Axes: plt.Axes object.
module roux.viz.io
For input/output of plots.
function to_plotp
to_plotp(
ax: Axes = None,
prefix: str = 'plot/plot_',
suffix: str = '',
fmts: list = ['png']
) → str
Infer output path for a plot.
Args:
- ax (plt.Axes): plt.Axes object.
- prefix (str, optional): prefix with the directory path for the plot. Defaults to 'plot/plot_'.
- suffix (str, optional): suffix of the filename. Defaults to ''.
- fmts (list, optional): formats of the images. Defaults to ['png'].

Returns:
str: output path for the plot.
function savefig
savefig(
plotp: str,
tight_layout: bool = True,
bbox_inches: list = None,
fmts: list = ['png'],
savepdf: bool = False,
normalise_path: bool = True,
replaces_plotp: dict = None,
dpi: int = 500,
force: bool = True,
kws_replace_many: dict = {},
kws_savefig: dict = {},
**kws
) → str
Wrapper around plt.savefig
.
Args:
- plotp (str): output path or plt.Axes object.
- tight_layout (bool, optional): apply tight_layout. Defaults to True.
- bbox_inches (list, optional): bbox_inches. Defaults to None.
- savepdf (bool, optional): also save as a PDF. Defaults to False.
- normalise_path (bool, optional): normalise the path. Defaults to True.
- replaces_plotp (dict, optional): replacements in the path. Defaults to None.
- dpi (int, optional): dpi. Defaults to 500.
- force (bool, optional): overwrite the output. Defaults to True.
- kws_replace_many (dict, optional): parameters provided to the replace_many function. Defaults to {}.

Keyword Args:
- kws: parameters provided to the to_plotp function.
- kws_savefig: parameters provided to the to_savefig function.

Returns:
str: output path.
function savelegend
savelegend(
plotp: str,
legend: object,
expand: list = [-5, -5, 5, 5],
**kws_savefig
) → str
Save only the legend of the plot/figure.
Args:
- plotp (str): output path.
- legend (object): legend object.
- expand (list, optional): expand. Defaults to [-5, -5, 5, 5].

Returns:
str: output path.

References:
1. https://stackoverflow.com/a/47749903/3521099
function update_kws_plot
update_kws_plot(kws_plot: dict, kws_plotp: dict, test: bool = False) → dict
Update the input parameters.
Args:
- kws_plot (dict): input parameters.
- kws_plotp (dict): saved parameters.
- test (bool, optional): test mode. Defaults to False.

Returns:
dict: updated parameters.
function get_plot_inputs
get_plot_inputs(
plotp: str,
df1: DataFrame = None,
kws_plot: dict = {},
outd: str = None
) → tuple
Get plot inputs.
Args:
- plotp (str): path of the plot.
- df1 (pd.DataFrame): data for the plot.
- kws_plot (dict): parameters of the plot.
- outd (str): output directory.

Returns:
tuple: (path, dataframe, dict)
function log_code
log_code()
Log the code.
function get_lines
get_lines(
logp: str = 'log_notebook.log',
sep: str = 'begin_plot()',
test: bool = False
) → list
Get lines from the log.
Args:
- logp (str, optional): path to the log file. Defaults to 'log_notebook.log'.
- sep (str, optional): label marking the start of the code of the plot. Defaults to 'begin_plot()'.
- test (bool, optional): test mode. Defaults to False.

Returns:
list: lines of code.
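The idea implied here is to split the notebook log on the separator and keep the code after the most recent marker. A plain-Python sketch, assuming a roux-style log layout (not the library's exact implementation):

```python
log_text = """get_ipython().run_line_magic('logstart', 'log_notebook.log over')
x = [1, 2, 3]
begin_plot()
ax = data.plot(x='a', y='b')
begin_plot()
ax = data.plot(x='a', y='c')"""

# Keep only the lines logged after the last 'begin_plot()' marker
latest = log_text.split("begin_plot()")[-1]
lines = [line for line in latest.splitlines() if line.strip()]
# lines == ["ax = data.plot(x='a', y='c')"]
```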
function to_script
to_script(
srcp: str,
plotp: str,
defn: str = 'plot_',
s4: str = ' ',
test: bool = False,
**kws
) → str
Save the script with the code for the plot.
Args:
- srcp (str): path of the script.
- plotp (str): path of the plot.
- defn (str, optional): prefix of the function. Defaults to "plot_".
- s4 (str, optional): a tab (four spaces). Defaults to '    '.
- test (bool, optional): test mode. Defaults to False.

Returns:
str: path of the script.

TODOs:
1. Compatibility with names of the input dataframes other than df1:
get the variable name of the dataframe, e.g.
def get_df_name(df): return [x for x in globals() if globals()[x] is df and not x.startswith('_')][0]
and replace df1 with the variable name of the dataframe.
function to_plot
to_plot(
plotp: str,
data: DataFrame = None,
df1: DataFrame = None,
kws_plot: dict = {},
logp: str = 'log_notebook.log',
sep: str = 'begin_plot()',
validate: bool = False,
show_path: bool = False,
show_path_offy: float = -0.2,
force: bool = True,
test: bool = False,
quiet: bool = True,
**kws
) → str
Save a plot.
Args:
- plotp (str): output path.
- df1 (pd.DataFrame, optional): dataframe with the plotting data. Defaults to None.
- data (pd.DataFrame, optional): dataframe with the plotting data. Defaults to None.
- kws_plot (dict, optional): parameters for plotting. Defaults to dict().
- logp (str, optional): path to the log. Defaults to 'log_notebook.log'.
- sep (str, optional): separator marking the start of the plotting code in the jupyter notebook. Defaults to 'begin_plot()'.
- validate (bool, optional): validate the "readability" using the read_plot function. Defaults to False.
- show_path (bool, optional): show the path on the plot. Defaults to False.
- show_path_offy (float, optional): y-offset for the path label. Defaults to -0.2.
- force (bool, optional): overwrite the output. Defaults to True.
- test (bool, optional): test mode. Defaults to False.
- quiet (bool, optional): quiet mode. Defaults to True.

Returns:
str: output path.

Notes:
Requirement:
1. Start logging in the jupyter notebook:
from IPython import get_ipython
log_notebookp=f'log_notebook.log';open(log_notebookp, 'w').close();get_ipython().run_line_magic('logstart','{log_notebookp} over')
function read_plot
read_plot(p: str, safe: bool = False, test: bool = False, **kws) → Axes
Generate the plot from data, parameters and a script.
Args:
- <b>`p`</b> (str): path of the plot saved using the `to_plot` function.
- <b>`safe`</b> (bool, optional): read as an image. Defaults to False.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function to_concat
to_concat(
ps: list,
how: str = 'h',
use_imagemagick: bool = False,
use_conda_env: bool = False,
test: bool = False,
**kws_outp
) → str
Concat images.
Args:
- <b>`ps`</b> (list): list of paths.
- <b>`how`</b> (str, optional): horizontal (`h`) or vertical (`v`). Defaults to 'h'.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`str`</b>: path of the output.
function to_montage
to_montage(
ps: list,
layout: str,
source_path: str = None,
env_name: str = None,
hspace: float = 0,
vspace: float = 0,
output_path: str = None,
test: bool = False,
**kws_outp
) → str
To montage.
Args:
- <b>`ps`</b> (list): list of paths.
- <b>`layout`</b> (str): layout of the images.
- <b>`hspace`</b> (float, optional): horizontal space. Defaults to 0.
- <b>`vspace`</b> (float, optional): vertical space. Defaults to 0.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`str`</b>: path of the output.
function to_gif
to_gif(
ps: list,
outp: str,
duration: int = 200,
loop: int = 0,
optimize: bool = True
) → str
Convert to GIF.
Args:
- <b>`ps`</b> (list): list of paths.
- <b>`outp`</b> (str): output path.
- <b>`duration`</b> (int, optional): duration of each frame. Defaults to 200.
- <b>`loop`</b> (int, optional): number of loops; 0 loops forever. Defaults to 0.
- <b>`optimize`</b> (bool, optional): optimize the size. Defaults to True.
Returns:
- <b>`str`</b>: output path.
References:
1. https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif
2. https://stackoverflow.com/a/57751793/3521099
function to_data
to_data(path: str) → str
Convert to base64 string.
Args:
- <b>`path`</b> (str): path of the input.
Returns:
- <b>`str`</b>: base64 string.
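The conversion that `to_data` documents can be sketched with the standard library alone. This is a minimal, hypothetical re-implementation of the idea (file bytes → base64 string), not roux's actual code:

```python
import base64
import os
import tempfile

def to_data_sketch(path: str) -> str:
    # Read the file's bytes and encode them as a base64 ASCII string.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Round-trip on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name
encoded = to_data_sketch(path)
os.remove(path)
print(encoded)  # aGVsbG8=
```

Such base64 strings are useful e.g. for embedding images inline in HTML reports.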
function to_convert
to_convert(filep: str, outd: str = None, fmt: str = 'JPEG') → str
Convert the format of an image using `PIL`.
Args:
- <b>`filep`</b> (str): input path.
- <b>`outd`</b> (str, optional): output directory. Defaults to None.
- <b>`fmt`</b> (str, optional): format of the output. Defaults to "JPEG".
Returns:
- <b>`str`</b>: output path.
function to_raster
to_raster(
plotp: str,
dpi: int = 500,
alpha: bool = False,
trim: bool = False,
force: bool = False,
test: bool = False
) → str
Rasterize a vector plot.
Args:
- <b>`plotp`</b> (str): input path.
- <b>`dpi`</b> (int, optional): DPI. Defaults to 500.
- <b>`alpha`</b> (bool, optional): transparency. Defaults to False.
- <b>`trim`</b> (bool, optional): trim margins. Defaults to False.
- <b>`force`</b> (bool, optional): overwrite output. Defaults to False.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`str`</b>: output path.
Notes:
- Runs the bash command: `convert -density 300 -trim`.
function to_rasters
to_rasters(plotd, ext='svg')
Convert many images to raster. Uses inkscape.
Args:
- <b>`plotd`</b> (str): directory.
- <b>`ext`</b> (str, optional): extension of the output. Defaults to 'svg'.
module roux.viz.line
For line plots.
function plot_range
plot_range(
df00: DataFrame,
colvalue: str,
colindex: str,
k: str,
headsize: int = 15,
headcolor: str = 'lightgray',
ax: Axes = None,
**kws_area
) → Axes
Plot range/intervals e.g. genome coordinates as lines.
Args:
- <b>`df00`</b> (pd.DataFrame): input data.
- <b>`colvalue`</b> (str): column with values.
- <b>`colindex`</b> (str): column with ids.
- <b>`k`</b> (str): subset name.
- <b>`headsize`</b> (int, optional): margin at top. Defaults to 15.
- <b>`headcolor`</b> (str, optional): color of the margin. Defaults to 'lightgray'.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- <b>`kws`</b>: keyword parameters provided to the `area` function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_connections
plot_connections(
dplot: DataFrame,
label2xy: dict,
colval: str = '$r_{s}$',
line_scale: int = 40,
legend_title: str = 'similarity',
label2rename: dict = None,
element2color: dict = None,
xoff: float = 0,
yoff: float = 0,
rectangle: dict = {'width': 0.2, 'height': 0.32},
params_text: dict = {'ha': 'center', 'va': 'center'},
params_legend: dict = {'bbox_to_anchor': (1.1, 0.5), 'ncol': 1, 'frameon': False},
legend_elements: list = [],
params_line: dict = {'alpha': 1},
ax: Axes = None,
test: bool = False
) → Axes
Plot connections between points with annotations.
Args:
- <b>`dplot`</b> (pd.DataFrame): input data.
- <b>`label2xy`</b> (dict): label to position.
- <b>`colval`</b> (str, optional): column with values. Defaults to '$r_{s}$'.
- <b>`line_scale`</b> (int, optional): scale of the lines. Defaults to 40.
- <b>`legend_title`</b> (str, optional): title of the legend. Defaults to 'similarity'.
- <b>`label2rename`</b> (dict, optional): labels to rename. Defaults to None.
- <b>`element2color`</b> (dict, optional): element to color mapping. Defaults to None.
- <b>`xoff`</b> (float, optional): x-offset. Defaults to 0.
- <b>`yoff`</b> (float, optional): y-offset. Defaults to 0.
- <b>`rectangle`</b> (dict, optional): dimensions of the rectangles. Defaults to {'width': 0.2, 'height': 0.32}.
- <b>`params_text`</b> (dict, optional): parameters of the text. Defaults to {'ha': 'center', 'va': 'center'}.
- <b>`params_legend`</b> (dict, optional): parameters of the legend. Defaults to {'bbox_to_anchor': (1.1, 0.5), 'ncol': 1, 'frameon': False}.
- <b>`legend_elements`</b> (list, optional): elements of the legend. Defaults to [].
- <b>`params_line`</b> (dict, optional): parameters of the lines. Defaults to {'alpha': 1}.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_kinetics
plot_kinetics(
df1: DataFrame,
x: str,
y: str,
hue: str,
cmap: str = 'Reds_r',
ax: Axes = None,
test: bool = False,
kws_legend: dict = {},
**kws_set
) → Axes
Plot time-dependent kinetic data.
Args:
- <b>`df1`</b> (pd.DataFrame): input data.
- <b>`x`</b> (str): x column.
- <b>`y`</b> (str): y column.
- <b>`hue`</b> (str): hue column.
- <b>`cmap`</b> (str, optional): colormap. Defaults to 'Reds_r'.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`kws_legend`</b> (dict, optional): legend parameters. Defaults to {}.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_steps
plot_steps(
df1: DataFrame,
col_step_name: str,
col_step_size: str,
ax: Axes = None,
test: bool = False
) → Axes
Plot step-wise changes in numbers, e.g. for a filtering process.
Args:
- <b>`df1`</b> (pd.DataFrame): input data.
- <b>`col_step_name`</b> (str): column containing the step names.
- <b>`col_step_size`</b> (str): column containing the numbers.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
module roux.viz
Global Variables
- io
- colors
- diagram
module roux.viz.scatter
For scatter plots.
function plot_scatter_agg
plot_scatter_agg(
dplot: DataFrame,
x: str = None,
y: str = None,
z: str = None,
kws_legend={'bbox_to_anchor': [1, 1], 'loc': 'upper left'}
)
UNDER DEV.
function plot_scatter
plot_scatter(
data: DataFrame,
x: str = None,
y: str = None,
z: str = None,
kind: str = 'scatter',
scatter_kws={},
line_kws={},
stat_method: str = 'spearman',
stat_kws={},
hollow: bool = False,
ax: Axes = None,
verbose: bool = True,
**kws
) → Axes
Plot scatter with multiple layers and stats.
Args:
- <b>`data`</b> (pd.DataFrame): input dataframe.
- <b>`x`</b> (str): x column.
- <b>`y`</b> (str): y column.
- <b>`z`</b> (str, optional): z column. Defaults to None.
- <b>`kind`</b> (str, optional): kind of scatter. Defaults to 'scatter'.
- <b>`trendline_method`</b> (str, optional): trendline method ['poly', 'lowess']. Defaults to 'poly'.
- <b>`stat_method`</b> (str, optional): method of the annotated stats ['mlr', 'spearman']. Defaults to 'spearman'.
- <b>`cmap`</b> (str, optional): colormap. Defaults to 'Reds'.
- <b>`label_colorbar`</b> (str, optional): label of the colorbar. Defaults to None.
- <b>`gridsize`</b> (int, optional): number of grids in the hexbin. Defaults to 25.
- <b>`bbox_to_anchor`</b> (list, optional): location of the legend. Defaults to [1, 1].
- <b>`loc`</b> (str, optional): location of the legend. Defaults to 'upper left'.
- <b>`title`</b> (str, optional): title of the plot. Defaults to None.
- <b>`line_kws`</b> (dict, optional): parameters provided to the `plot_trendline` function. Defaults to {}.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- <b>`kws`</b>: parameters provided to the `plot` function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
Notes:
1. For a rasterized scatter plot, set `scatter_kws={'rasterized': True}`.
2. This function does not apply multiple colors, similar to `sns.regplot`.
function plot_qq
plot_qq(x: Series) → Axes
Plot QQ plot.
Args:
- <b>`x`</b> (pd.Series): input vector.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_ranks
plot_ranks(
df1: DataFrame,
colid: str,
colx: str,
coly: str = 'rank',
ascending: bool = True,
ax=None,
**kws
) → Axes
Plot rankings.
Args:
- <b>`df1`</b> (pd.DataFrame): input data.
- <b>`colid`</b> (str): column with unique ids.
- <b>`colx`</b> (str): x column.
- <b>`coly`</b> (str, optional): y column. Defaults to 'rank'.
- <b>`ascending`</b> (bool, optional): sort in ascending order. Defaults to True.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- <b>`kws`</b>: parameters provided to the `seaborn.scatterplot` function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_volcano
plot_volcano(
data: DataFrame,
colx: str,
coly: str,
colindex: str,
hue: str = 'x',
style: str = 'P=0',
style_order: list = ['o', '^'],
markers: list = ['o', '^'],
show_labels: int = None,
show_outlines: int = None,
outline_colors: list = ['k'],
collabel: str = None,
show_line=True,
line_pvalue=0.1,
line_x: float = 0.0,
line_x_min: float = None,
show_text: bool = True,
text_increase: str = None,
text_decrease: str = None,
text_diff: str = None,
legend: bool = False,
verbose: bool = False,
p_min: float = None,
ax: Axes = None,
outmore: bool = False,
kws_legend: dict = {},
**kws_scatterplot
) → Axes
Volcano plot.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
module roux.viz.sets
For plotting sets.
function plot_venn
plot_venn(
ds1: Series,
ax: Axes = None,
figsize: tuple = [2.5, 2.5],
show_n: bool = True,
outmore=False,
**kws
) → Axes
Plot Venn diagram.
Args:
- <b>`ds1`</b> (pd.Series): input pandas.Series or dictionary, with subsets in the index levels, mapped to counts.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- <b>`figsize`</b> (tuple, optional): figure size. Defaults to [2.5, 2.5].
- <b>`show_n`</b> (bool, optional): show sample sizes. Defaults to True.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_intersection_counts
plot_intersection_counts(
df1: DataFrame,
cols: list = None,
kind: str = 'table',
method: str = None,
show_pval: bool = True,
confusion: bool = False,
rename_cols: bool = False,
sort_cols: tuple = [True, True],
order_x: list = None,
order_y: list = None,
cmap: str = 'Reds',
ax: Axes = None,
kws_show_stats: dict = {},
**kws_plot
) → Axes
Plot counts for the intersection between two sets.
Args:
- <b>`df1`</b> (pd.DataFrame): input data.
- <b>`cols`</b> (list, optional): columns. Defaults to None.
- <b>`kind`</b> (str, optional): kind of plot: 'table' or 'barplot'. Defaults to 'table'.
- <b>`method`</b> (str, optional): method to check the association ['chi2', 'FE']. Defaults to None.
- <b>`rename_cols`</b> (bool, optional): rename the columns. Defaults to False.
- <b>`show_pval`</b> (bool, optional): annotate p-values. Defaults to True.
- <b>`cmap`</b> (str, optional): colormap. Defaults to 'Reds'.
- <b>`kws_show_stats`</b> (dict, optional): arguments provided to the stats function. Defaults to {}.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Raises:
- <b>`ValueError`</b>: the `show_pval` position should be one of the allowed positions.
Keyword Args:
- <b>`kws_plot`</b>: keyword arguments provided to the plotting function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
TODOs:
1. Use `compare_classes` to get the stats.
function plot_intersections
plot_intersections(
ds1: Series,
item_name: str = None,
figsize: tuple = [4, 4],
text_width: float = 2,
yorder: list = None,
sort_by: str = 'cardinality',
sort_categories_by: str = None,
element_size: int = 40,
facecolor: str = 'gray',
bari_annot: int = None,
totals_bar: bool = False,
totals_text: bool = True,
intersections_ylabel: float = None,
intersections_min: float = None,
test: bool = False,
annot_text: bool = False,
set_ylabelx: float = -0.25,
set_ylabely: float = 0.5,
**kws
) → Axes
Plot upset plot.
Args:
- <b>`ds1`</b> (pd.Series): input vector.
- <b>`item_name`</b> (str, optional): name of the items. Defaults to None.
- <b>`figsize`</b> (tuple, optional): figure size. Defaults to [4, 4].
- <b>`text_width`</b> (float, optional): max. width of the text. Defaults to 2.
- <b>`yorder`</b> (list, optional): order of the y elements. Defaults to None.
- <b>`sort_by`</b> (str, optional): sorting method. Defaults to 'cardinality'.
- <b>`sort_categories_by`</b> (str, optional): sorting method. Defaults to None.
- <b>`element_size`</b> (int, optional): size of the elements. Defaults to 40.
- <b>`facecolor`</b> (str, optional): facecolor. Defaults to 'gray'.
- <b>`bari_annot`</b> (int, optional): annotate the nth bar. Defaults to None.
- <b>`totals_text`</b> (bool, optional): show the totals. Defaults to True.
- <b>`intersections_ylabel`</b> (float, optional): y-label of the intersections. Defaults to None.
- <b>`intersections_min`</b> (float, optional): minimum intersection size to show. Defaults to None.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`annot_text`</b> (bool, optional): annotate text. Defaults to False.
- <b>`set_ylabelx`</b> (float, optional): x position of the ylabel. Defaults to -0.25.
- <b>`set_ylabely`</b> (float, optional): y position of the ylabel. Defaults to 0.5.
Keyword Args:
- <b>`kws`</b>: parameters provided to the `upset.plot` function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
Notes:
- `sort_by`: {'cardinality', 'degree'}. If 'cardinality', subsets are listed from largest to smallest; if 'degree', they are listed in order of the number of categories intersected.
- `sort_categories_by`: {'cardinality', None}. Whether to sort the categories by total cardinality, or leave them in the provided order.
References:
https://upsetplot.readthedocs.io/en/stable/api.html
function plot_enrichment
plot_enrichment(
data: DataFrame,
x: str,
y: str,
s: str,
hue='Q',
xlabel=None,
ylabel='significance\n(-log10(Q))',
size: int = None,
color: str = None,
annots_side: int = 5,
annots_side_labels=None,
coff_fdr: float = None,
xlim: tuple = None,
xlim_off: float = 0.2,
ylim: tuple = None,
ax: Axes = None,
break_pt: int = 25,
annot_coff_fdr: bool = False,
kws_annot: dict = {'loc': 'right', 'offx3': 0.15},
returns='ax',
**kwargs
) → Axes
Plot enrichment stats.
Args:
- <b>`data`</b> (pd.DataFrame): input data.
- <b>`x`</b> (str): x column.
- <b>`y`</b> (str): y column.
- <b>`s`</b> (str): size column.
- <b>`size`</b> (int, optional): size of the points. Defaults to None.
- <b>`color`</b> (str, optional): color of the points. Defaults to None.
- <b>`annots_side`</b> (int, optional): how many labels to show on side. Defaults to 5.
- <b>`coff_fdr`</b> (float, optional): FDR cutoff. Defaults to None.
- <b>`xlim`</b> (tuple, optional): x-axis limits. Defaults to None.
- <b>`xlim_off`</b> (float, optional): x-offset on limits. Defaults to 0.2.
- <b>`ylim`</b> (tuple, optional): y-axis limits. Defaults to None.
- <b>`ax`</b> (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- <b>`break_pt`</b> (int, optional): break point (' ') for the labels. Defaults to 25.
- <b>`annot_coff_fdr`</b> (bool, optional): show FDR cutoff. Defaults to False.
- <b>`kws_annot`</b> (dict, optional): parameters provided to the `annot_side` function. Defaults to dict( loc='right', annot_count_max=5, offx3=0.15, ).
Keyword Args:
- <b>`kwargs`</b>: parameters provided to the `sns.scatterplot` function.
Returns:
- <b>`plt.Axes`</b>: `plt.Axes` object.
function plot_pie
plot_pie(
counts: list,
labels: list,
scales_line_xy: tuple = (1.1, 1.1),
remove_wedges: list = None,
remove_wedges_index: list = [],
line_color: str = 'k',
annot_side: bool = False,
kws_annot_side: dict = {},
ax: Axes = None,
**kws_pie
) → Axes
Pie plot.
Args:
- <b>`counts`</b> (list): counts.
- <b>`labels`</b> (list): labels.
- <b>`scales_line_xy`</b> (tuple, optional): scales for the lines. Defaults to (1.1, 1.1).
- <b>`remove_wedges`</b> (list, optional): wedge/s to remove. Defaults to None.
- <b>`remove_wedges_index`</b> (list, optional): wedge/s to remove, by index. Defaults to [].
- <b>`line_color`</b> (str, optional): line color. Defaults to 'k'.
- <b>`annot_side`</b> (bool, optional): annotations on the side, using the `annot_side` function. Defaults to False.
- <b>`kws_annot_side`</b> (dict, optional): keyword arguments provided to the `annot_side` function. Defaults to {}.
- <b>`ax`</b> (plt.Axes, optional): subplot. Defaults to None.
Keyword Args:
- <b>`kws_pie`</b>: keyword arguments provided to the `pie` chart function.
Returns:
- <b>`plt.Axes`</b>: subplot.
References:
https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html
module roux.vizi
module roux.workflow.checks
For workflow checks.
function grep
grep(p, checks, exclude=[], exclude_str=[], verbose=True)
Get the output of grep as a list of strings.
module roux.workflow.df
For management of tables.
function exclude_items
exclude_items(df1: DataFrame, metadata: dict) → DataFrame
Exclude items from the table with the workflow info.
Args:
- <b>`df1`</b> (pd.DataFrame): input table.
- <b>`metadata`</b> (dict): metadata of the repository.
Returns:
- <b>`pd.DataFrame`</b>: output table.
module roux.workflow.function
For function management.
function get_quoted_path
get_quoted_path(s1: str) → str
Quoted paths.
Args:
- <b>`s1`</b> (str): path.
Returns:
- <b>`str`</b>: quoted path.
function get_path
get_path(
s: str,
validate: bool,
prefixes=['data/', 'metadata/', 'plot/'],
test=False
) → str
Extract paths from a line of code.
Args:
- <b>`s`</b> (str): line of code.
- <b>`validate`</b> (bool): validate the output.
- <b>`prefixes`</b> (list, optional): allowed prefixes. Defaults to ['data/', 'metadata/', 'plot/'].
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`str`</b>: path.
TODOs: 1. Use wildcards i.e. *'s.
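A heuristic like the one `get_path` documents can be sketched as follows. This is illustrative only: the regex and the tuple-based prefix filter are assumptions, not roux's implementation.

```python
import re

def extract_paths(line: str, prefixes=("data/", "metadata/", "plot/")):
    # Pull quoted strings out of a line of code and keep those that
    # start with one of the allowed prefixes.
    quoted = re.findall(r"['\"]([^'\"]+)['\"]", line)
    return [s for s in quoted if s.startswith(prefixes)]

paths = extract_paths("df = pd.read_csv('data/input.tsv', sep=',')")
print(paths)  # ['data/input.tsv']
```

Note that `str.startswith` accepts a tuple of prefixes, which keeps the filter a one-liner.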
function remove_dirs_from_outputs
remove_dirs_from_outputs(outputs: list, test: bool = False) → list
Remove directories from the output paths.
Args:
- <b>`outputs`</b> (list): output paths.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`list`</b>: paths.
function get_ios
get_ios(l: list, test=False) → tuple
Get input and output (IO) paths.
Args:
- <b>`l`</b> (list): list of lines of code.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`tuple`</b>: paths of the inputs and outputs.
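The input/output classification that `get_ios` performs can be sketched with a simple heuristic: quoted paths on lines calling read-like functions are inputs, those on lines calling write-like (`to_`) functions are outputs. This heuristic is a hypothetical stand-in, not roux's actual logic.

```python
import re

def get_ios_sketch(lines):
    # Classify quoted paths found in lines of code: read-like calls
    # contribute inputs, write-like ("to_") calls contribute outputs.
    inputs, outputs = [], []
    for line in lines:
        paths = [q for q in re.findall(r"['\"]([^'\"]+)['\"]", line) if "/" in q]
        if "read_" in line:
            inputs += paths
        elif "to_" in line:
            outputs += paths
    return inputs, outputs

ins, outs = get_ios_sketch([
    "df = read_table('data/in.tsv')",
    "to_table(df, 'data/out.tsv')",
])
print(ins, outs)  # ['data/in.tsv'] ['data/out.tsv']
```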
function get_name
get_name(s: str, i: int, sep_step: str = '## step') → str
Get name of the function.
Args:
- <b>`s`</b> (str): lines in markdown format.
- <b>`i`</b> (int): index of the step.
- <b>`sep_step`</b> (str, optional): separator marking the start of a step. Defaults to "## step".
Returns:
- <b>`str`</b>: name of the function.
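Deriving a function name from a markdown step heading could look like the sketch below. The exact naming rules (snake_casing, the numbered fallback) are assumptions for illustration, not roux's implementation.

```python
def step_heading_to_name(s: str, i: int, sep_step: str = "## step") -> str:
    # Strip the step separator and build a snake_case function name;
    # fall back to a numbered default for unnamed steps.
    text = s.replace(sep_step, "").strip()
    return "_".join(text.lower().split()) if text else f"step{i:02d}"

print(step_heading_to_name("## step Load input tables", 1))  # load_input_tables
print(step_heading_to_name("## step", 2))                    # step02
```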
function get_step
get_step(
l: list,
name: str,
sep_step: str = '## step',
sep_step_end: str = '## tests',
test=False,
tab=' '
) → dict
Get code for a step.
Args:
- <b>`l`</b> (list): list of lines of code.
- <b>`name`</b> (str): name of the function.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`tab`</b> (str, optional): tab format. Defaults to ' '.
Returns:
- <b>`dict`</b>: step name to code map.
function to_task
to_task(
notebookp,
task=None,
sep_step: str = '## step',
sep_step_end: str = '## tests',
notebook_suffix: str = '_v',
force=False,
validate=False,
path_prefix=None,
verbose=True,
test=False
) → str
Get the lines of code for a task (a script to be saved as an individual `.py` file).
Args:
- <b>`notebookp`</b> (str): path of the notebook.
- <b>`sep_step`</b> (str, optional): separator marking the start of a step. Defaults to "## step".
- <b>`sep_step_end`</b> (str, optional): separator marking the end of a step. Defaults to "## tests".
- <b>`notebook_suffix`</b> (str, optional): suffix of the notebook file to be considered as a "task".
- <b>`force`</b> (bool, optional): overwrite the output. Defaults to False.
- <b>`validate`</b> (bool, optional): validate the output. Defaults to False.
- <b>`path_prefix`</b> (str, optional): prefix to the path. Defaults to None.
- <b>`verbose`</b> (bool, optional): show verbose output. Defaults to True.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`str`</b>: lines of the code.
function get_global_imports
get_global_imports() → DataFrame
Get the metadata of the functions imported by `from roux.global_imports import *`.
module roux.workflow.io
For input/output of workflow.
function clear_variables
clear_variables(dtype=None, variables=None)
Clear dataframes from the workspace.
function clear_dataframes
clear_dataframes()
function to_py
to_py(
notebookp: str,
pyp: str = None,
force: bool = False,
**kws_get_lines
) → str
To python script (.py).
Args:
- <b>`notebookp`</b> (str): path to the notebook.
- <b>`pyp`</b> (str, optional): path to the python file. Defaults to None.
- <b>`force`</b> (bool, optional): overwrite the output. Defaults to False.
Returns:
- <b>`str`</b>: path of the output.
function to_nb_cells
to_nb_cells(notebook, outp, new_cells, validate_diff=None)
Replace notebook cells.
function import_from_file
import_from_file(pyp: str)
Import functions from a python (`.py`) file.
Args:
- <b>`pyp`</b> (str): python file (`.py`).
function infer_parameters
infer_parameters(input_value, default_value)
Infer the input values and post warning messages.
Parameters:
- <b>`input_value`</b>: the primary value.
- <b>`default_value`</b>: the default/alternative/inferred value.
Returns:
Inferred value.
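The documented behavior of `infer_parameters` amounts to "use the provided value if given, otherwise fall back to the default and warn". A minimal sketch of that assumed semantics:

```python
import logging

def infer_parameters_sketch(input_value, default_value):
    # Use the provided value when given; otherwise fall back to the
    # default/inferred value and post a warning message.
    if input_value is None:
        logging.warning("input not provided; using inferred value: %s", default_value)
        return default_value
    return input_value

print(infer_parameters_sketch(None, "metadata.yaml"))          # metadata.yaml
print(infer_parameters_sketch("config.yaml", "metadata.yaml")) # config.yaml
```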
function to_parameters
to_parameters(f: object, test: bool = False) → dict
Get function to parameters map.
Args:
- <b>`f`</b> (object): function.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
Returns:
- <b>`dict`</b>: output.
function read_config
read_config(
p: str,
config_base=None,
inputs=None,
append_to_key=None,
convert_dtype: bool = True,
verbose: bool = True
)
Read configuration.
Parameters:
- <b>`p`</b> (str): input path.
- <b>`config_base`</b>: base config with the inputs for the interpolations.
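roux's workflow extras list omegaconf, which handles this kind of config interpolation; as a minimal stand-in for the idea of filling placeholders in config values from a base mapping, here is a sketch using only `str.format` (illustrative, not roux's implementation):

```python
def interpolate_config(config: dict, inputs: dict) -> dict:
    # Fill '{key}' placeholders in string values from a base mapping;
    # non-string values are passed through unchanged.
    return {
        key: (value.format(**inputs) if isinstance(value, str) else value)
        for key, value in config.items()
    }

cfg = interpolate_config(
    {"output_path": "data/{species}/out.tsv", "threads": 4},
    {"species": "yeast"},
)
print(cfg["output_path"])  # data/yeast/out.tsv
```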
function read_metadata
read_metadata(
p: str,
ind: str = None,
max_paths: int = 30,
config_path_key: str = 'config_path',
config_paths: list = [],
config_paths_auto=False,
verbose: bool = False,
**kws_read_config
) → dict
Read metadata.
Args:
- <b>`p`</b> (str, optional): file containing the metadata. Defaults to './metadata.yaml'.
- <b>`ind`</b> (str, optional): directory containing specific settings and other data to be incorporated into the metadata. Defaults to './metadata/'.
Returns:
- <b>`dict`</b>: output.
function to_workflow
to_workflow(df2: DataFrame, workflowp: str, tab: str = ' ') → str
Save workflow file.
Args:
- <b>`df2`</b> (pd.DataFrame): input table.
- <b>`workflowp`</b> (str): path of the workflow file.
- <b>`tab`</b> (str, optional): tab format. Defaults to ' '.
Returns:
- <b>`str`</b>: path of the workflow file.
function create_workflow_report
create_workflow_report(workflowp: str, env: str) → int
Create report for the workflow run.
Parameters:
- <b>`workflowp`</b> (str): path of the workflow file (`snakemake`).
- <b>`env`</b> (str): name of the conda virtual environment where the required workflow dependency, i.e. `snakemake`, is available.
function replacestar
replacestar(
input_path,
output_path=None,
replace_from='from roux.global_imports import *',
in_place: bool = False,
attributes={'pandarallel': ['parallel_apply'], 'rd': ['.rd.', '.log.']},
verbose: bool = False,
test: bool = False,
**kws_fix_code
)
Post-development, replace wildcard (global) import from roux i.e. 'from roux.global_imports import *' with individual imports with accompanying documentation.
Parameters:
- <b>`input_path`</b> (str): path to the .py or .ipynb file.
- <b>`output_path`</b> (str): path to the output.
- <b>`py_path`</b> (str): path to the intermediate .py file.
- <b>`in_place`</b> (bool): whether to carry out the modification in place.
- <b>`return_replacements`</b> (bool): return a dict with the strings to be replaced.
- <b>`attributes`</b> (dict): attribute names mapped to their keywords for searching.
- <b>`verbose`</b> (bool): verbose toggle.
- <b>`test`</b> (bool): test mode, used if the output file is not provided and in-place modification is not allowed.
Returns:
- <b>`output_path`</b> (str): path to the modified notebook.
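The core of what `replacestar` does is a text substitution: swap the wildcard import line for explicit imports. The sketch below shows only that replacement step (the real function also analyzes which attributes are actually used; the replacement import line in the example is illustrative):

```python
def replace_wildcard_import(code: str, replace_from: str, imports: list) -> str:
    # Swap the wildcard import line for a block of explicit imports.
    return code.replace(replace_from, "\n".join(imports))

src = "from roux.global_imports import *\ndf = df.rd.clean()\n"
out = replace_wildcard_import(
    src,
    "from roux.global_imports import *",
    ["import roux.lib.dfs  # hypothetical explicit import enabling the `.rd` accessor"],
)
print(out)
```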
module roux.workflow.knit
For workflow set up.
function nb_to_py
nb_to_py(
notebookp: str,
test: bool = False,
validate: bool = True,
sep_step: str = '## step',
notebook_suffix: str = '_v'
)
Convert a notebook to a script.
Args:
- <b>`notebookp`</b> (str): path to the notebook.
- <b>`sep_step`</b> (str, optional): separator marking the start of a step. Defaults to "## step".
- <b>`notebook_suffix`</b> (str, optional): suffix of the notebook file to be considered as a "task".
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`validate`</b> (bool, optional): validate. Defaults to True.
TODOs:
1. Add a `check_outputs` parameter to only filter out non-executable code (i.e. tests) if False, else edit the code.
function sort_stepns
sort_stepns(l: list) → list
Sort steps (functions) of a task (script).
Args:
- <b>`l`</b> (list): list of steps.
Returns:
- <b>`list`</b>: sorted list of steps.
module roux.workflow.log
function print_parameters
print_parameters(d: dict)
Print a dictionary of parameters as lines of code.
Parameters:
- <b>`d`</b> (dict): dictionary of parameters.
module roux.workflow
Global Variables
- io
- log
module roux.workflow.monitor
For workflow monitors.
function plot_workflow_log
plot_workflow_log(dplot: DataFrame) → Axes
Plot workflow log.
Args:
- <b>`dplot`</b> (pd.DataFrame): input data (dparam).
Returns:
- <b>`plt.Axes`</b>: output.
TODOs:
1. Use the statistics tagged as `## stats`.
module roux.workflow.nb
For operations on jupyter notebooks.
function get_lines
get_lines(p: str, keep_comments: bool = True) → list
Get lines of code from notebook.
Args:
- <b>`p`</b> (str): path to the notebook.
- <b>`keep_comments`</b> (bool, optional): keep comments. Defaults to True.
Returns:
- <b>`list`</b>: lines.
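Extracting lines of code from a notebook reduces to walking the notebook's JSON and collecting the `source` of the code cells. A self-contained sketch of that idea (roux reads from a path via nbformat; here the notebook JSON is passed as a string for illustration):

```python
import json

def get_code_lines(nb_json: str, keep_comments: bool = True) -> list:
    # Collect lines from code cells, optionally dropping comment lines.
    nb = json.loads(nb_json)
    lines = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for line in "".join(cell["source"]).splitlines():
            if not keep_comments and line.lstrip().startswith("#"):
                continue
            lines.append(line)
    return lines

nb = json.dumps({"cells": [
    {"cell_type": "code", "source": ["# load\n", "x = 1\n"]},
    {"cell_type": "markdown", "source": ["## step\n"]},
]})
print(get_code_lines(nb, keep_comments=False))  # ['x = 1']
```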
function read_nb_md
read_nb_md(p: str, n: int = None) → list
Read notebook's documentation in the markdown cells.
Args:
- <b>`p`</b> (str): path of the notebook.
- <b>`n`</b> (int): number of markdown cells to extract.
Returns:
- <b>`list`</b>: lines of the strings.
function to_info
to_info(p: str, outp: str, linkd: str = '') → str
Save README.md file.
Args:
- <b>`p`</b> (str): path of the notebook files that would be converted to "tasks".
- <b>`outp`</b> (str): path of the output file, e.g. 'README.md'.
Returns:
- <b>`str`</b>: path of the output file.
function to_replaced_nb
to_replaced_nb(
nb_path,
output_path,
replaces: dict = {},
cell_type: str = 'code',
drop_lines_with_substrings: list = None,
test=False
)
Replace text in a jupyter notebook.
Parameters:
- <b>`nb`</b>: notebook object obtained from `nbformat.reads`.
- <b>`replaces`</b> (dict): mapping of the text to 'replace from' to the one to 'replace with'.
- <b>`cell_type`</b> (str): the type of the cell.
Returns:
- <b>`new_nb`</b>: notebook object.
function to_filtered_nb
to_filtered_nb(
p: str,
outp: str,
header: str,
kind: str = 'include',
validate_diff: int = None
)
Filter sections in a notebook based on markdown headings.
Args:
- <b>`header`</b> (str): exact first line of a markdown cell marking a section in a notebook.
- <b>`validate_diff`</b>
function to_filter_nbby_patterns
to_filter_nbby_patterns(p, outp, patterns=None, **kws)
Filter out notebook cells if the pattern string is found.
Args:
- <b>`patterns`</b> (list): list of string patterns.
function to_clear_unused_cells
to_clear_unused_cells(
notebook_path,
new_notebook_path,
validate_diff: int = None
)
Remove code cells with all lines commented.
function to_clear_outputs
to_clear_outputs(notebook_path, new_notebook_path)
function to_filtered_outputs
to_filtered_outputs(input_path, output_path, warnings=True, strings=True)
function to_diff_notebooks
to_diff_notebooks(
notebook_paths,
url_prefix='https://localhost:8888/nbdime/difftool?',
remove_prefix='file://',
verbose=True
) → list
"Diff" notebooks using nbdiff
(https://nbdime.readthedocs.io/en/latest/)
Start the nb-diff session by running: nbdiff-web
Todos: 1. Deprecate if functionality added to nbdiff-web
.
module roux.workflow.task
For task management.
function run_task
run_task(
parameters: dict,
input_notebook_path: str,
kernel: str = None,
output_notebook_path: str = None,
test=False,
verbose=False,
force=False,
**kws_papermill
) → str
Run a single task.
Parameters:
- <b>`parameters`</b> (dict): parameters, including the `output_path`s.
- <b>`input_notebook_path`</b> (str): path to the input notebook which is parameterized.
- <b>`kernel`</b> (str): kernel to be used.
- <b>`output_notebook_path`</b> (str): path to the output notebook which is used as a report.
- <b>`test`</b> (bool): test mode.
- <b>`verbose`</b> (bool): verbose.
Keyword parameters:
- <b>`kws_papermill`</b>: parameters provided to the `pm.execute_notebook` function.
Returns:
Output path.
function run_tasks
run_tasks(
input_notebook_path: str,
kernel: str = None,
inputs: list = None,
output_path_base: str = None,
parameters_list: list = None,
fast: bool = False,
fast_workers: int = 6,
to_filter_nbby_patterns_kws=None,
input_notebook_temp_path=None,
out_paths: bool = False,
test1: bool = False,
force: bool = False,
test: bool = False,
verbose: bool = False,
**kws_papermill
) → list
Run a list of tasks.
Parameters:
- <b>`input_notebook_path`</b> (str): path to the input notebook which is parameterized.
- <b>`kernel`</b> (str): kernel to be used.
- <b>`inputs`</b> (list): list of parameters without the output paths, which would be inferred by encoding.
- <b>`output_path_base`</b> (str): output path with a placeholder, e.g. 'path/to/{KEY}/file'.
- <b>`parameters_list`</b> (list): list of parameters including the output paths.
- <b>`fast`</b> (bool): enable parallel processing.
- <b>`fast_workers`</b> (int): number of parallel processes.
- <b>`force`</b> (bool): overwrite the outputs.
- <b>`test`</b> (bool): test mode.
- <b>`verbose`</b> (bool): verbose.
Keyword parameters:
- <b>`kws_papermill`</b>: parameters provided to the `pm.execute_notebook` function.
- <b>`to_filter_nbby_patterns_kws`</b> (dict): dictionary containing parameters to be provided to the `to_filter_nbby_patterns` function. Defaults to None.
Returns:
- <b>`parameters_list`</b> (list): list of parameters including the output paths, inferred if not provided.
TODOs:
0. Ignore temporary parameters, e.g. `test`, `verbose` etc., while encoding the inputs.
1. Integrate with `apply_on_paths` for parallel processing etc.
Notes:
- To resolve `RuntimeError: This event loop is already running in python` from `multiprocessing`, execute: `import nest_asyncio; nest_asyncio.apply()`
module roux.workflow.version
For version control.
function git_commit
git_commit(repop: str, suffix_message: str = '', force=False)
Version control.
Args:
- <b>`repop`</b> (str): path to the repository.
- <b>`suffix_message`</b> (str, optional): add a suffix to the version (commit) message. Defaults to ''.
module roux.workflow.workflow
For workflow management.
function get_scripts
get_scripts(
ps: list,
notebook_prefix: str = '\\d{2}',
notebook_suffix: str = '_v\\d{2}',
test: bool = False,
fast: bool = True,
cores: int = 6,
force: bool = False,
tab: str = ' ',
**kws
) → DataFrame
Get scripts.
Args:
- <b>`ps`</b> (list): paths.
- <b>`notebook_prefix`</b> (str, optional): prefix of the notebook file to be considered as a "task".
- <b>`notebook_suffix`</b> (str, optional): suffix of the notebook file to be considered as a "task".
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`fast`</b> (bool, optional): parallel processing. Defaults to True.
- <b>`cores`</b> (int, optional): cores to use. Defaults to 6.
- <b>`force`</b> (bool, optional): overwrite the outputs. Defaults to False.
- <b>`tab`</b> (str, optional): tab in spaces. Defaults to ' '.
Returns:
- <b>`pd.DataFrame`</b>: output table.
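The filename convention implied by the `notebook_prefix`/`notebook_suffix` defaults (`\d{2}` and `_v\d{2}`) can be checked with a small regex. This is a sketch of the convention; roux's matching may differ:

```python
import re

def is_task_notebook(path: str, prefix=r"\d{2}", suffix=r"_v\d{2}") -> bool:
    # A "task" notebook's filename starts with the prefix pattern and
    # ends with the suffix pattern before the .ipynb extension.
    name = path.rsplit("/", 1)[-1]
    return re.fullmatch(prefix + r".*" + suffix + r"\.ipynb", name) is not None

print(is_task_notebook("notebooks/01_analysis_v01.ipynb"))  # True
print(is_task_notebook("notebooks/scratch.ipynb"))          # False
```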
function to_scripts
to_scripts(
packagep: str,
notebooksdp: str,
validate: bool = False,
ps: list = None,
notebook_prefix: str = '\\d{2}',
notebook_suffix: str = '_v\\d{2}',
scripts: bool = True,
workflow: bool = True,
sep_step: str = '## step',
todos: bool = False,
git: bool = True,
clean: bool = False,
test: bool = False,
force: bool = True,
tab: str = ' ',
**kws
)
Convert notebooks to scripts.
Args:
- <b>`packagep`</b> (str): path to the package.
- <b>`notebooksdp`</b> (str, optional): path to the notebooks. Defaults to None.
- <b>`validate`</b> (bool, optional): validate if the functions are formatted correctly. Defaults to False.
- <b>`ps`</b> (list, optional): paths. Defaults to None.
- <b>`notebook_prefix`</b> (str, optional): prefix of the notebook file to be considered as a "task".
- <b>`notebook_suffix`</b> (str, optional): suffix of the notebook file to be considered as a "task".
- <b>`scripts`</b> (bool, optional): make scripts. Defaults to True.
- <b>`workflow`</b> (bool, optional): make the workflow file. Defaults to True.
- <b>`sep_step`</b> (str, optional): separator marking the start of a step. Defaults to "## step".
- <b>`todos`</b> (bool, optional): show todos. Defaults to False.
- <b>`git`</b> (bool, optional): save the version. Defaults to True.
- <b>`clean`</b> (bool, optional): clean the temporary files. Defaults to False.
- <b>`test`</b> (bool, optional): test mode. Defaults to False.
- <b>`force`</b> (bool, optional): overwrite the outputs. Defaults to True.
- <b>`tab`</b> (str, optional): tab size. Defaults to ' '.
Keyword parameters:
- <b>`kws`</b>: parameters provided to the `get_script` function, including `sep_step` and `sep_step_end`.
TODOs:
1. For version control, use https://github.com/jupyterlab/jupyterlab-git.