roux
Convenience functions in Python.
Examples · Explore the API
Examples
⌗ Dataframes.
⌗⌗ Paired Dataframes.
💾 General Input/Output.
⬤⬤ Sets.
🔤 Strings encoding/decoding.
🗃 File paths Input/Output.
🏷 Classification.
✨ Clustering.
✨ Correlations.
✨ Differences.
📈 Data fitting.
📊 Data normalization.
⬤⬤ Comparison between sets.
📈🔖 Annotating visualisations.
🔧 Subplot-level adjustments.
📈 Diagrams.
📈 Distribution plots.
📈 Wrapper around Series plotting functions.
📈📈 Annotating figure.
📈💾 Visualizations Input/Output.
📈 Line plots.
📈 Scatter plots.
📈⬤⬤ Plots of sets.
📈🎨✨ Visualizations theming.
⚙️🗺️ Reading multiple configs.
⚙️⏩ Running multiple tasks.
⚙️⏩ Workflow using notebooks
Installation
pip install roux # with basic dependencies
pip install roux[all] # with all the additional dependencies (recommended).
With additional dependencies as required:
pip install roux[viz] # for visualizations e.g. seaborn etc.
pip install roux[data] # for data operations e.g. reading excel files etc.
pip install roux[stat] # for statistics e.g. statsmodels etc.
pip install roux[fast] # for faster processing e.g. parallelization etc.
pip install roux[workflow] # for workflow operations e.g. omegaconf etc.
pip install roux[interactive] # for interactive operations in jupyter notebook e.g. watermark, icecream etc.
Command-line usage
ℹ️ Available command line tools and their usage.
roux --help
⭐ Remove *'s from a Jupyter notebook.
roux removestar path/to/notebook
🗺️ Read configuration.
roux read-config path/to/file
🗺️ Read metadata.
roux read-metadata path/to/file
📁 Find the latest and the oldest file in a list.
roux read-ps list_of_paths
💾 Backup a directory with a timestamp (ISO).
roux backup path/to/directory
How to cite?
- Using BibTeX:
@software{Dandage_roux,
title = {roux: Streamlined and Versatile Data Processing Toolkit},
author = {Dandage, Rohan},
year = {2024},
url = {https://zenodo.org/doi/10.5281/zenodo.2682670},
version = {0.1.2},
note = {The URL is a DOI link to the permanent archive of the software.},
}
- Using citation information from the CITATION.cff file.
Future directions, for which contributions are welcome
- Addition of visualization functions as attributes to rd dataframes.
- Refactoring of the workflow functions.
Similar projects
API
module roux.viz.compare
For comparative plots.
function plot_comparisons
plot_comparisons(
plot_data,
x,
ax=None,
output_dir_path=None,
force=False,
return_path=False
)
Parameters:
- `plot_data`: output of `roux.stat.compare.get_comparison`.
Notes:
- `sample type`: different sample of the same data.
module roux.stat.cluster
For clustering data.
function check_clusters
check_clusters(df: DataFrame)
Check clusters.
Args:
- `df` (DataFrame): dataframe.
function get_clusters
get_clusters(
X: <built-in function array>,
n_clusters: int,
random_state=88,
params={},
test=False
) → dict
Get clusters.
Args:
- `X` (np.array): vector.
- `n_clusters` (int): number of clusters.
- `random_state` (int, optional): random state. Defaults to 88.
- `params` (dict, optional): parameters for the `MiniBatchKMeans` function. Defaults to {}.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `dict`: output.
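Example: a minimal sketch based on the documented signature; the keys of the returned dict are not specified above, so only its type is inspected here.

  import numpy as np
  from roux.stat.cluster import get_clusters

  X = np.random.rand(100, 2)            # 100 samples, 2 features
  out = get_clusters(X, n_clusters=3)   # MiniBatchKMeans under the hood, per the docs
  print(type(out))                      # dict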
function get_n_clusters_optimum
get_n_clusters_optimum(df5: DataFrame, test=False) → int
Get n clusters optimum.
Args:
- `df5` (DataFrame): input dataframe.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `int`: knee point.
function plot_silhouette
plot_silhouette(df: DataFrame, n_clusters_optimum=None, ax=None)
Plot silhouette
Args:
- `df` (DataFrame): input dataframe.
- `n_clusters_optimum` (int, optional): number of clusters. Defaults to None.
- `ax` (axes, optional): axes object. Defaults to None.
Returns:
- `ax` (axes): axes object.
function get_clusters_optimum
get_clusters_optimum(
X: <built-in function array>,
n_clusters=range(2, 11),
params_clustering={},
test=False
) → dict
Get optimum clusters.
Args:
- `X` (np.array): samples to cluster, in indexed format.
- `n_clusters` (int, optional): range of the number of clusters to test. Defaults to range(2, 11).
- `params_clustering` (dict, optional): parameters provided to `get_clusters`. Defaults to {}.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `dict`: output.
function get_gmm_params
get_gmm_params(g, x, n_clusters=2, test=False)
Intersection point of the two peak Gaussian mixture Models (GMMs).
Args:
- `out` (str): `'coff'` only, or `'params'` for all the parameters.
function get_gmm_intersection
get_gmm_intersection(x, two_pdfs, means, weights, test=False)
function cluster_1d
cluster_1d(
ds: Series,
n_clusters: int,
clf_type='gmm',
random_state=1,
test=False,
returns=['coff'],
**kws_clf
) → dict
Cluster 1D data.
Args:
- `ds` (Series): series.
- `n_clusters` (int): number of clusters.
- `clf_type` (str, optional): type of classification. Defaults to 'gmm'.
- `random_state` (int, optional): random state. Defaults to 88.
- `test` (bool, optional): test. Defaults to False.
- `returns` (list, optional): return format. Defaults to ['df','coff','ax','model'].
- `ax` (axes, optional): axes object. Defaults to None.
Raises:
- `ValueError`: clf_type
Returns:
- `dict`: output.
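Example: a minimal sketch based on the documented signature, assuming `returns=['coff']` yields the cutoff in the output dict.

  import numpy as np
  import pandas as pd
  from roux.stat.cluster import cluster_1d

  # Two well-separated modes; the GMM cutoff should fall between them.
  ds = pd.Series(np.concatenate([
      np.random.normal(0, 1, 500),
      np.random.normal(5, 1, 500),
  ]))
  res = cluster_1d(ds, n_clusters=2, clf_type='gmm', returns=['coff'])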
function get_pos_umap
get_pos_umap(df1, spread=100, test=False, k='', **kws) → DataFrame
Get positions of the umap points.
Args:
- `df1` (DataFrame): input dataframe.
- `spread` (int, optional): spread extent. Defaults to 100.
- `test` (bool, optional): test. Defaults to False.
- `k` (str, optional): number of clusters. Defaults to ''.
Returns:
- `DataFrame`: output dataframe.
module roux.workflow.version
For version control.
function git_commit
git_commit(repop: str, suffix_message: str = '', force=False)
Version control.
Args:
- `repop` (str): path to the repository.
- `suffix_message` (str, optional): add a suffix to the version (commit) message. Defaults to ''.
module roux.workflow.log
function print_parameters
print_parameters(d: dict)
Print a dictionary of parameters as lines of code.
Parameters:
- `d` (dict): dictionary of parameters.
function test_params
test_params(params, i=0)
module roux.workflow.io
For input/output of workflow.
function clear_variables
clear_variables(dtype=None, variables=None)
Clear dataframes from the workspace.
function clear_dataframes
clear_dataframes()
function to_py
to_py(
notebookp: str,
pyp: str = None,
force: bool = False,
**kws_get_lines
) → str
To python script (.py).
Args:
- `notebookp` (str): path to the notebook.
- `pyp` (str, optional): path to the python file. Defaults to None.
- `force` (bool, optional): overwrite output. Defaults to False.
Returns:
- `str`: path of the output.
function to_nb_cells
to_nb_cells(notebook, outp, new_cells, validate_diff=None)
Replace notebook cells.
function import_from_file
import_from_file(pyp: str)
Import functions from a python (.py) file.
Args:
- `pyp` (str): python file (.py).
function infer_parameters
infer_parameters(input_value, default_value)
Infer the input values and post warning messages.
Parameters:
- `input_value`: the primary value.
- `default_value`: the default/alternative/inferred value.
Returns: inferred value.
function to_parameters
to_parameters(f: object, test: bool = False) → dict
Get function to parameters map.
Args:
- `f` (object): function.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `dict`: output.
function read_config
read_config(
p: str,
config_base=None,
inputs=None,
append_to_key=None,
convert_dtype: bool = True,
verbose: bool = True
)
Read configuration.
Parameters:
- `p` (str): input path.
- `config_base`: base config with the inputs for the interpolations.
function read_metadata
read_metadata(
p: str,
ind: str = None,
max_paths: int = 30,
config_path_key: str = 'config_path',
config_paths: list = [],
config_paths_auto=False,
verbose: bool = False,
**kws_read_config
) → dict
Read metadata.
Args:
- `p` (str, optional): file containing the metadata. Defaults to './metadata.yaml'.
- `ind` (str, optional): directory containing specific settings and other data to be incorporated into the metadata. Defaults to './metadata/'.
Returns:
- `dict`: output.
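Example: a minimal sketch based on the documented signatures; the file paths are hypothetical.

  from roux.workflow.io import read_config, read_metadata

  config = read_config('config.yaml')        # supports interpolation via config_base
  metadata = read_metadata('metadata.yaml')  # returns a dict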
function to_workflow
to_workflow(df2: DataFrame, workflowp: str, tab: str = ' ') → str
Save workflow file.
Args:
- `df2` (pd.DataFrame): input table.
- `workflowp` (str): path of the workflow file.
- `tab` (str, optional): tab format. Defaults to ' '.
Returns:
- `str`: path of the workflow file.
function create_workflow_report
create_workflow_report(workflowp: str, env: str) → int
Create report for the workflow run.
Parameters:
- `workflowp` (str): path of the workflow file (`snakemake`).
- `env` (str): name of the conda virtual environment where the required workflow dependency, i.e. `snakemake`, is available.
function replacestar
replacestar(
input_path,
output_path=None,
replace_from='from roux.global_imports import *',
in_place: bool = False,
attributes={'pandarallel': ['parallel_apply'], 'rd': ['.rd.', '.log.']},
verbose: bool = False,
test: bool = False,
**kws_fix_code
)
Post-development, replace wildcard (global) import from roux i.e. 'from roux.global_imports import *' with individual imports with accompanying documentation.
Usage: For notebooks developed using roux.global_imports.
Parameters:
- `input_path` (str): path to the .py or .ipynb file.
- `output_path` (str): path to the output.
- `py_path` (str): path to the intermediate .py file.
- `in_place` (bool): whether to carry out the modification in place.
- `return_replacements` (bool): return a dict with the strings to be replaced.
- `attributes` (dict): attribute names mapped to their keywords for searching.
- `verbose` (bool): verbose toggle.
- `test` (bool): test mode, used if the output file is not provided and in-place modification is not allowed.
Returns:
output_path
(str): path to the modified notebook.
Examples:
roux replacestar -i notebook.ipynb
roux replacestar -i notebooks/*.ipynb
function replacestar_ruff
replacestar_ruff(
p: str,
outp: str,
replace: str = 'from roux.global_imports import *',
verbose=True
) → str
function post_code
post_code(p: str, lint: bool, format: bool, verbose: bool = True)
function to_clean_nb
to_clean_nb(
p,
outp: str = None,
in_place: bool = False,
temp_outp: str = None,
clear_outputs=False,
drop_code_lines_containing=['.*%run .*', '^#\\s*.*=.*', '^#\\s*".*', "^#\\s*'.*", '^#\\s*f".*', "^#\\s*f'.*", '^#\\s*df.*', '^#\\s*.*kws_.*', '^\\s*#\\s*$', '^\\s*#\\s*break\\s*$', '\\[X', '\\[old ', '#old', '# old', '\\[not used', '# not used', '#tmp', '# tmp', '#temp', '# temp', 'check ', 'checking', '# check', '\\[SKIP', 'DEBUG '],
drop_headers_containing=['check', '[check', 'old', '[old', 'tmp', '[tmp'],
lint=False,
format=False,
**kws_fix_code
) → str
Wrapper around the notebook post-processing functions.
Usage: For notebooks developed using roux.global_imports.
On the command line:
single input: roux to-clean-nb in.ipynb out.ipynb -c -l -f
multiple inputs: roux to-clean-nb "in*.ipynb" -i -c -l -f
Parameters:
- `temp_outp` (str): path to the intermediate output.
module roux.viz.image
For visualization of images.
function plot_image
plot_image(
imp: str,
ax: Axes = None,
force=False,
margin=0,
axes=False,
test=False,
**kwarg
) → Axes
Plot image e.g. schematic.
Args:
- `imp` (str): path of the image.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `force` (bool, optional): overwrite output. Defaults to False.
- `margin` (int, optional): margins. Defaults to 0.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
Keyword Args:
- `kwarg`: parameters provided to the backends, e.g. cairosvg: {'dpi':500,'scale':2}; imagemagick: {'trim':False,'alpha':False}.
function plot_images
plot_images(image_paths, ncols=3, title_func=None, size=3)
module roux.lib.sys
For processing file paths, for example.
function basenamenoext
basenamenoext(p)
Basename without the extension.
Args:
- `p` (str): path.
Returns:
- `s` (str): output.
function remove_exts
remove_exts(p: str)
Filename without the extension.
Args:
- `p` (str): path.
Returns:
- `s` (str): output.
function read_ps
read_ps(ps, test: bool = True, verbose: bool = True) → list
Read a list of paths.
Parameters:
- `ps` (list|str): list of paths, or a string with wildcard/s.
- `test` (bool): testing.
- `verbose` (bool): verbose.
Returns:
- `ps` (list): list of paths.
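Example: a minimal sketch; the wildcard path is hypothetical.

  from roux.lib.sys import read_ps

  # A string with wildcard/s is expanded to the matching paths;
  # a list of paths is passed through.
  paths = read_ps('data/*.tsv')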
function to_path
to_path(s, replacewith='_', verbose=False, coff_len_escape_replacement=100)
Normalise a string to be used as a file path.
Parameters:
- `s` (string): input string.
- `replacewith` (str): replace the whitespaces or incompatible characters with this.
Returns:
- `s` (string): output string.
function makedirs
makedirs(p: str, exist_ok=True, **kws)
Make directories recursively.
Args:
- `p` (str): path.
- `exist_ok` (bool, optional): no error if the directory exists. Defaults to True.
Returns:
- `p_` (str): the path of the directory.
function to_output_path
to_output_path(ps, outd=None, outp=None, suffix='')
Infer a single output path for a list of paths.
Parameters:
- `ps` (list): list of paths.
- `outd` (str): path of the output directory.
- `outp` (str): path of the output file.
- `suffix` (str): suffix of the filename.
Returns:
- `outp` (str): path of the output file.
function to_output_paths
to_output_paths(
input_paths: list = None,
inputs: list = None,
output_path_base: str = None,
encode_short: bool = True,
replaces_output_path=None,
key_output_path: str = None,
force: bool = False,
verbose: bool = False
) → dict
Infer an output path for each of the paths or inputs.
Parameters:
- `input_paths` (list): list of input paths. Defaults to None.
- `inputs` (list): list of inputs e.g. dictionaries. Defaults to None.
- `output_path_base` (str): output path with a placeholder '{KEY}' to be replaced. Defaults to None.
- `encode_short` (bool): short encoded string, else a long encoded string (reversible) is used. Defaults to True.
- `replaces_output_path`: list, dictionary or function to replace the input paths. Defaults to None.
- `key_output_path` (str): key to be used to incorporate the output_path variable among the inputs. Defaults to None.
- `force` (bool): overwrite the outputs. Defaults to False.
- `verbose` (bool): show verbose. Defaults to False.
Returns: dictionary with the output paths mapped to the input paths or inputs.
TODOs: 1. Placeholders other than {KEY}.
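Example: a minimal sketch based on the documented signature; the input paths are hypothetical.

  from roux.lib.sys import to_output_paths

  # '{KEY}' in output_path_base is replaced per input.
  outputs = to_output_paths(
      input_paths=['data/a.tsv', 'data/b.tsv'],
      output_path_base='outputs/{KEY}/result.tsv',
  )
  print(outputs)  # dict: output path mapped to input path, per the docs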
function get_encoding
get_encoding(p)
Get encoding of a file.
Parameters:
- `p` (str): file path.
Returns:
- `s` (string): encoding.
function get_all_subpaths
get_all_subpaths(d='.', include_directories=False)
Get all the subpaths.
Args:
- `d` (str, optional): directory path. Defaults to '.'.
- `include_directories` (bool, optional): to include the directories. Defaults to False.
Returns:
- `paths` (list): sub-paths.
function get_env
get_env(env_name: str, return_path: bool = False)
Get the virtual environment as a dictionary.
Args:
- `env_name` (str): name of the environment.
Returns:
- `d` (dict): parameters of the virtual environment.
function run_com
run_com(com: str, env=None, test: bool = False, **kws)
Run a bash command.
Args:
- `com` (str): command.
- `env` (str): environment name.
- `test` (bool, optional): testing. Defaults to False.
Returns:
- `output`: output of the `subprocess.call` function.
TODOs: 1. logp 2. error ignoring
function runbash_tmp
runbash_tmp(
s1: str,
env: str,
df1=None,
inp='INPUT',
input_type='df',
output_type='path',
tmp_infn='in.txt',
tmp_outfn='out.txt',
outp=None,
force=False,
test=False,
**kws
)
Run a bash command in the `/tmp` directory.
Args:
- `s1` (str): command.
- `env` (str): environment name.
- `df1` (DataFrame, optional): input dataframe. Defaults to None.
- `inp` (str, optional): input path. Defaults to 'INPUT'.
- `input_type` (str, optional): input type. Defaults to 'df'.
- `output_type` (str, optional): output type. Defaults to 'path'.
- `tmp_infn` (str, optional): temporary input file. Defaults to 'in.txt'.
- `tmp_outfn` (str, optional): temporary output file. Defaults to 'out.txt'.
- `outp` (type, optional): output path. Defaults to None.
- `force` (bool, optional): force. Defaults to False.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `output`: output of the `subprocess.call` function.
function create_symlink
create_symlink(p: str, outp: str, test=False, force=False)
Create symbolic links.
Args:
- `p` (str): input path.
- `outp` (str): output path.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `outp` (str): output path.
TODOs:
Use `pathlib`: `Path(p).symlink_to(Path(outp))`.
function input_binary
input_binary(q: str)
Get input in binary format.
Args:
- `q` (str): question.
Returns:
- `b` (bool): response.
function is_interactive
is_interactive()
Check if the UI is interactive e.g. jupyter or command line.
function is_interactive_notebook
is_interactive_notebook()
Check if the UI is interactive e.g. jupyter or command line.
Notes:
Reference:
function get_excecution_location
get_excecution_location(depth=1)
Get the location of the function being executed.
Args:
- `depth` (int, optional): depth of the location. Defaults to 1.
Returns:
- `tuple` (tuple): filename and line number.
function get_datetime
get_datetime(outstr: bool = True, fmt='%G%m%dT%H%M%S')
Get the date and time.
Args:
- `outstr` (bool, optional): string output. Defaults to True.
- `fmt` (str): format of the string.
Returns:
- `s`: date and time.
function p2time
p2time(filename: str, time_type='m')
Get the creation/modification dates of files.
Args:
- `filename` (str): filename.
- `time_type` (str, optional): creation ('c') or modification ('m') time. Defaults to 'm'.
Returns:
- `time` (str): time.
function ps2time
ps2time(ps: list, **kws_p2time)
Get the times for a list of files.
Args:
- `ps` (list): list of paths.
Returns:
- `ds` (Series): paths mapped to the corresponding times.
function get_logger
get_logger(program='program', argv=None, level=None, dp=None)
Get the logging object.
Args:
- `program` (str, optional): name of the program. Defaults to 'program'.
- `argv` (type, optional): arguments. Defaults to None.
- `level` (type, optional): level of logging. Defaults to None.
- `dp` (type, optional): directory path. Defaults to None.
function tree
tree(folder_path: str, log=True)
function grep
grep(
p: str,
checks: list,
exclude: list = [],
exclude_str: list = [],
verbose: bool = True
) → list
To get the output of grep as a list of strings.
Parameters:
- `p` (str): input path.
module roux.stat.transform
For transformations.
function plog
plog(x, p: float, base: int)
Pseudo-log.
Args:
- `x` (float|np.array): input.
- `p` (float): pseudo-count.
- `base` (int): base of the log.
Returns: output.
function anti_plog
anti_plog(x, p: float, base: int)
Anti-pseudo-log.
Args:
- `x` (float|np.array): input.
- `p` (float): pseudo-count.
- `base` (int): base of the log.
function log_pval
log_pval(
x,
errors: str = 'raise',
replace_zero_with: float = None,
p_min: float = None
)
Transform p-values to Log10.
Parameters:
- `x`: input.
- `errors` (str): Defaults to 'raise', else replace (in case of visualization only).
- `p_min` (float): replace zeros with this value. Note: to be used for visualization only.
Returns: output.
function get_q
get_q(ds1: Series, col: str = None, verb: bool = True, test_coff: float = 0.1)
To FDR corrected P-value.
function glog
glog(x: float, l=2)
Generalised logarithm.
Args:
- `x` (float): input.
- `l` (int, optional): pseudo-count. Defaults to 2.
Returns:
- `float`: output.
function rescale
rescale(
a: <built-in function array>,
range1: tuple = None,
range2: tuple = [0, 1]
) → <built-in function array>
Rescale within a new range.
Args:
- `a` (np.array): input vector.
- `range1` (tuple, optional): existing range. Defaults to None.
- `range2` (tuple, optional): new range. Defaults to [0, 1].
Returns:
- `np.array`: output.
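Example: a minimal sketch, assuming linear min-max scaling.

  import numpy as np
  from roux.stat.transform import rescale

  a = np.array([10.0, 20.0, 30.0])
  print(rescale(a))                   # [0.  0.5 1. ] in the default [0, 1] range
  print(rescale(a, range2=[0, 100]))  # [  0.  50. 100.]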
function rescale_divergent
rescale_divergent(df1: DataFrame, col: str, col_sign: str = None) → DataFrame
Rescale divergently i.e. two-sided.
Args:
- `df1` (pd.DataFrame): input dataframe.
- `col` (str): column.
Returns:
- `pd.DataFrame`: column.
Notes:
Under development.
module roux.lib.ds
For processing pandas Series.
function get_near_quantile
get_near_quantile(x: Series, q: float)
Retrieve the nearest value to a quantile.
module roux.viz.dist
For distribution plots.
function hist_annot
hist_annot(
dplot: DataFrame,
colx: str,
colssubsets: list = [],
bins: int = 100,
subset_unclassified: bool = True,
cmap: str = 'hsv',
ymin=None,
ymax=None,
ylimoff: float = 1,
ywithinoff: float = 1.2,
annotaslegend: bool = True,
annotn: bool = True,
params_scatter: dict = {'zorder': 2, 'alpha': 0.1, 'marker': '|'},
xlim: tuple = None,
ax: Axes = None,
**kws
) → Axes
Annotated histogram.
Args:
- `dplot` (pd.DataFrame): input dataframe.
- `colx` (str): x column.
- `colssubsets` (list, optional): columns indicating subsets. Defaults to [].
- `bins` (int, optional): bins. Defaults to 100.
- `subset_unclassified` (bool, optional): call the non-annotated subset 'unclassified'. Defaults to True.
- `cmap` (str, optional): colormap. Defaults to 'Reds_r'.
- `ylimoff` (float, optional): y-offset for the y-axis limit. Defaults to 1.2.
- `ywithinoff` (float, optional): y-offset for the distance within labels. Defaults to 1.2.
- `annotaslegend` (bool, optional): convert labels to legends. Defaults to True.
- `annotn` (bool, optional): annotate sample sizes. Defaults to True.
- `params_scatter` (type, optional): parameters of the scatter plot. Defaults to {'zorder':2,'alpha':0.1,'marker':'|'}.
- `xlim` (tuple, optional): x-axis limits. Defaults to None.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `hist` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
TODOs: For scatter, use `annot_side` with `loc='top'`.
function plot_gmm
plot_gmm(
x: Series,
coff: float = None,
mix_pdf: object = None,
two_pdfs: tuple = None,
weights: tuple = None,
n_clusters: int = 2,
bins: int = 20,
show_cutoff: bool = True,
show_cutoff_line: bool = True,
colors: list = ['gray', 'gray', 'lightgray'],
out_coff: bool = False,
hist: bool = True,
test: bool = False,
ax: Axes = None,
kws_axvline={'color': 'k'},
**kws
) → Axes
Plot Gaussian mixture Models (GMMs).
Args:
- `x` (pd.Series): input vector.
- `coff` (float, optional): intersection between the two fitted distributions. Defaults to None.
- `mix_pdf` (object, optional): probability density function of the mixed distribution. Defaults to None.
- `two_pdfs` (tuple, optional): probability density functions of the separate distributions. Defaults to None.
- `weights` (tuple, optional): weights of the individual distributions. Defaults to None.
- `n_clusters` (int, optional): number of distributions. Defaults to 2.
- `bins` (int, optional): bins. Defaults to 50.
- `colors` (list, optional): colors of the individual distributions and of the mixed one. Defaults to ['gray','gray','lightgray'].
- `out_coff` (bool, optional): return the cutoff. Defaults to False.
- `hist` (bool, optional): show histogram. Defaults to True.
- `test` (bool, optional): test mode. Defaults to False.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `hist` function.
- `kws_axvline`: parameters provided to the `axvline` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_normal
plot_normal(x: Series, ax: Axes = None) → Axes
Plot normal distribution.
Args:
- `x` (pd.Series): input vector.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function get_jitter_positions
get_jitter_positions(ax, df1, order, column_category, column_position)
function plot_dists
plot_dists(
df1: DataFrame,
x: str,
y: str,
colindex: str,
hue: str = None,
order: list = None,
hue_order: list = None,
kind: str = 'box',
show_p: bool = True,
show_n: bool = True,
show_n_prefix: str = '',
show_n_ha=None,
show_n_ticklabels: bool = True,
show_outlines: bool = False,
kws_outlines: dict = {},
alternative: str = 'two-sided',
offx_n: float = 0,
axis_cont_lim: tuple = None,
axis_cont_scale: str = 'linear',
offs_pval: dict = None,
fmt_pval: str = '<',
alpha: float = 0.5,
ax: Axes = None,
test: bool = False,
kws_stats: dict = {},
**kws
) → Axes
Plot distributions.
Args:
- `df1` (pd.DataFrame): input data.
- `x` (str): x column.
- `y` (str): y column.
- `colindex` (str): index column.
- `hue` (str, optional): column with values to be encoded as hues. Defaults to None.
- `order` (list, optional): order of categorical values. Defaults to None.
- `hue_order` (list, optional): order of values to be encoded as hues. Defaults to None.
- `kind` (str, optional): kind of distribution. Defaults to 'box'.
- `show_p` (bool, optional): show p-values. Defaults to True.
- `show_n` (bool, optional): show sample sizes. Defaults to True.
- `show_n_prefix` (str, optional): prefix of the sample size label, i.e. `n=`. Defaults to ''.
- `offx_n` (float, optional): x-offset for the sample size label. Defaults to 0.
- `axis_cont_lim` (tuple, optional): x-axis limits. Defaults to None.
- `offs_pval` (float, optional): x and y offsets for the p-value labels.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `test` (bool, optional): test mode. Defaults to False.
- `kws_stats` (dict, optional): parameters provided to the stat function. Defaults to {}.
Keyword Args:
- `kws`: parameters provided to the `seaborn` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
TODOs: 1. Sort categories. 2. Change the alpha of the boxplot rather than changing the saturation of the swarmplot.
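Example: a minimal sketch using a seaborn demo dataset; the index is reset to provide an id column for `colindex`.

  import seaborn as sns
  from roux.viz.dist import plot_dists

  df = sns.load_dataset('tips').reset_index()
  ax = plot_dists(df, x='total_bill', y='day', colindex='index', kind='box')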
function pointplot_groupbyedgecolor
pointplot_groupbyedgecolor(data: DataFrame, ax: Axes = None, **kws) → Axes
Plot seaborn's `pointplot` grouped by the edgecolor of the points.
Args:
- `data` (pd.DataFrame): input data.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to seaborn's `pointplot` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
module roux.viz.theme
Theming.
function set_theme
set_theme(
font: str = 'Myriad Pro',
fontsize: int = 12,
pad: int = 2,
palette: list = ['#50AADC', '#D3DDDC', '#F1D929', '#f55f5f', '#046C9A', '#00A08A', '#F2AD00', '#F98400', '#5BBCD6', '#ECCBAE', '#D69C4E', '#ABDDDE', '#000000']
)
Set the theme.
Parameters:
- `font` (str): font name.
- `fontsize` (int): font size.
- `pad` (int): padding.
TODOs: Addition of `palette` options.
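Example: a minimal sketch; only the size and padding are overridden here, since the default font may not be installed.

  from roux.viz.theme import set_theme

  set_theme(fontsize=10, pad=2)  # apply globally before plotting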
module roux.workflow.workflow
For workflow management.
function get_scripts
get_scripts(
ps: list,
notebook_prefix: str = '\\d{2}',
notebook_suffix: str = '_v\\d{2}',
test: bool = False,
fast: bool = True,
cores: int = 6,
force: bool = False,
tab: str = ' ',
**kws
) → DataFrame
Get scripts.
Args:
- `ps` (list): paths.
- `notebook_prefix` (str, optional): prefix of the notebook file to be considered as a "task".
- `notebook_suffix` (str, optional): suffix of the notebook file to be considered as a "task".
- `test` (bool, optional): test mode. Defaults to False.
- `fast` (bool, optional): parallel processing. Defaults to True.
- `cores` (int, optional): cores to use. Defaults to 6.
- `force` (bool, optional): overwrite the outputs. Defaults to False.
- `tab` (str, optional): tab in spaces. Defaults to ' '.
Returns:
- `pd.DataFrame`: output table.
function to_scripts
to_scripts(
packagep: str,
notebooksdp: str,
validate: bool = False,
ps: list = None,
notebook_prefix: str = '\\d{2}',
notebook_suffix: str = '_v\\d{2}',
scripts: bool = True,
workflow: bool = True,
sep_step: str = '## step',
todos: bool = False,
git: bool = True,
clean: bool = False,
test: bool = False,
force: bool = True,
tab: str = ' ',
**kws
)
To scripts.
Args:
- `packagep` (str): path to the package.
- `notebooksdp` (str, optional): path to the notebooks. Defaults to None.
- `validate` (bool, optional): validate if functions are formatted correctly. Defaults to False.
- `ps` (list, optional): paths. Defaults to None.
- `notebook_prefix` (str, optional): prefix of the notebook file to be considered as a "task".
- `notebook_suffix` (str, optional): suffix of the notebook file to be considered as a "task".
- `scripts` (bool, optional): make scripts. Defaults to True.
- `workflow` (bool, optional): make workflow file. Defaults to True.
- `sep_step` (str, optional): separator marking the start of a step. Defaults to "## step".
- `todos` (bool, optional): show todos. Defaults to False.
- `git` (bool, optional): save version. Defaults to True.
- `clean` (bool, optional): clean temporary files. Defaults to False.
- `test` (bool, optional): test mode. Defaults to False.
- `force` (bool, optional): overwrite outputs. Defaults to True.
- `tab` (str, optional): tab size. Defaults to ' '.
Keyword parameters:
- `kws`: parameters provided to the `get_script` function, including `sep_step` and `sep_step_end`.
TODOs:
1. For version control, use https://github.com/jupyterlab/jupyterlab-git.
module roux.stat
Global Variables
- binary
- io
module roux.lib.io
For input/output of data files.
function read_zip
read_zip(p: str, file_open: str = None, fun_read=None, test: bool = False)
Read the contents of a zip file.
Parameters:
- `p` (str): path of the file.
- `file_open` (str): path of the file within the zip file to open.
- `fun_read` (object): function to read the file.
Examples:
- Setting the `fun_read` parameter for reading a tab-separated table from a zip file:
  from io import StringIO
  ... fun_read=lambda x: pd.read_csv(StringIO(x.decode('utf-8')), sep='\t', header=None),
  or:
  from io import BytesIO
  ... fun_read=lambda x: pd.read_table(BytesIO(x)),
function to_zip_dir
to_zip_dir(source, destination=None, fmt='zip')
Zip a folder. Ref: https://stackoverflow.com/a/50381250/3521099
function to_zip
to_zip(
p: str,
outp: str = None,
func_rename=None,
fmt: str = 'zip',
test: bool = False
)
Compress a file/directory.
Parameters:
- `p` (str): path to the file/directory.
- `outp` (str): path to the output compressed file.
- `fmt` (str): format of the compressed file.
Returns:
- `outp` (str): path of the compressed file.
function to_dir
to_dir(
paths: dict,
output_dir_path: str,
rename_basename=None,
force=False,
test=False
)
function get_version
get_version(suffix: str = '') → str
Get the time-based version string.
Parameters:
- `suffix` (string): suffix.
Returns:
- `version` (string): version.
function to_version
to_version(
p: str,
outd: str = None,
test: bool = False,
label: str = '',
**kws: dict
) → str
Rename a file/directory to a version.
Parameters:
- `p` (str): path.
- `outd` (str): output directory.
Keyword parameters:
- `kws` (dict): provided to `get_version`.
Returns:
- `version` (string): version.
TODOs: 1. Use `to_dir`.
function backup
backup(
p: str,
outd: str = None,
versioned: bool = False,
suffix: str = '',
zipped: bool = False,
move_only: bool = False,
test: bool = True,
verbose: bool = False,
no_test: bool = False
)
Backup a directory.
Steps:
0. create a version directory in outd
1. move ps to the version (time) directory, with common parents up to the level of the version directory
2. zip or not
Parameters:
- `p` (str): input path.
- `outd` (str): output directory path.
- `versioned` (bool): custom version for the backup (False).
- `suffix` (str): custom suffix for the backup ('').
- `zipped` (bool): whether to zip the backup (False).
- `test` (bool): testing (True).
- `no_test` (bool): no testing; for use on the command line (False).
TODOs: 1. Use `to_dir`. 2. Option to remove dirs: find and move/zip "find -regex ./_." "find -regex ./test."
function read_url
read_url(url)
Read text from a URL.
Parameters:
- `url` (str): URL link.
Returns:
- `s` (string): text content of the URL.
function download
download(
url: str,
path: str = None,
outd: str = None,
force: bool = False,
verbose: bool = True
) → str
Download a file.
Parameters:
- `url` (str): URL.
- `path` (str): custom output path (None).
- `outd` (str): output directory ('data/database').
- `force` (bool): overwrite output (False).
- `verbose` (bool): verbose (True).
Returns:
- `path` (str): output path.
function read_text
read_text(p)
Read a file. To be called by other functions.
Args:
- `p` (str): path.
Returns:
- `s` (str): contents.
function to_list
to_list(l1, p)
Save list.
Parameters:
- `l1` (list): input list.
- `p` (str): path.
Returns:
- `p` (str): path.
function read_list
read_list(p)
Read the lines in the file.
Args:
- `p` (str): path.
Returns:
- `l` (list): list.
function is_dict
is_dict(p)
function read_dict
read_dict(p, fmt: str = '', apply_on_keys=None, **kws) → dict
Read dictionary file.
Parameters:
- `p` (str): path.
- `fmt` (str): format of the file.
Keyword Arguments:
- `kws` (d): parameters provided to the reader function.
Returns:
- `d` (dict): output dictionary.
function to_dict
to_dict(d, p, **kws)
Save dictionary file.
Parameters:
- `d` (dict): input dictionary.
- `p` (str): path.
Keyword Arguments:
- `kws` (d): parameters provided to the export function.
Returns:
- `p` (str): path.
function post_read_table
post_read_table(
df1: DataFrame,
clean: bool,
tables: list,
verbose: bool = True,
**kws_clean: dict
)
Post-reading a table.
Parameters:
- `df1` (DataFrame): input dataframe.
- `clean` (bool): whether to apply the `clean` function.
- `tables` (list).
- `verbose` (bool): verbose.
Keyword parameters:
- `kws_clean` (dict): parameters provided to the `clean` function.
Returns:
- `df` (DataFrame): output dataframe.
function read_table
read_table(
p: str,
ext: str = None,
clean: bool = True,
filterby_time=None,
params: dict = {},
kws_clean: dict = {},
kws_cloud: dict = {},
check_paths: bool = True,
use_paths: bool = False,
tables: int = 1,
test: bool = False,
verbose: bool = True,
engine: str = 'pyarrow',
**kws_read_tables: dict
)
Table/s reader.
Parameters:
- `p` (str): path of the file. It could be an input for `read_ps`, which would include strings with wildcards, lists etc.
- `ext` (str): extension of the file (default: None, meaning inferred from the path).
- `clean` (bool): whether to clean the table (default: True).
- `filterby_time`: filter by time (default: None).
- `check_paths` (bool): read files in the path column (default: True).
- `use_paths` (bool): force reading files in the path column (default: False).
- `test` (bool): testing (default: False).
- `params`: parameters provided to `pd.read_csv` (default: {}). For example:
- `params['columns']`: columns to read.
- `kws_clean`: parameters provided to `rd.clean` (default: {}).
- `kws_cloud`: parameters for reading files from google-drive (default: {}).
- `tables`: how many tables to be read (default: 1).
- `verbose`: verbose (default: True).
Keyword parameters:
- `kws_read_tables` (dict): parameters provided to the `read_tables` function. For example:
- `to_col={colindex: replaces_index}`
Returns:
- `df` (DataFrame): output dataframe.
Examples:
- For reading specific columns only, set `params=dict(columns=list)`.
- For reading many files, convert the paths to a column with the corresponding values: `to_col={colindex: replaces_index}`.
- Reading a vcf file:
  p='*.vcf|vcf.gz'
  read_table(p, params_read_csv=dict(
      #compression='gzip',
      sep='\t', comment='#', header=None,
      names=replace_many(get_header(path, comment='#', lineno=-1), ['#', '\t'], '').split('\t')))
function get_logp
get_logp(ps: list) → str
Infer the path of the log file.
Parameters:
- `ps` (list): list of paths.
Returns:
- `p` (str): path of the output file.
function apply_on_paths
apply_on_paths(
ps: list,
func,
replaces_outp: str = None,
to_col: dict = None,
replaces_index=None,
drop_index: bool = True,
colindex: str = 'path',
filter_rows: dict = None,
fast: bool = False,
progress_bar: bool = True,
params: dict = {},
dbug: bool = False,
test1: bool = False,
verbose: bool = True,
kws_read_table: dict = {},
**kws: dict
)
Apply a function on list of files.
Parameters:
- `ps` (str|list): paths, or a string to infer paths using `read_ps`.
- `to_col` (dict): convert the paths to a column e.g. {colindex: replaces_index}.
- `func` (function): function to be applied on each of the paths.
- `replaces_outp` (dict|function): infer the output path (`outp`) by replacing substrings in the input paths (`p`).
- `filter_rows` (dict): filter the rows based on a dict, using `rd.filter_rows`.
- `fast` (bool): parallel processing (default: False).
- `progress_bar` (bool): show progress bar (default: True).
- `params` (dict): parameters provided to the `pd.read_csv` function.
- `dbug` (bool): debug mode on (default: False).
- `test1` (bool): test on one path (default: False).
- `kws_read_table` (dict): parameters provided to the `read_table` function (default: {}).
- `replaces_index` (object|dict|list|str): for example, 'basenamenoext' if path to basename.
- `drop_index` (bool): whether to drop the index column e.g. `path` (default: True).
- `colindex` (str): the name of the column containing the paths (default: 'path').
Keyword parameters:
- `kws` (dict): parameters provided to the function.
Example:
  def apply_(p, outd='data/data_analysed', force=False):
      outp = f"{outd}/{basenamenoext(p)}.pqt"
      if exists(outp) and not force:
          return
      df01 = read_table(p)
  apply_on_paths(
      ps=glob("data/data_analysed/*"),
      func=apply_,
      outd="data/data_analysed/",
      force=True,
      fast=False,
      read_path=True,
  )
TODOs: Move out of io.
function read_tables
read_tables(
ps: list,
fast: bool = False,
filterby_time=None,
to_dict: bool = False,
params: dict = {},
tables: int = None,
**kws_apply_on_paths: dict
)
Read multiple tables.
Parameters:
- `ps` (list): list of paths.
- `fast` (bool): parallel processing (default: False).
- `filterby_time` (str): filter by time (default: None).
- `drop_index` (bool): drop index (default: True).
- `to_dict` (bool): output dictionary (default: False).
- `params` (dict): parameters provided to the `pd.read_csv` function (default: {}).
- `tables`: number of tables (default: None).
Keyword parameters:
- `kws_apply_on_paths` (dict): parameters provided to `apply_on_paths`.
Returns:
- `df` (DataFrame): output dataframe.
TODOs: Parameter to report the creation dates of the newest and the oldest files.
function to_table
to_table(
df: DataFrame,
p: str,
colgroupby: str = None,
test: bool = False,
**kws
)
Save table.
Parameters:
- `df` (DataFrame): the input dataframe.
- `p` (str): output path.
- `colgroupby` (str|list): columns to group by, to save the subsets of the data as separate files.
- `test` (bool): testing on (default: False).
Keyword parameters:
- `kws` (dict): parameters provided to the `to_manytables` function.
Returns:
- `p` (str): path of the output.
function to_manytables
to_manytables(
df: DataFrame,
p: str,
colgroupby: str,
fmt: str = '',
ignore: bool = False,
kws_get_chunks={},
**kws_to_table
)
Save many tables.
Parameters:
- `df` (DataFrame): the input dataframe.
- `p` (str): output path.
- `colgroupby` (str|list): columns to group by, to save the subsets of the data as separate files.
- `fmt` (str): if '=', column names are included in the folder names e.g. col1=True.
- `ignore` (bool): ignore the warnings (default: False).
Keyword parameters:
- `kws_get_chunks` (dict): parameters provided to the `get_chunks` function.
Returns:
- `p` (str): path of the output.
TODOs:
1. Change the default parameter to `fmt='='`.
function to_table_pqt
to_table_pqt(
df: DataFrame,
p: str,
engine: str = 'pyarrow',
compression: str = 'gzip',
**kws_pqt: dict
) → str
Save a parquet file.
Parameters:
- `df` (pd.DataFrame): table.
- `p` (str): path.
Keyword parameters: parameters provided to `pd.DataFrame.to_parquet`.
Returns:
function tsv2pqt
tsv2pqt(p: str) → str
Convert tab-separated file to Apache parquet.
Parameters:
- `p` (str): path of the input.
Returns:
- `p` (str): path of the output.
function pqt2tsv
pqt2tsv(p: str) → str
Convert Apache parquet file to tab-separated.
Parameters:
- `p` (str): path of the input.
Returns:
- `p` (str): path of the output.
function read_excel
read_excel(
p: str,
sheet_name: str = None,
kws_cloud: dict = {},
test: bool = False,
**kws
)
Read an excel file.
Parameters:
- `p` (str): path of the file.
- `sheet_name` (str|None): read the 1st sheet if None (default: None).
- `kws_cloud` (dict): parameters provided to read the file from the google drive (default: {}).
- `test` (bool): if False and sheet_name is not provided, return all sheets as a dictionary; if True, print the list of sheets.
Keyword parameters:
- `kws`: parameters provided to the excel reader.
function to_excel_commented
to_excel_commented(p: str, comments: dict, outp: str = None, author: str = None)
Add comments to the columns of an excel file and save.
Args:
- `p` (str): input path of the excel file.
- `comments` (dict): map between column names and comments, e.g. a description of the column.
- `outp` (str): output path of the excel file. Defaults to None.
- `author` (str): author of the comments. Defaults to 'Author'.
TODOs: 1. Increase the limit on the number of columns to which comments can be added. Currently it is 26, i.e. up to Z1.
function to_excel
to_excel(
sheetname2df: dict,
outp: str,
comments: dict = None,
save_input: bool = False,
author: str = None,
append: bool = False,
adjust_column_width: bool = True,
**kws
)
Save excel file.
Parameters:
- `sheetname2df` (dict): dictionary mapping the sheet name to the dataframe.
- `outp` (str): output path.
- `append` (bool): append the dataframes (default: False).
- `comments` (dict): map between column names and comments, e.g. a description of the column.
- `save_input` (bool): additionally save the input tables in text format.
Keyword parameters:
- `kws`: parameters provided to the excel writer.
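Example: a minimal sketch based on the documented signature; the output path is hypothetical.

  import pandas as pd
  from roux.lib.io import to_excel

  sheetname2df = {
      'summary': pd.DataFrame({'n': [1, 2]}),
      'details': pd.DataFrame({'x': [0.1, 0.2], 'y': [1.0, 2.0]}),
  }
  to_excel(sheetname2df, 'output.xlsx')  # one sheet per key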
function check_chunks
check_chunks(outd, col, plot=True)
Create chunks of the tables.
Parameters:
- `outd` (str): output directory.
- `col` (str): the column with values that are used for getting the chunks.
- `plot` (bool): plot the chunk sizes (default: True).
Returns:
- `df3` (DataFrame): output dataframe.
module roux.lib
Global Variables
- set
- str
- sys
- df
- dfs
- text
- io
function to_class
to_class(cls)
Get the decorator to attach functions.
Parameters:
- `cls` (class): class object.
Returns:
- `decorator` (decorator): decorator object.
References:
https://gist.github.com/mgarod/09aa9c3d8a52a980bd4d738e52e5b97a
function decorator
decorator(func)
class rd
roux-dataframe (`.rd`) extension.
method __init__
__init__(pandas_obj)
class rs
roux-series (`.rs`) extension.
method __init__
__init__(pandas_obj)
module roux.viz.figure
For setting up figures.
function get_children
get_children(fig)
Get all the individual objects included in the figure.
function get_child_text
get_child_text(search_name, all_children=None, fig=None)
Get text object.
function align_texts
align_texts(fig, texts: list, align: str, test=False)
Align text objects.
function labelplots
labelplots(
axes: list = None,
fig=None,
labels: list = None,
xoff: float = 0,
yoff: float = 0,
auto: bool = False,
xoffs: dict = {},
yoffs: dict = {},
va: str = 'center',
ha: str = 'left',
verbose: bool = True,
test: bool = False,
**kws_text
)
Label (sub)plots.
Args:
- `fig`: `plt.figure` object.
- `axes` (type): list of `plt.Axes` objects.
- `xoff` (int, optional): x offset. Defaults to 0.
- `yoff` (int, optional): y offset. Defaults to 0.
- `params_alignment` (dict, optional): alignment parameters. Defaults to {}.
- `params_text` (dict, optional): parameters provided to `plt.text`. Defaults to {'size':20,'va':'bottom','ha':'right'}.
- `test` (bool, optional): test mode. Defaults to False.
Todos: 1. Get the x coordinate of the ylabel.
function annot_axs
annot_axs(data, ax1, ax2, cols, **kws_line)
module roux.workflow.function
For function management.
function get_quoted_path
get_quoted_path(s1: str) → str
Quoted paths.
Args:
- `s1` (str): path.
Returns:
- `str`: quoted path.
function get_path
get_path(
s: str,
validate: bool,
prefixes=['data/', 'metadata/', 'plot/'],
test=False
) → str
Extract paths from a line of code.
Args:
- `s` (str): line of code.
- `validate` (bool): validate the output.
- `prefixes` (list, optional): allowed prefixes. Defaults to ['data/','metadata/','plot/'].
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: path.
TODOs: 1. Use wildcards i.e. *'s.
function remove_dirs_from_outputs
remove_dirs_from_outputs(outputs: list, test: bool = False) → list
Remove directories from the output paths.
Args:
- `outputs` (list): output paths.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `list`: paths.
function get_ios
get_ios(l: list, test=False) → tuple
Get input and output (IO) paths.
Args:
- `l` (list): list of lines of code.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `tuple`: paths of inputs and outputs.
function get_name
get_name(s: str, i: int, sep_step: str = '## step') → str
Get name of the function.
Args:
- `s` (str): lines in markdown format.
- `sep_step` (str, optional): separator marking the start of a step. Defaults to "## step".
- `i` (int): index of the step.
Returns:
- `str`: name of the function.
function get_step
get_step(
l: list,
name: str,
sep_step: str = '## step',
sep_step_end: str = '## tests',
test=False,
tab=' '
) → dict
Get code for a step.
Args:
- `l` (list): list of lines of code.
- `name` (str): name of the function.
- `test` (bool, optional): test mode. Defaults to False.
- `tab` (str, optional): tab format. Defaults to ' '.
Returns:
- `dict`: step name to code map.
function to_task
to_task(
notebookp,
task=None,
sep_step: str = '## step',
sep_step_end: str = '## tests',
notebook_suffix: str = '_v',
force=False,
validate=False,
path_prefix=None,
verbose=True,
test=False
) → str
Get the lines of code for a task (a script to be saved as an individual .py file).
Args:
- `notebookp` (type): path of the notebook.
- `sep_step` (str, optional): separator marking the start of a step. Defaults to "## step".
- `sep_step_end` (str, optional): separator marking the end of a step. Defaults to "## tests".
- `notebook_suffix` (str, optional): suffix of the notebook file to be considered as a "task".
- `force` (bool, optional): overwrite output. Defaults to False.
- `validate` (bool, optional): validate output. Defaults to False.
- `path_prefix` (type, optional): prefix to the path. Defaults to None.
- `verbose` (bool, optional): show verbose. Defaults to True.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: lines of the code.
function get_global_imports
get_global_imports() → DataFrame
Get the metadata of the functions imported from `roux.global_imports`.
module roux.stat.fit
For fitting data.
function fit_curve_fit
fit_curve_fit(
func,
xdata: <built-in function array> = None,
ydata: <built-in function array> = None,
bounds: tuple = (-inf, inf),
test=False,
plot=False
) → tuple
Wrapper around `scipy`'s `curve_fit`.
Args:
- `func` (function): fitting function.
- `xdata` (np.array, optional): x data. Defaults to None.
- `ydata` (np.array, optional): y data. Defaults to None.
- `bounds` (tuple, optional): bounds. Defaults to (-np.inf, np.inf).
- `test` (bool, optional): test. Defaults to False.
- `plot` (bool, optional): plot. Defaults to False.
Returns:
- `tuple`: output.
function fit_gauss_bimodal
fit_gauss_bimodal(
data: <built-in function array>,
bins: int = 50,
expected: tuple = (1, 0.2, 250, 2, 0.2, 125),
test=False
) → tuple
Fit bimodal gaussian distribution to the data in vector format.
Args:
- `data` (np.array): vector.
- `bins` (int, optional): bins. Defaults to 50.
- `expected` (tuple, optional): expected parameters. Defaults to (1, .2, 250, 2, .2, 125).
- `test` (bool, optional): test. Defaults to False.
Returns:
- `tuple`: output.
Notes:
Observed better performance with `roux.stat.cluster.cluster_1d`.
function get_grid
get_grid(
x: <built-in function array>,
y: <built-in function array>,
z: <built-in function array> = None,
off: int = 0,
grids: int = 100,
method='linear',
test=False,
**kws
) → tuple
2D grids from 1D data.
Args:
- `x` (np.array): vector.
- `y` (np.array): vector.
- `z` (np.array, optional): vector. Defaults to None.
- `off` (int, optional): offsets. Defaults to 0.
- `grids` (int, optional): grids. Defaults to 100.
- `method` (str, optional): method. Defaults to 'linear'.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `tuple`: output.
function fit_gaussian2d
fit_gaussian2d(
x: <built-in function array>,
y: <built-in function array>,
z: <built-in function array>,
grid=True,
grids=20,
method='linear',
off=0,
rescalez=True,
test=False
) → tuple
Fit gaussian 2D.
Args:
- `x` (np.array): vector.
- `y` (np.array): vector.
- `z` (np.array): vector.
- `grid` (bool, optional): grid. Defaults to True.
- `grids` (int, optional): grids. Defaults to 20.
- `method` (str, optional): method. Defaults to 'linear'.
- `off` (int, optional): offsets. Defaults to 0.
- `rescalez` (bool, optional): rescale z. Defaults to True.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `tuple`: output.
function fit_2d_distribution_kde
fit_2d_distribution_kde(
x: <built-in function array>,
y: <built-in function array>,
bandwidth: float,
xmin: float = None,
xmax: float = None,
xbins=100j,
ymin: float = None,
ymax: float = None,
ybins=100j,
test=False,
**kwargs
) → tuple
2D kernel density estimate (KDE).
Notes:
To cut off outliers:
  quantile_coff = 0.01
  params_grid = merge_dicts([
      df01.loc[:, var2col.values()].quantile(quantile_coff).rename(index=flip_dict({f"{k}min": var2col[k] for k in var2col})).to_dict(),
      df01.loc[:, var2col.values()].quantile(1 - quantile_coff).rename(index=flip_dict({f"{k}max": var2col[k] for k in var2col})).to_dict(),
  ])
Args:
- `x` (np.array): vector.
- `y` (np.array): vector.
- `bandwidth` (float): bandwidth.
- `xmin` (float, optional): x minimum. Defaults to None.
- `xmax` (float, optional): x maximum. Defaults to None.
- `xbins` (type, optional): x bins. Defaults to 100j.
- `ymin` (float, optional): y minimum. Defaults to None.
- `ymax` (float, optional): y maximum. Defaults to None.
- `ybins` (type, optional): y bins. Defaults to 100j.
- `test` (bool, optional): test. Defaults to False.
Returns:
- `tuple`: output.
function check_poly_fit
check_poly_fit(d: DataFrame, xcol: str, ycol: str, degmax: int = 5) → DataFrame
Check the fit of polynomial equations.
Args:
- `d` (pd.DataFrame): input dataframe.
- `xcol` (str): column containing the x values.
- `ycol` (str): column containing the y values.
- `degmax` (int, optional): maximum degree. Defaults to 5.
Returns:
- `pd.DataFrame`: output.
function mlr_2
mlr_2(df: DataFrame, coly: str, colxs: list) → tuple
Multiple linear regression between two variables.
Args:
- `df` (pd.DataFrame): input dataframe.
- `coly` (str): column containing y values.
- `colxs` (list): columns containing x values.
Returns:
- `tuple`: output.
function get_mlr_2_str
get_mlr_2_str(df: DataFrame, coly: str, colxs: list) → str
Get the result of the multiple linear regression between two variables as a string.
Args:
- `df` (pd.DataFrame): input dataframe.
- `coly` (str): column containing y values.
- `colxs` (list): columns containing x values.
Returns:
- `str`: output.
module roux.stat.sets
For set related stats.
function get_overlap
get_overlap(
items_set: list,
items_test: list,
output_format: str = 'list'
) → list
Get overlapping items as a string.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `output_format` (str, optional): format of the output. Defaults to 'list'.
Raises:
- `ValueError`: output_format can be 'list' or 'str'.
function get_overlap_size
get_overlap_size(
items_set: list,
items_test: list,
fraction: bool = False,
perc: bool = False,
by: str = None
) → float
Percentage Jaccard index.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `fraction` (bool, optional): output fraction. Defaults to False.
- `perc` (bool, optional): output percentage. Defaults to False.
- `by` (str, optional): fraction by. Defaults to None.
Returns:
- `float`: overlap size.
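Example: a minimal sketch; whether a count or a fraction is returned follows the flags documented above.

  from roux.stat.sets import get_overlap_size

  items_set = ['a', 'b', 'c']
  items_test = ['b', 'c', 'd']
  n = get_overlap_size(items_set, items_test)                 # 2 items in common
  f = get_overlap_size(items_set, items_test, fraction=True)  # as a fraction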
function get_item_set_size_by_background
get_item_set_size_by_background(items_set: list, background: int) → float
Item set size by background.
Args:
- `items_set` (list): items in the reference set.
- `background` (int): background size.
Returns:
- `float`: item set size by background.
Notes:
Denominator of the fold change.
function get_fold_change
get_fold_change(items_set: list, items_test: list, background: int) → float
Get fold change.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `background` (int): background size.
Returns:
- `float`: fold change.
Notes:
fc = (intersection/(test items)) / ((items in the item set)/background)
function get_hypergeom_pval
get_hypergeom_pval(items_set: list, items_test: list, background: int) → float
Calculate hypergeometric P-value.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `background` (int): background size.
Returns:
- `float`: hypergeometric P-value.
function get_contigency_table
get_contigency_table(items_set: list, items_test: list, background: int) → list
Get a contingency table required for the Fisher's test.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `background` (int): background size.
Returns:
- `list`: contingency table.
Notes:
The contingency table layout (within test items × within the item (/reference) set):
- (True, True): intersection
- (True, False) and (False, True): items in only one of the two sets
- (False, False): total - size of the union
function get_odds_ratio
get_odds_ratio(items_set: list, items_test: list, background: int) → float
Calculate Odds ratio and P-values using Fisher's exact test.
Args:
- `items_set` (list): items in the reference set.
- `items_test` (list): items to test.
- `background` (int): background size.
Returns:
- `float`: odds ratio.
function get_enrichment
get_enrichment(
df1: DataFrame,
df2: DataFrame,
colid: str,
colset: str,
background: int,
coltest: str = None,
test_type: list = None,
verbose: bool = False
) → DataFrame
Calculate the enrichments.
Args:
- `df1` (pd.DataFrame): table containing the items to test.
- `df2` (pd.DataFrame): table containing the reference sets and items.
- `colid` (str): column with the IDs of the items.
- `colset` (str): column with the sets.
- `coltest` (str): column with the tests.
- `background` (int): background size.
- `test_type` (list): hypergeom or Fisher. Defaults to both.
- `verbose` (bool): verbose.
Returns:
- `pd.DataFrame`: output table.
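Example: a minimal sketch of the set-level tests with hypothetical items and background size.

  from roux.stat.sets import get_hypergeom_pval, get_odds_ratio

  items_set = list('abcdefgh')   # reference set (8 items)
  items_test = list('abcxyz')    # query (3 of its 6 items overlap)
  background = 1000              # hypothetical universe size
  pval = get_hypergeom_pval(items_set, items_test, background)
  odds = get_odds_ratio(items_set, items_test, background)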
module roux.viz.ds
For wrappers around pandas Series plotting attributes.
function hist
hist(ds: Series, ax: Axes = None, kws_set_label_n={}, **kws)
module roux.viz.blends
Blends of plotting functions.
function plot_ranks
plot_ranks(
data: DataFrame,
kws_plot: dict,
col: str,
colid: str,
col_label: str = None,
xlim_min: float = -20,
ax=None
)
module roux.viz.colors
For setting up colors.
function rgbfloat2int
rgbfloat2int(rgb_float)
function get_colors_default
get_colors_default() → list
Get default colors.
Returns:
list
: colors.
function get_ncolors
get_ncolors(
n: int,
cmap: str = 'Spectral',
ceil: bool = False,
test: bool = False,
N: int = 20,
out: str = 'hex',
**kws_get_cmap_section
) → list
Get colors.
Args:
- `n` (int): number of colors to get.
- `cmap` (str, optional): colormap. Defaults to 'Spectral'.
- `ceil` (bool, optional): ceil. Defaults to False.
- `test` (bool, optional): test mode. Defaults to False.
- `N` (int, optional): number of colors in the colormap. Defaults to 20.
- `out` (str, optional): output format. Defaults to 'hex'.
Returns:
- `list`: colors.
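Example: a minimal sketch based on the documented defaults.

  from roux.viz.colors import get_ncolors

  colors = get_ncolors(5, cmap='Spectral')  # five colors, hex by default
  print(colors)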
function get_val2color
get_val2color(
ds: Series,
vmin: float = None,
vmax: float = None,
cmap: str = 'Reds'
) → dict
Get color for a value.
Args:
- `ds` (pd.Series): values.
- `vmin` (float, optional): minimum value. Defaults to None.
- `vmax` (float, optional): maximum value. Defaults to None.
- `cmap` (str, optional): colormap. Defaults to 'Reds'.
Returns:
- `dict`: output.
function saturate_color
saturate_color(color, alpha: float) → object
Saturate a color.
Args:
- `color` (type): color.
- `alpha` (float): alpha level.
Returns:
- `object`: output.
References:
https://stackoverflow.com/a/60562502/3521099
function mix_colors
mix_colors(d: dict) → str
Mix colors.
Args:
- `d` (dict): colors to alpha map.
Returns:
- `str`: hex color.
References:
https://stackoverflow.com/a/61488997/3521099
function make_cmap
make_cmap(cs: list, N: int = 20, **kws)
Create a colormap.
Args:
- `cs` (list): colors.
- `N` (int, optional): resolution i.e. number of colors. Defaults to 20.
Returns: cmap.
function get_cmap_section
get_cmap_section(
cmap,
vmin: float = 0.0,
vmax: float = 1.0,
n: int = 100
) → object
Get section of a colormap.
Args:
- `cmap` (object|str): colormap.
- `vmin` (float, optional): minimum value. Defaults to 0.0.
- `vmax` (float, optional): maximum value. Defaults to 1.0.
- `n` (int, optional): resolution i.e. number of colors. Defaults to 100.
Returns:
- `object`: cmap.
function append_cmap
append_cmap(
cmap: str = 'Reds',
color: str = '#D3DDDC',
cmap_min: float = 0.2,
cmap_max: float = 0.8,
ncolors: int = 100,
ncolors_min: int = 1,
ncolors_max: int = 0
)
Append a color to colormap.
Args:
- `cmap` (str, optional): colormap. Defaults to 'Reds'.
- `color` (str, optional): color. Defaults to '#D3DDDC'.
- `cmap_min` (float, optional): cmap minimum. Defaults to 0.2.
- `cmap_max` (float, optional): cmap maximum. Defaults to 0.8.
- `ncolors` (int, optional): number of colors. Defaults to 100.
- `ncolors_min` (int, optional): minimum number of colors. Defaults to 1.
- `ncolors_max` (int, optional): maximum number of colors. Defaults to 0.
Returns: cmap.
References:
https://matplotlib.org/stable/tutorials/colors/colormap-manipulation.html
module roux.viz.diagram
For diagrams e.g. flowcharts
function diagram_nb
diagram_nb(
graph: str,
counts: dict = None,
out: bool = False,
test: bool = False
)
Show a diagram in jupyter notebook using mermaid.js.
Parameters:
- `graph` (str): markdown-formatted graph. Please see https://mermaid.js.org/intro/n00b-syntaxReference.html
- `out` (bool): output the URL. Defaults to False.
References:
1. https://mermaid.js.org/config/Tutorials.html#jupyter-integration-with-mermaid-js
Examples:
  graph LR;
      i1(["input1"]) & d1[("data1")] --> p1[["process1"]] --> o1(["output1"])
      p1 --> o2["output2"]:::ends
      classDef ends fill:#fff,stroke:#fff
module roux.workflow
Global Variables
- io
- log
- task
- nb
module roux.global_imports
For importing commonly used functions at the development phase.
Requirements:
pip install roux[all]
Usage: in interactive sessions (e.g. in jupyter notebooks) to facilitate faster code development.
Note: Post-development, to remove *s from the code, use removestar (pip install removestar).
removestar file
module roux.viz.annot
For annotations.
function annot_side
annot_side(
ax: Axes,
df1: DataFrame,
colx: str,
coly: str,
cols: str = None,
hue: str = None,
loc: str = 'right',
scatter=False,
scatter_marker='|',
scatter_alpha=0.75,
lines=True,
offx3: float = 0.15,
offymin: float = 0.1,
offymax: float = 0.9,
length_axhline: float = 3,
text=True,
text_offx: float = 0,
text_offy: float = 0,
invert_xaxis: bool = False,
break_pt: int = 25,
va: str = 'bottom',
zorder: int = 2,
color: str = 'gray',
kws_line: dict = {},
kws_scatter: dict = {},
**kws_text
) → Axes
Annotate elements of a plot, on the side of the plot.
Args:
- `df1` (pd.DataFrame): input data.
- `colx` (str): column with x values.
- `coly` (str): column with y values.
- `cols` (str): column with labels.
- `hue` (str): column with colors of the labels.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `loc` (str, optional): location. Defaults to 'right'.
- `invert_xaxis` (bool, optional): invert x-axis. Defaults to False.
- `offx3` (float, optional): x-offset for the bend position of the arrow. Defaults to 0.15.
- `offymin` (float, optional): y-offset minimum. Defaults to 0.1.
- `offymax` (float, optional): y-offset maximum. Defaults to 0.9.
- `break_pt` (int, optional): break point of the labels. Defaults to 25.
- `length_axhline` (float, optional): length of the horizontal line i.e. the "underline". Defaults to 3.
- `zorder` (int, optional): z-order. Defaults to 1.
- `color` (str, optional): color of the line. Defaults to 'gray'.
- `kws_line` (dict, optional): parameters for formatting the line. Defaults to {}.
Keyword Args:
- `kws_text`: parameters provided to the `ax.text` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function annot_side_curved
annot_side_curved(
data,
colx: str,
coly: str,
col_label: str,
off: float = 0.5,
lim: tuple = None,
limf: tuple = None,
loc: str = 'right',
ax=None,
test: bool = False,
kws_text={},
**kws_line
)
Annotate elements of a plot, on the side of the plot, using bezier lines.
Usage: allows m:1 mappings between points and labels.
function show_outlines
show_outlines(
data: DataFrame,
colx: str,
coly: str,
column_outlines: str,
outline_colors: dict,
style=None,
legend: bool = True,
kws_legend: dict = {},
zorder: int = 3,
ax: Axes = None,
**kws_scatter
) → Axes
Outline points on the scatter plot by categories.
function show_confidence_ellipse
show_confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs)
Create a plot of the covariance confidence ellipse of x and y.
Parameters:
- `x, y` (array-like, shape (n,)): input data.
- `ax` (matplotlib.axes.Axes): the axes object to draw the ellipse into.
- `n_std` (float): the number of standard deviations used to determine the ellipse's radii.
- `**kwargs`: forwarded to `matplotlib.patches.Ellipse`.
Returns: `matplotlib.patches.Ellipse`
References: https://matplotlib.org/3.5.0/gallery/statistics/confidence_ellipse.html
function show_box
show_box(
ax: Axes,
xy: tuple,
width: float,
height: float,
fill: str = None,
alpha: float = 1,
lw: float = 1.1,
edgecolor: str = 'k',
clip_on: bool = False,
scale_width: float = 1,
scale_height: float = 1,
xoff: float = 0,
yoff: float = 0,
**kws
) → Axes
Highlight sections of a plot e.g. heatmap by drawing boxes.
Args:
- `xy` (tuple): position of the left, bottom corner of the box.
- `width` (float): width.
- `height` (float): height.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `fill` (str, optional): fill the box with color. Defaults to None.
- `alpha` (float, optional): alpha of color. Defaults to 1.
- `lw` (float, optional): line width. Defaults to 1.1.
- `edgecolor` (str, optional): edge color. Defaults to 'k'.
- `clip_on` (bool, optional): clip the boxes by the axis limit. Defaults to False.
- `scale_width` (float, optional): scale width. Defaults to 1.
- `scale_height` (float, optional): scale height. Defaults to 1.
- `xoff` (float, optional): x-offset. Defaults to 0.
- `yoff` (float, optional): y-offset. Defaults to 0.
Keyword Args:
- `kws`: parameters provided to the `Rectangle` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function color_ax
color_ax(ax: Axes, c: str, linewidth: float = None) → Axes
Color the border of a `plt.Axes`.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `c` (str): color.
- `linewidth` (float, optional): line width. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function show_n_legend
show_n_legend(ax, df1: DataFrame, colid: str, colgroup: str, **kws)
function show_scatter_stats
show_scatter_stats(
ax: Axes,
data: DataFrame,
x,
y,
z,
method: str,
resample: bool = False,
show_n: bool = True,
show_n_prefix: str = '',
prefix: str = '',
loc=None,
zorder: int = 5,
verbose: bool = True,
kws_stat={},
**kws_set_label
)
Args:
- `resample` (bool, optional): resample data. Defaults to False.
function show_crosstab_stats
show_crosstab_stats(
data: DataFrame,
cols: list,
method: str = None,
alpha: float = 0.05,
loc: str = None,
xoff: float = 0,
yoff: float = 0,
linebreak: bool = False,
ax: Axes = None,
**kws_set_label
) → Axes
Annotate a confusion matrix.
Args:
- `data` (pd.DataFrame): input data.
- `cols` (list): list of columns with the categories.
- `method` (str, optional): method used to calculate the statistical significance.
- `alpha` (float, optional): alpha for the stats. Defaults to 0.05.
- `loc` (str, optional): location. Overrides `kws_set_label`. Defaults to None.
- `xoff` (float, optional): x offset. Defaults to 0.
- `yoff` (float, optional): y offset. Defaults to 0.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws_set_label`: keyword parameters provided to `set_label`.
Returns:
- `plt.Axes`: `plt.Axes` object.
function show_confusion_matrix_stats
show_confusion_matrix_stats(
df_: DataFrame,
ax: Axes = None,
off: float = 0.5
) → Axes
Annotate a confusion matrix.
Args:
- `df_` (pd.DataFrame): input data.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `off` (float, optional): offset. Defaults to 0.5.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_suptitle
set_suptitle(axs, title, offy=0, **kws_text)
Combined title for a list of subplots.
module roux.vizi
module roux.lib.set
For processing list-like sets.
function union
union(l)
Union of lists.
Parameters:
- `l` (list): list of lists.
Returns:
- `l` (list): list.
function intersection
intersection(l)
Intersections of lists.
Parameters:
- `l` (list): list of lists.
Returns:
- `l` (list): list.
function nunion
nunion(l)
Count the items in union.
Parameters:
- `l` (list): list of lists.
Returns:
- `i` (int): count.
function nintersection
nintersection(l)
Count the items in intersection.
Parameters:
- `l` (list): list of lists.
Returns:
- `i` (int): count.
function check_non_overlaps_with
check_non_overlaps_with(l1: list, l2: list, out_count: bool = False, log=True)
function validate_overlaps_with
validate_overlaps_with(l1, l2, **kws_check)
function assert_overlaps_with
assert_overlaps_with(l1, l2, out_count=False)
function jaccard_index
jaccard_index(l1, l2)
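As a quick orientation for these set helpers, a minimal usage sketch (assumes `jaccard_index` returns the ratio |intersection|/|union|; the exact return format is not documented here):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.lib.set import union, intersection, jaccard_index

l1, l2 = ['a', 'b', 'c'], ['b', 'c', 'd']
print(union([l1, l2]))         # all items across the lists, e.g. a, b, c, d
print(intersection([l1, l2]))  # items shared by all lists, e.g. b, c
print(jaccard_index(l1, l2))   # expected 0.5 here (2 shared / 4 total)
```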
function dropna
dropna(x)
Drop `np.nan` items from a list.
Parameters:
- `x` (list): list.
Returns:
- `x` (list): list.
function unique
unique(l)
Unique items in a list.
Parameters:
- `l` (list): input list.
Returns:
- `l` (list): list.
Notes:
- The function can return a list of lists if used in the `pandas.core.groupby.DataFrameGroupBy.agg` context.
function unique_sorted
unique_sorted(l)
Unique items in a list.
Parameters:
- `l` (list): input list.
Returns:
- `l` (list): list.
Notes:
- The function can return a list of lists if used in the `pandas.core.groupby.DataFrameGroupBy.agg` context.
function list2str
list2str(x, fmt=None, ignore=False)
Returns string if single item in a list.
Parameters:
- `x` (list): list.
Returns:
- `s` (str): string.
function lists2str
lists2str(ds: DataFrame, **kws_list2str) → str
Combine lists with ids into a unified string.
Usage: `pandas` aggregation functions.
function unique_str
unique_str(l, **kws)
Unique single item from a list.
Parameters:
- `l` (list): input list.
Returns:
- `l` (list): list.
function nunique
nunique(l, **kws)
Count unique items in a list.
Parameters:
- `l` (list): list.
Returns:
- `i` (int): count.
function flatten
flatten(l)
List of lists to list.
Parameters:
- `l` (list): input list.
Returns:
- `l` (list): output list.
function get_alt
get_alt(l1, s)
Get alternate item between two.
Parameters:
- `l1` (list): list.
- `s` (str): item.
Returns:
- `s` (str): alternate item.
function intersections
intersections(dn2list, jaccard=False, count=True, fast=False, test=False)
Get intersections between lists.
Parameters:
- `dn2list` (dict): dictionary mapping to lists.
- `jaccard` (bool): return jaccard indices.
- `count` (bool): return counts.
- `fast` (bool): fast processing.
- `test` (bool): verbose.
Returns:
- `df` (DataFrame): output dataframe.
TODOs:
1. Feed as an estimator to `df.corr()`.
2. Faster processing by filling up the symmetric half of the adjacency matrix.
function range_overlap
range_overlap(l1, l2)
Overlap between ranges.
Parameters:
- `l1` (list): start and end integers of one range.
- `l2` (list): start and end integers of the other range.
Returns:
- `l` (list): overlapped range.
function get_windows
get_windows(
a,
size=None,
overlap=None,
windows=None,
overlap_fraction=None,
stretch_last=False,
out_ranges=True
)
Windows/segments from a range.
Parameters:
- `a` (list): range.
- `size` (int): size of the windows.
- `windows` (int): number of windows.
- `overlap_fraction` (float): overlap fraction.
- `overlap` (int): overlap length.
- `stretch_last` (bool): stretch the last window.
- `out_ranges` (bool): whether to output ranges.
Returns:
- `df1` (DataFrame): output dataframe.
Notes:
- For development, use of `int` provides `np.floor`.
function bools2intervals
bools2intervals(v)
Convert bools to intervals.
Parameters:
- `v` (list): list of bools.
Returns:
- `l` (list): intervals.
function list2ranges
list2ranges(l)
function get_pairs
get_pairs(
items: list,
items_with: list = None,
size: int = 2,
with_self: bool = False,
unique: bool = False
) → DataFrame
Creates a dataframe with the paired items.
Parameters:
- `items`: the list of items to pair.
- `items_with`: list of items to pair with.
- `size`: size of the combinations.
- `with_self`: pair with self or not.
- `unique` (bool): get unique pairs. Defaults to False.
Returns: table with pairs of items.
Notes:
1. The ids of the items are sorted e.g. 'a'-'b' not 'b'-'a'.
2. `itertools.combinations` does not pair self.
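A minimal usage sketch (the exact column names of the returned table are not documented here and may differ):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.lib.set import get_pairs

df = get_pairs(items=['a', 'b', 'c'])  # pairwise combinations, no self-pairs by default
print(df)  # expected rows: a-b, a-c, b-c (ids sorted within each pair)
```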
module roux.stat.solve
For solving equations.
function get_intersection_locations
get_intersection_locations(
y1: <built-in function array>,
y2: <built-in function array>,
test: bool = False,
x: <built-in function array> = None
) → list
Get co-ordinates of the intersection (x[idx]).
Args:
- `y1` (np.array): vector.
- `y2` (np.array): vector.
- `test` (bool, optional): test mode. Defaults to False.
- `x` (np.array, optional): vector. Defaults to None.
Returns:
- `list`: output.
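For example, a sketch finding where two curves cross (the returned indices are assumed to index into `x`, per "x[idx]" above):
```python
# Minimal sketch; assumes `pip install roux[stat]`.
import numpy as np
from roux.stat.solve import get_intersection_locations

x = np.linspace(0, 2 * np.pi, 1000)
y1, y2 = np.sin(x), np.cos(x)
idxs = get_intersection_locations(y1, y2, x=x)
print(x[idxs])  # expected near pi/4 and 5*pi/4, where sin crosses cos
```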
module roux.stat.preprocess
For classification.
function dropna_matrix
dropna_matrix(
df1,
coff_cols_min_perc_na=5,
coff_rows_min_perc_na=5,
test=False,
verbose=False
)
function drop_low_complexity
drop_low_complexity(
df1: DataFrame,
min_nunique: int,
max_inflation: int,
max_nunique: int = None,
cols: list = None,
cols_keep: list = [],
test: bool = False,
verbose: bool = False
) → DataFrame
Remove low-complexity columns from the data.
Args:
- `df1` (pd.DataFrame): input data.
- `min_nunique` (int): minimum unique values.
- `max_inflation` (int): maximum over-representation of the values.
- `cols` (list, optional): columns. Defaults to None.
- `cols_keep` (list, optional): columns to keep. Defaults to [].
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `pd.DataFrame`: output data.
function get_cols_x_for_comparison
get_cols_x_for_comparison(
df1: DataFrame,
cols_y: list,
cols_index: list,
cols_drop: list = [],
cols_dropby_patterns: list = [],
dropby_low_complexity: bool = True,
min_nunique: int = 5,
max_inflation: int = 50,
dropby_collinearity: bool = True,
coff_rs: float = 0.7,
dropby_variance_inflation: bool = True,
verbose: bool = False,
test: bool = False
) → dict
Identify X columns.
Parameters:
- `df1` (pd.DataFrame): input table.
- `cols_y` (list): y columns.
function to_preprocessed_data
to_preprocessed_data(
df1: DataFrame,
columns: dict,
fill_missing_desc_value: bool = False,
fill_missing_cont_value: bool = False,
normby_zscore: bool = False,
verbose: bool = False,
test: bool = False
) → DataFrame
Preprocess data.
function to_filteredby_samples
to_filteredby_samples(
df1: DataFrame,
colindex: str,
colsample: str,
coff_samples_min: int,
colsubset: str,
coff_subsets_min: int = 2
) → DataFrame
Filter table before calculating differences. (1) Retain minimum number of samples per item representing a subset and (2) Retain minimum number of subsets per item.
Parameters:
- `df1` (pd.DataFrame): input table.
- `colindex` (str): column containing items.
- `colsample` (str): column containing samples.
- `coff_samples_min` (int): minimum number of samples.
- `colsubset` (str): column containing subsets.
- `coff_subsets_min` (int): minimum number of subsets. Defaults to 2.
Returns: pd.DataFrame
Examples:
Parameters:
colindex='genes id',
colsample='sample id',
coff_samples_min=3,
colsubset='pLOF or WT',
coff_subsets_min=2,
function get_cvsplits
get_cvsplits(
X: <built-in function array>,
y: <built-in function array> = None,
cv: int = 5,
random_state: int = None,
outtest: bool = True
) → dict
Get cross-validation splits. A friendly wrapper around `sklearn.model_selection.KFold`.
Args:
- `X` (np.array): X matrix.
- `y` (np.array): y vector.
- `cv` (int, optional): cross validations. Defaults to 5.
- `random_state` (int, optional): random state. Defaults to None.
- `outtest` (bool, optional): output test data. Defaults to True.
Returns:
- `dict`: output.
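A minimal usage sketch (the exact structure of the returned dict is assumed, not documented here):
```python
# Minimal sketch; assumes `pip install roux[stat]`.
import numpy as np
from roux.stat.preprocess import get_cvsplits

rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.random(100)
cv = get_cvsplits(X, y, cv=5, random_state=0)
print(type(cv), len(cv))  # expected: a dict with one entry per fold (assumed layout)
```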
module roux.stat.io
For input/output of stats.
function perc_label
perc_label(a, b=None, bracket=True)
function pval2annot
pval2annot(
pval: float,
alternative: str = None,
alpha: float = 0.05,
fmt: str = '*',
power: bool = True,
linebreak: bool = False,
replace_prefix: str = None
)
P/Q-value to annotation.
Parameters:
- `fmt` (str): format of the annotation: '*', '<' or 'num'.
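For example (the exact annotation strings returned are illustrative, not guaranteed):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.stat.io import pval2annot

print(pval2annot(0.001, fmt='*'))  # e.g. a '*'-style significance annotation
print(pval2annot(0.5, fmt='*'))    # e.g. a 'not significant' annotation
```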
module roux.workflow.task
For task management.
function validate_params
validate_params(d: dict) → bool
function run_task
run_task(
parameters: dict,
input_notebook_path: str,
kernel: str = None,
output_notebook_path: str = None,
start_timeout: int = 480,
verbose=False,
force=False,
**kws_papermill
) → str
Run a single task.
Parameters:
- `parameters` (dict): parameters including `output_path`s.
- `input_notebook_path` (str): path to the input notebook which is parameterized.
- `kernel` (str): kernel to be used.
- `output_notebook_path` (str): path to the output notebook which is used as a report.
- `verbose` (bool): verbose.
Keyword parameters:
- `kws_papermill`: parameters provided to the `pm.execute_notebook` function.
Returns: output path.
function apply_run_task
apply_run_task(
x: str,
input_notebook_path: str,
kernel: str,
force=False,
**kws_papermill
)
function run_tasks
run_tasks(
input_notebook_path: str,
kernel: str = None,
inputs: list = None,
output_path_base: str = None,
parameters_list=None,
fast: bool = False,
fast_workers: int = 6,
to_filter_nbby_patterns_kws=None,
input_notebook_temp_path=None,
out_paths: bool = True,
test1: bool = False,
force: bool = False,
test: bool = False,
verbose: bool = False,
**kws_papermill
) → list
Run a list of tasks.
Parameters:
- `input_notebook_path` (str): path to the input notebook which is parameterized.
- `kernel` (str): kernel to be used.
- `inputs` (list): list of parameters without the output paths, which would be inferred by encoding.
- `output_path_base` (str): output path with a placeholder e.g. 'path/to/{KEY}/file'.
- `parameters_list` (list): list of parameters including the output paths.
- `out_paths` (bool): return paths of the reports. Defaults to True.
- `test1` (bool): test only the first task in the list. Defaults to False.
- `fast` (bool): enable parallel processing.
- `fast_workers` (int): number of parallel processes.
- `force` (bool): overwrite the outputs.
- `test` (bool): test mode.
- `verbose` (bool): verbose.
Keyword parameters:
- `kws_papermill`: parameters provided to the `pm.execute_notebook` function, e.g. the working directory (`cwd=`).
- `to_filter_nbby_patterns_kws` (dict): dictionary containing parameters to be provided to the `to_filter_nbby_patterns` function. Defaults to None.
Returns:
- `parameters_list` (list): list of parameters including the output paths, inferred if not provided.
TODOs:
0. Ignore temporary parameters e.g. test, verbose etc. while encoding inputs.
1. Integrate with `apply_on_paths` for parallel processing etc.
Notes:
- To resolve `RuntimeError: This event loop is already running in python` from `multiprocessing`, execute:
  import nest_asyncio
  nest_asyncio.apply()
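A minimal sketch of running a batch of parameterized notebooks (the notebook path and parameter dicts are hypothetical; papermill and a matching kernel must be set up):
```python
# Minimal sketch; assumes `pip install roux[workflow]` and a parameterized notebook.
from roux.workflow.task import run_tasks

report_paths = run_tasks(
    input_notebook_path='notebooks/analysis.ipynb',  # hypothetical notebook
    kernel='python3',
    parameters_list=[
        dict(input_path='data/a.tsv', output_path='out/a/'),
        dict(input_path='data/b.tsv', output_path='out/b/'),
    ],
)
```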
module roux.viz.heatmap
For heatmaps.
function plot_table
plot_table(
df1: DataFrame,
xlabel: str = None,
ylabel: str = None,
annot: bool = True,
cbar: bool = False,
linecolor: str = 'k',
linewidths: float = 1,
cmap: str = None,
sorty: bool = False,
linebreaky: bool = False,
scales: tuple = [1, 1],
ax: Axes = None,
**kws
) → Axes
Plot to show a table.
Args:
- `df1` (pd.DataFrame): input data.
- `xlabel` (str, optional): x label. Defaults to None.
- `ylabel` (str, optional): y label. Defaults to None.
- `annot` (bool, optional): show numbers. Defaults to True.
- `cbar` (bool, optional): show colorbar. Defaults to False.
- `linecolor` (str, optional): line color. Defaults to 'k'.
- `linewidths` (float, optional): line widths. Defaults to 1.
- `cmap` (str, optional): color map. Defaults to None.
- `sorty` (bool, optional): sort rows. Defaults to False.
- `linebreaky` (bool, optional): linebreak for y labels. Defaults to False.
- `scales` (tuple, optional): scale of the table. Defaults to [1,1].
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `sns.heatmap` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
module roux.stat.paired
For paired stats.
function get_ratio_sorted
get_ratio_sorted(a: float, b: float, increase=True) → float
Get ratio sorted.
Args:
- `a` (float): value #1.
- `b` (float): value #2.
- `increase` (bool, optional): check for increase. Defaults to True.
Returns:
- `float`: output.
function diff
diff(a: float, b: float, absolute=True) → float
Get difference.
Args:
- `a` (float): value #1.
- `b` (float): value #2.
- `absolute` (bool, optional): get absolute difference. Defaults to True.
Returns:
- `float`: output.
function get_diff_sorted
get_diff_sorted(a: float, b: float) → float
Difference sorted/absolute.
Args:
- `a` (float): value #1.
- `b` (float): value #2.
Returns:
- `float`: output.
function balance
balance(a: float, b: float, absolute=True) → float
Balance.
Args:
- `a` (float): value #1.
- `b` (float): value #2.
- `absolute` (bool, optional): absolute difference. Defaults to True.
Returns:
- `float`: output.
function get_paired_sets_stats
get_paired_sets_stats(l1: list, l2: list, test: bool = False) → list
Paired stats comparing two sets.
Args:
- `l1` (list): set #1.
- `l2` (list): set #2.
- `test` (bool): test mode. Defaults to False.
Returns:
- `list`: tuple (overlap, intersection, union, ratio).
function get_stats_paired
get_stats_paired(
df1: DataFrame,
cols: list,
input_logscale: bool,
prefix: str = None,
drop_cols: bool = False,
unidirectional_stats: list = ['min', 'max'],
fast: bool = False
) → DataFrame
Paired stats, row-wise.
Args:
- `df1` (pd.DataFrame): input data.
- `cols` (list): columns.
- `input_logscale` (bool): whether the input data is log-scaled.
- `prefix` (str, optional): prefix of the output column/s. Defaults to None.
- `drop_cols` (bool, optional): drop these columns. Defaults to False.
- `unidirectional_stats` (list, optional): column-wise stats. Defaults to ['min','max'].
- `fast` (bool, optional): parallel processing. Defaults to False.
Returns:
- `pd.DataFrame`: output dataframe.
function get_stats_paired_agg
get_stats_paired_agg(
x: <built-in function array>,
y: <built-in function array>,
ignore: bool = False,
verb: bool = True
) → Series
Paired stats aggregated, for example, to classify 2D distributions.
Args:
- `x` (np.array): x vector.
- `y` (np.array): y vector.
- `ignore` (bool, optional): suppress warnings. Defaults to False.
- `verb` (bool, optional): verbose. Defaults to True.
Returns:
- `pd.Series`: output.
function classify_sharing
classify_sharing(
df1: DataFrame,
column_value: str,
bins: list = [0, 25, 75, 100],
labels: list = ['low', 'medium', 'high'],
prefix: str = '',
verbose: bool = False
) → DataFrame
Classify sharing % calculated from Jaccard index.
Parameters:
- `df1` (pd.DataFrame): input table.
- `column_value` (str): column with values.
- `bins` (list): bins. Defaults to [0,25,75,100].
- `labels` (list): bin labels. Defaults to ['low','medium','high'].
- `prefix` (str): prefix of the columns.
- `verbose` (bool): verbose. Defaults to False.
module roux.stat.variance
For variance related stats.
function confidence_interval_95
confidence_interval_95(x: <built-in function array>) → float
95% confidence interval.
Args:
- `x` (np.array): input vector.
Returns:
- `float`: output.
function get_ci
get_ci(rs, ci_type, outstr=False)
function get_variance_inflation
get_variance_inflation(data, coly: str, cols_x: list = None)
Variance Inflation Factor (VIF). A wrapper around `statsmodels`'s `variance_inflation_factor` function.
Parameters:
- `data` (pd.DataFrame): input data.
- `coly` (str): dependent variable.
- `cols_x` (list): independent variables.
Returns: pd.Series
module roux.stat.norm
For normalisation.
function to_norm
to_norm(x, off=1e-05)
Normalise a vector bounded between 0 and 1.
function norm_by_quantile
norm_by_quantile(X: <built-in function array>) → <built-in function array>
Quantile normalize the columns of X.
Parameters:
- `X` (2D array of float, shape (M, N)): the input data, with M rows (genes/features) and N columns (samples).
Returns:
- `Xn` (2D array of float, shape (M, N)): the normalized data.
Notes:
- Faster processing (~5 times compared to the other functions tested) because of the use of numpy arrays.
TODOs:
- Use `QuantileTransformer` from `sklearn.preprocessing`, whose `output_distribution` parameter allows rescaling back to the same distribution kind.
function norm_by_gaussian_kde
norm_by_gaussian_kde(
values: <built-in function array>
) → <built-in function array>
Normalise matrix by gaussian KDE.
Args:
- `values` (np.array): input matrix.
Returns:
- `np.array`: output matrix.
References: https://github.com/saezlab/protein_attenuation/blob/6c1e81af37d72ef09835ee287f63b000c7c6663c/src/protein_attenuation/utils.py
function zscore
zscore(df: DataFrame, cols: list = None) → DataFrame
Z-score.
Args:
- `df` (pd.DataFrame): input table.
- `cols` (list, optional): columns to z-score. Defaults to None.
Returns:
- `pd.DataFrame`: output table.
TODOs:
1. Use scipy's or sklearn's zscore because of its additional options, e.g. from scipy.stats import zscore; df.apply(zscore).
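For reference, the scipy-based alternative mentioned in the TODO looks like this:
```python
# Column-wise z-scores via scipy, as suggested in the TODO above.
import pandas as pd
from scipy.stats import zscore

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})
print(df.apply(zscore))  # each column standardized to mean 0, sd 1
```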
function zscore_robust
zscore_robust(a: <built-in function array>) → <built-in function array>
Robust Z-score.
Args:
- `a` (np.array): input data.
Returns:
- `np.array`: output.
Example:
t = sc.stats.norm.rvs(size=100, scale=1, random_state=123456)
plt.hist(t, bins=40)
plt.hist(apply_zscore_robust(t), bins=40)
print(np.median(t), np.median(apply_zscore_robust(t)))
function norm_covariance_PCA
norm_covariance_PCA(
X: <built-in function array>,
use_svd: bool = True,
use_sklearn: bool = True,
rescale_centered: bool = True,
random_state: int = 0,
test: bool = False,
verbose: bool = False
) → <built-in function array>
Covariance normalization by PCA whitening.
Args:
- `X` (np.array): input array.
- `use_svd` (bool, optional): use the SVD method. Defaults to True.
- `use_sklearn` (bool, optional): use `sklearn` for the SVD method. Defaults to True.
- `rescale_centered` (bool, optional): rescale to centered input. Defaults to True.
- `random_state` (int, optional): random state. Defaults to 0.
- `test` (bool, optional): test mode. Defaults to False.
- `verbose` (bool, optional): verbose. Defaults to False.
Returns:
- `np.array`: transformed data.
module roux.stat.diff
For difference related stats.
function compare_classes
compare_classes(x, y, method=None)
Compare classes
function compare_classes_many
compare_classes_many(df1: DataFrame, cols_y: list, cols_x: list) → DataFrame
function get_pval
get_pval(
df: DataFrame,
colvalue='value',
colsubset='subset',
colvalue_bool=False,
colindex=None,
subsets=None,
test=False,
func=None
) → tuple
Get p-value.
Args:
- `df` (DataFrame): input dataframe.
- `colvalue` (str, optional): column with values. Defaults to 'value'.
- `colsubset` (str, optional): column with subsets. Defaults to 'subset'.
- `colvalue_bool` (bool, optional): column with boolean values. Defaults to False.
- `colindex` (str, optional): column with the index. Defaults to None.
- `subsets` (list, optional): subset types. Defaults to None.
- `test` (bool, optional): test. Defaults to False.
- `func` (function, optional): function. Defaults to None.
Raises:
- `ArgumentError`: colvalue or colsubset not found in df.
- `ValueError`: need only 2 subsets.
Returns:
- `tuple`: stat, p-value.
function get_stat
get_stat(
df1: DataFrame,
colsubset: str,
colvalue: str,
colindex: str,
subsets=None,
cols_subsets=['subset1', 'subset2'],
df2=None,
stats=['mean', 'median', 'var', 'size'],
coff_samples_min=None,
verb=False,
func=None,
**kws
) → DataFrame
Get statistics.
Args:
- `df1` (DataFrame): input dataframe.
- `colvalue` (str, optional): column with values. Defaults to 'value'.
- `colsubset` (str, optional): column with subsets. Defaults to 'subset'.
- `colindex` (str, optional): column with the index. Defaults to None.
- `subsets` (list, optional): subset types. Defaults to None.
- `cols_subsets` (list, optional): columns with subsets. Defaults to ['subset1', 'subset2'].
- `df2` (DataFrame, optional): second dataframe. Defaults to None.
- `stats` (list, optional): summary statistics. Defaults to [np.mean,np.median,np.var]+[len].
- `coff_samples_min` (int, optional): minimum sample size required. Defaults to None.
- `verb` (bool, optional): verbose. Defaults to False.
Keyword Arguments:
- `kws`: parameters provided to the `get_pval` function.
Raises:
- `ArgumentError`: colvalue or colsubset not found in df.
- `ValueError`: len(subsets)<2
Returns:
- `DataFrame`: output dataframe.
TODOs:
1. Rename to the more specific `get_diff`, also other `get_stat*`/`get_pval*` functions.
function get_stats
get_stats(
df1: DataFrame,
colsubset: str,
cols_value: list,
colindex: str,
subsets=None,
df2=None,
cols_subsets=['subset1', 'subset2'],
stats=['mean', 'median', 'var', 'size'],
axis=0,
test=False,
**kws
) → DataFrame
Get statistics by iterating over columns with values.
Args:
- `df1` (DataFrame): input dataframe.
- `colsubset` (str, optional): column with subsets.
- `cols_value` (list): list of columns with values.
- `colindex` (str, optional): column with the index.
- `subsets` (list, optional): subset types. Defaults to None.
- `df2` (DataFrame, optional): second dataframe, e.g. `pd.DataFrame({"subset1":['test'],"subset2":['reference']})`. Defaults to None.
- `cols_subsets` (list, optional): columns with subsets. Defaults to ['subset1', 'subset2'].
- `stats` (list, optional): summary statistics. Defaults to [np.mean,np.median,np.var]+[len].
- `axis` (int, optional): 1 if different tests else use 0. Defaults to 0.
Keyword Arguments:
- `kws`: parameters provided to the `get_pval` function.
Raises:
- `ArgumentError`: colvalue or colsubset not found in df.
- `ValueError`: len(subsets)<2
Returns:
- `DataFrame`: output dataframe.
TODOs:
1. No column prefix if `len(cols_value)==1`.
function get_significant_changes
get_significant_changes(
df1: DataFrame,
coff_p=0.025,
coff_q=0.1,
alpha=None,
change_type=['diff', 'ratio'],
changeby='mean',
value_aggs=['mean', 'median']
) → DataFrame
Get significant changes.
Args:
- `df1` (DataFrame): input dataframe.
- `coff_p` (float, optional): cutoff on p-value. Defaults to 0.025.
- `coff_q` (float, optional): cutoff on q-value. Defaults to 0.1.
- `alpha` (float, optional): alias for `coff_p`. Defaults to None.
- `changeby` (str, optional): "" if checking for change by both mean and median. Defaults to "".
- `value_aggs` (list, optional): values to aggregate. Defaults to ['mean','median'].
Returns:
- `DataFrame`: output dataframe.
function apply_get_significant_changes
apply_get_significant_changes(
df1: DataFrame,
cols_value: list,
cols_groupby: list,
cols_grouped: list,
fast=False,
**kws
) → DataFrame
Apply on dataframe to get significant changes.
Args:
- `df1` (DataFrame): input dataframe.
- `cols_value` (list): columns with values.
- `cols_groupby` (list): columns with groups.
Returns:
- `DataFrame`: output dataframe.
function get_stats_groupby
get_stats_groupby(
df1: DataFrame,
cols_group: list,
coff_p: float = 0.05,
coff_q: float = 0.1,
alpha=None,
fast=False,
**kws
) → DataFrame
Iterate over groups, to get the differences.
Args:
- `df1` (DataFrame): input dataframe.
- `cols_group` (list): columns to iterate over.
- `coff_p` (float, optional): cutoff on p-value. Defaults to 0.025.
- `coff_q` (float, optional): cutoff on q-value. Defaults to 0.1.
- `alpha` (float, optional): alias for `coff_p`. Defaults to None.
- `fast` (bool, optional): parallel processing. Defaults to False.
Returns:
- `DataFrame`: output dataframe.
function get_diff
get_diff(
df1: DataFrame,
cols_x: list,
cols_y: list,
cols_index: list,
cols_group: list,
coff_p: float = None,
test: bool = False,
func=None,
**kws
) → DataFrame
Wrapper around the `get_stats_groupby` function.
Keyword parameters: cols=['variable x','variable y'], coff_p=0.05, coff_q=0.01, colindex=['id'].
function binby_pvalue_coffs
binby_pvalue_coffs(
df1: DataFrame,
coffs=[0.01, 0.05, 0.1],
color=False,
testn='MWU test, FDR corrected',
colindex='genes id',
colgroup='tissue',
preffix='',
colns=None,
palette=None
) → tuple
Bin data by pvalue cutoffs.
Args:
- `df1` (DataFrame): input dataframe.
- `coffs` (list, optional): cutoffs. Defaults to [0.01,0.05,0.25].
- `color` (bool, optional): color assignment. Defaults to False.
- `testn` (str, optional): test name. Defaults to 'MWU test, FDR corrected'.
- `colindex` (str, optional): column with the index. Defaults to 'genes id'.
- `colgroup` (str, optional): column with the groups. Defaults to 'tissue'.
- `preffix` (str, optional): prefix. Defaults to ''.
- `colns` (optional): columns (not counted). Defaults to None.
- `palette` (optional): palette. Defaults to None.
Returns:
- `tuple`: output.
Notes:
- To be deprecated in favor of the functions used for enrichment analysis, for example.
module roux.workflow.df
For management of tables.
function exclude_items
exclude_items(df1: DataFrame, metadata: dict) → DataFrame
Exclude items from the table with the workflow info.
Args:
- `df1` (pd.DataFrame): input table.
- `metadata` (dict): metadata of the repository.
Returns:
- `pd.DataFrame`: output.
module roux.lib.dict
For processing dictionaries.
function head_dict
head_dict(d, lines=5)
function sort_dict
sort_dict(d1, by=1, ascending=True)
Sort dictionary by values.
Parameters:
- `d1` (dict): input dictionary.
- `by` (int): index of the value among the values.
- `ascending` (bool): ascending order.
Returns:
- `d1` (dict): output dictionary.
function merge_dicts
merge_dicts(l: list) → dict
Merge dictionaries.
Parameters:
- `l` (list): list containing the dictionaries.
Returns:
- `d` (dict): output dictionary.
TODOs: 1. In python>=3.9, `merged = d1 | d2`?
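For reference, the py>=3.9 merge operator mentioned in the TODO behaves like this:
```python
# Plain-Python dict merge, per the TODO above.
d1, d2 = {'a': 1, 'b': 2}, {'b': 3, 'c': 4}
merged = d1 | d2  # {'a': 1, 'b': 3, 'c': 4}; right-hand values win on key clashes
```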
function merge_dicts_deep
merge_dicts_deep(left: dict, right: dict) → dict
Merge nested dictionaries. Overwrites left with right.
Parameters:
- `left` (dict): dictionary #1.
- `right` (dict): dictionary #2.
TODOs: 1. In python>=3.9, `merged = d1 | d2`?
function merge_dict_values
merge_dict_values(l, test=False)
Merge dictionary values.
Parameters:
- `l` (list): list containing the dictionaries.
- `test` (bool): verbose.
Returns:
- `d` (dict): output dictionary.
function flip_dict
flip_dict(d)
Switch values with keys and vice versa.
Parameters:
- `d` (dict): input dictionary.
Returns:
- `d` (dict): output dictionary.
module roux.workflow.nb
For operations on jupyter notebooks.
function get_lines
get_lines(p: str, keep_comments: bool = True) → list
Get lines of code from notebook.
Args:
- `p` (str): path to the notebook.
- `keep_comments` (bool, optional): keep comments. Defaults to True.
Returns:
- `list`: lines.
function read_nb_md
read_nb_md(p: str, n: int = None) → list
Read notebook's documentation in the markdown cells.
Args:
- `p` (str): path of the notebook.
- `n` (int): number of markdown cells to extract.
Returns:
- `list`: lines of the strings.
function to_info
to_info(p: str, outp: str, linkd: str = '') → str
Save README.md file with table of contents obtained from jupyter notebooks.
Args:
- `p` (str, optional): path of the notebook files that would be converted to "tasks".
- `outp` (str, optional): path of the output file, e.g. 'README.md'.
Returns:
- `str`: path of the output file.
function to_replaced_nb
to_replaced_nb(
nb_path,
output_path,
replaces: dict = {},
cell_type: str = 'code',
drop_lines_with_substrings: list = None,
test=False
)
Replace text in a jupyter notebook.
Parameters:
- `nb`: notebook object obtained from `nbformat.reads`.
- `replaces` (dict): mapping of the text to 'replace from' to the one to 'replace with'.
- `cell_type` (str): the type of the cell.
Returns:
- `new_nb`: notebook object.
function to_filtered_nb
to_filtered_nb(
p: str,
outp: str,
header: str,
kind: str = 'include',
validate_diff: int = None
)
Filter sections in a notebook based on markdown headings.
Args:
- `header` (str): exact first line of a markdown cell marking a section in a notebook.
- `validate_diff` (int, optional)
function to_filter_nbby_patterns
to_filter_nbby_patterns(p, outp, patterns=None, **kws)
Filter out notebook cells if the pattern string is found.
Args:
- `patterns` (list): list of string patterns.
function to_clear_unused_cells
to_clear_unused_cells(
notebook_path,
new_notebook_path,
validate_diff: int = None
)
Remove code cells with all lines commented.
function to_clear_outputs
to_clear_outputs(notebook_path, new_notebook_path)
function to_filtered_outputs
to_filtered_outputs(input_path, output_path, warnings=True, strings=True)
module roux.viz.sets
For plotting sets.
function plot_venn
plot_venn(
ds1: Series,
ax: Axes = None,
figsize: tuple = [2.5, 2.5],
show_n: bool = True,
outmore=False,
**kws
) → Axes
Plot Venn diagram.
Args:
- `ds1` (pd.Series): input pandas.Series or dictionary. Subsets in the index levels, mapped to counts.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `figsize` (tuple, optional): figure size. Defaults to [2.5,2.5].
- `show_n` (bool, optional): show sample sizes. Defaults to True.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_intersection_counts
plot_intersection_counts(
df1: DataFrame,
cols: list = None,
kind: str = 'table',
method: str = None,
show_counts: bool = True,
show_pval: bool = True,
confusion: bool = False,
rename_cols: bool = False,
sort_cols: tuple = [True, True],
order_x: list = None,
order_y: list = None,
cmap: str = 'Reds',
ax: Axes = None,
kws_show_stats: dict = {},
**kws_plot
) → Axes
Plot counts for the intersection between two sets.
Args:
- `df1` (pd.DataFrame): input data.
- `cols` (list, optional): columns. Defaults to None.
- `kind` (str, optional): kind of plot: table or barplot. Defaults to table.
- `method` (str, optional): method to check the association ['chi2','FE']. Defaults to None.
- `rename_cols` (bool, optional): rename the columns. Defaults to True.
- `show_pval` (bool, optional): annotate p-values. Defaults to True.
- `cmap` (str, optional): colormap. Defaults to 'Reds'.
- `kws_show_stats` (dict, optional): arguments provided to the stats function. Defaults to {}.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Raises:
- `ValueError`: `show_pval` position should be the allowed one.
Keyword Args:
- `kws_plot`: keyword arguments provided to the plotting function.
Returns:
- `plt.Axes`: `plt.Axes` object.
TODOs:
1. Use `compare_classes` to get the stats.
function plot_intersections
plot_intersections(
ds1: Series,
item_name: str = None,
figsize: tuple = [4, 4],
text_width: float = 2,
yorder: list = None,
sort_by: str = 'cardinality',
sort_categories_by: str = None,
element_size: int = 40,
facecolor: str = 'gray',
bari_annot: int = None,
totals_bar: bool = False,
totals_text: bool = True,
intersections_ylabel: float = None,
intersections_min: float = None,
test: bool = False,
annot_text: bool = False,
set_ylabelx: float = -0.25,
set_ylabely: float = 0.5,
**kws
) → Axes
Plot upset plot.
Args:
- `ds1` (pd.Series): input vector.
- `item_name` (str, optional): name of items. Defaults to None.
- `figsize` (tuple, optional): figure size. Defaults to [4,4].
- `text_width` (float, optional): max. width of the text. Defaults to 2.
- `yorder` (list, optional): order of the y elements. Defaults to None.
- `sort_by` (str, optional): sorting method. Defaults to 'cardinality'.
- `sort_categories_by` (str, optional): sorting method. Defaults to None.
- `element_size` (int, optional): size of the elements. Defaults to 40.
- `facecolor` (str, optional): facecolor. Defaults to 'gray'.
- `bari_annot` (int, optional): annotate the nth bar. Defaults to None.
- `totals_text` (bool, optional): show totals. Defaults to True.
- `intersections_ylabel` (float, optional): y-label of the intersections. Defaults to None.
- `intersections_min` (float, optional): minimum intersection to show. Defaults to None.
- `test` (bool, optional): test mode. Defaults to False.
- `annot_text` (bool, optional): annotate text. Defaults to False.
- `set_ylabelx` (float, optional): x position of the ylabel. Defaults to -0.25.
- `set_ylabely` (float, optional): y position of the ylabel. Defaults to 0.5.
Keyword Args:
- `kws`: parameters provided to the `upset.plot` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
Notes:
- `sort_by` ({'cardinality', 'degree'}): if 'cardinality', subsets are listed from largest to smallest; if 'degree', they are listed in order of the number of categories intersected.
- `sort_categories_by` ({'cardinality', None}): whether to sort the categories by total cardinality, or leave them in the provided order.
References: https://upsetplot.readthedocs.io/en/stable/api.html
function plot_enrichment
plot_enrichment(
data: DataFrame,
x: str,
y: str,
s: str,
hue='Q',
xlabel=None,
ylabel='significance\n(-log10(Q))',
size: int = None,
color: str = None,
annots_side: int = 5,
annots_side_labels=None,
coff_fdr: float = None,
xlim: tuple = None,
xlim_off: float = 0.2,
ylim: tuple = None,
ax: Axes = None,
break_pt: int = 25,
annot_coff_fdr: bool = False,
kws_annot: dict = {'loc': 'right', 'offx3': 0.15},
returns='ax',
**kwargs
) → Axes
Plot enrichment stats.
Args:
- `data` (pd.DataFrame): input data.
- `x` (str): x column.
- `y` (str): y column.
- `s` (str): size column.
- `size` (int, optional): size of the points. Defaults to None.
- `color` (str, optional): color of the points. Defaults to None.
- `annots_side` (int, optional): how many labels to show on the side. Defaults to 5.
- `coff_fdr` (float, optional): FDR cutoff. Defaults to None.
- `xlim` (tuple, optional): x-axis limits. Defaults to None.
- `xlim_off` (float, optional): x-offset on limits. Defaults to 0.2.
- `ylim` (tuple, optional): y-axis limits. Defaults to None.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `break_pt` (int, optional): break point (' ') for the labels. Defaults to 25.
- `annot_coff_fdr` (bool, optional): show the FDR cutoff. Defaults to False.
- `kws_annot` (dict, optional): parameters provided to the `annot_side` function. Defaults to dict(loc='right', annot_count_max=5, offx3=0.15).
Keyword Args:
- `kwargs`: parameters provided to the `sns.scatterplot` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_pie
plot_pie(
counts: list,
labels: list,
scales_line_xy: tuple = (1.1, 1.1),
remove_wedges: list = None,
remove_wedges_index: list = [],
line_color: str = 'k',
annot_side: bool = False,
kws_annot_side: dict = {},
ax: Axes = None,
**kws_pie
) → Axes
Pie plot.
Args:
- `counts` (list): counts.
- `labels` (list): labels.
- `scales_line_xy` (tuple, optional): scales for the lines. Defaults to (1.1,1.1).
- `remove_wedges` (list, optional): remove wedge/s. Defaults to None.
- `remove_wedges_index` (list, optional): remove wedge/s by index. Defaults to [].
- `line_color` (str, optional): line color. Defaults to 'k'.
- `annot_side` (bool, optional): annotations on the side using the `annot_side` function. Defaults to False.
- `kws_annot_side` (dict, optional): keyword arguments provided to the `annot_side` function. Defaults to {}.
- `ax` (plt.Axes, optional): subplot. Defaults to None.
Keyword Args:
- `kws_pie`: keyword arguments provided to the `pie` chart function.
Returns:
- `plt.Axes`: subplot.
References: https://matplotlib.org/stable/gallery/pie_and_polar_charts/pie_and_donut_labels.html
module roux.stat.compare
For comparison related stats.
function get_comparison
get_comparison(
df1: DataFrame,
d1: dict = None,
coff_p: float = 0.05,
between_ys: bool = False,
verbose: bool = False,
**kws
)
Compare the x and y columns.
Parameters:
- `df1` (pd.DataFrame): input table.
- `d1` (dict): columns dict, output of `get_cols_x_for_comparison`.
- `between_ys` (bool): compare y's.
Notes:
Column information:
d1 = {'cols_index': ['id'], 'cols_x': {'cont': [], 'desc': []}, 'cols_y': {'cont': [], 'desc': []}}
Comparison types:
1. continuous vs continuous -> correlation
2. discrete vs continuous -> difference
3. discrete vs discrete -> FE or chi square
function compare_strings
compare_strings(l0: list, l1: list, cutoff: float = 0.5) → DataFrame
Compare two lists of strings.
Parameters:
- `l0` (list): list of strings.
- `l1` (list): list of strings to compare with.
- `cutoff` (float): threshold to filter the comparisons.
Returns: table with the similarity scores.
TODOs:
1. Add option for semantic similarity.
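A minimal usage sketch (the exact columns of the returned table are not documented here):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.stat.compare import compare_strings

df = compare_strings(['yeast', 'mouse'], ['yeasts', 'house'], cutoff=0.5)
print(df)  # expected: string pairs with similarity scores above the cutoff
```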
module roux.lib.dfs
For processing multiple pandas DataFrames/Series
function filter_dfs
filter_dfs(dfs: list, cols: list, how: str = 'inner') → DataFrame
Filter dataframes based on items in the common columns.
Parameters:
- `dfs` (list): list of dataframes.
- `cols` (list): list of columns.
- `how` (str): how to filter ('inner').
Returns:
- `dfs` (list): list of dataframes.
function merge_with_many_columns
merge_with_many_columns(
df1: DataFrame,
right: str,
left_on: str,
right_ons: list,
right_id: str,
how: str = 'inner',
validate: str = '1:1',
test: bool = False,
verbose: bool = False,
**kws_merge
) → DataFrame
Merge with many columns. For example, if ids in the left table can map to ids located in multiple columns of the right table.
Parameters:
- `df1` (pd.DataFrame): left table.
- `right` (pd.DataFrame): right table.
- `left_on` (str): column in the left table to merge on.
- `right_ons` (list): columns in the right table to merge on.
- `right_id` (str): column in the right dataframe with, for example, the ids to be merged.
Keyword parameters:
- `kws_merge`: to be supplied to `pandas.DataFrame.merge`.
Returns: merged table.
function merge_paired
merge_paired(
df1: DataFrame,
df2: DataFrame,
left_ons: list,
right_on: list,
common: list = [],
right_ons_common: list = [],
how: str = 'inner',
validates: list = ['1:1', '1:1'],
suffixes: list = None,
test: bool = False,
verb: bool = True,
**kws
) → DataFrame
Merge unpaired dataframes into a paired dataframe.
Parameters:
- `df1` (DataFrame): paired dataframe.
- `df2` (DataFrame): unpaired dataframe.
- `left_ons` (list): columns of `df1` (suffixed).
- `right_on` (str|list): column/s of `df2` (to be suffixed).
- `common` (str|list): common column/s between `df1` and `df2` (not suffixed).
- `right_ons_common` (str|list): common column/s between `df2` to be used for merging (not to be suffixed).
- `how` (str): method of merging ('inner').
- `validates` (list): validate mappings for the 1st mapping between `df1` and `df2` and the 2nd one between `df1+df2` and `df2` (['1:1','1:1']).
- `suffixes` (list): suffixes to be used (None).
- `test` (bool): testing (False).
- `verb` (bool): verbose (True).
Keyword Parameters:
- `kws` (dict): parameters provided to `merge`.
Returns:
- `df` (DataFrame): output dataframe.
Examples:
Parameters:
how='inner',
left_ons=['gene id gene1','gene id gene2'], # suffixed
common='sample id', # not suffixed
right_on='gene id', # to be suffixed
right_ons_common=[], # not to be suffixed
function merge_dfs
merge_dfs(dfs: list, **kws) → DataFrame
Merge dataframes from left to right.
Parameters:
- `dfs` (list): list of dataframes.
Keyword Parameters:
- `kws` (dict): parameters provided to `merge`.
Returns:
- `df` (DataFrame): output dataframe.
Notes:
- For example, reduce(lambda x, y: x.merge(y), [1, 2, 3, 4, 5]) merges ((((1.merge(2)).merge(3)).merge(4)).merge(5)).
function compare_rows
compare_rows(df1, df2, test=False, **kws)
module roux.viz.scatter
For scatter plots.
function plot_scatter_agg
plot_scatter_agg(
dplot: DataFrame,
x: str = None,
y: str = None,
z: str = None,
kws_legend={'bbox_to_anchor': [1, 1], 'loc': 'upper left'},
title=None,
label_colorbar=None,
ax=None,
kind=None,
verbose=False,
cmap='Blues',
gridsize=10,
**kws
)
UNDER DEV.
function plot_scatter
plot_scatter(
data: DataFrame,
x: str = None,
y: str = None,
z: str = None,
kind: str = 'scatter',
scatter_kws={},
line_kws={},
stat_method: str = 'spearman',
stat_kws={},
hollow: bool = False,
ax: Axes = None,
verbose: bool = True,
**kws
) → Axes
Plot scatter with multiple layers and stats.
Args:
- `data` (pd.DataFrame): input dataframe.
- `x` (str): x column.
- `y` (str): y column.
- `z` (str, optional): z column. Defaults to None.
- `kind` (str, optional): kind of scatter. Defaults to 'hexbin'.
- `trendline_method` (str, optional): trendline method ['poly','lowess']. Defaults to 'poly'.
- `stat_method` (str, optional): method of the annotated stats ['mlr',"spearman"]. Defaults to "spearman".
- `cmap` (str, optional): colormap. Defaults to 'Reds'.
- `label_colorbar` (str, optional): label of the colorbar. Defaults to None.
- `gridsize` (int, optional): number of grids in the hexbin. Defaults to 25.
- `bbox_to_anchor` (list, optional): location of the legend. Defaults to [1,1].
- `loc` (str, optional): location of the legend. Defaults to 'upper left'.
- `title` (str, optional): title of the plot. Defaults to None.
- `line_kws` (dict, optional): parameters provided to the `plot_trendline` function. Defaults to {}.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `plot` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
Notes:
1. For a rasterized scatter plot set `scatter_kws={'rasterized': True}`.
2. This function does not apply multiple colors, similar to `sns.regplot`.
function plot_qq
plot_qq(x: Series) → Axes
Plot QQ.
Args:
- `x` (pd.Series): input vector.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_ranks
plot_ranks(
df1: DataFrame,
col: str,
colid: str,
ranks_on: str = 'y',
ascending: bool = True,
col_rank: str = None,
line: bool = True,
kws_line={},
show_topn: int = None,
show_ids: list = None,
ax=None,
**kws
) → Axes
Plot rankings.
Args:
- `dplot` (pd.DataFrame): input data.
- `colx` (str): x column.
- `coly` (str): y column.
- `colid` (str): column with unique ids.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `seaborn.scatterplot` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
Usage: combined with annotations using `annot_side`.
function plot_volcano
plot_volcano(
data: DataFrame,
colx: str,
coly: str,
colindex: str,
hue: str = 'x',
style: str = 'P=0',
style_order: list = ['o', '^'],
markers: list = ['o', '^'],
show_labels: int = None,
labels_layout: str = None,
labels_kws: dict = {},
show_outlines: int = None,
outline_colors: list = ['k'],
collabel: str = None,
show_line=True,
line_pvalue=0.1,
line_x: float = 0.0,
line_x_min: float = None,
show_text: bool = True,
text_increase: str = None,
text_decrease: str = None,
text_diff: str = None,
legend: bool = False,
verbose: bool = False,
p_min: float = None,
ax: Axes = None,
outmore: bool = False,
kws_legend: dict = {},
**kws_scatterplot
) → Axes
Volcano plot.
Returns: `plt.Axes`
module roux.run
For access to a few functions from the terminal.
module roux.lib.str
For processing strings.
function substitution
substitution(s, i, replaceby)
Substitute character in a string.
Parameters:
- `s` (str): string.
- `i` (int): location.
- `replaceby` (str): character to substitute with.
Returns:
- `s` (str): output string.
function replace_many
replace_many(
s: str,
replaces: dict,
replacewith: str = '',
ignore: bool = False
)
Rename by replacing sub-strings.
Parameters:
- `s` (str): input string.
- `replaces` (dict|list): from->to format, or a list containing the substrings to remove.
- `replacewith` (str): replace with this string in case `replaces` is a list.
- `ignore` (bool): if True, do not validate the successful replacements.
Returns:
- `s` (str): output string.
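For example (a sketch; the validation behavior when a substring is absent is governed by `ignore`):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.lib.str import replace_many

print(replace_many('sample_01.tsv', {'sample_': 'S', '.tsv': ''}))  # 'S01'
print(replace_many('sample_01.tsv', ['.tsv'], replacewith=''))      # 'sample_01'
```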
function filter_list
filter_list(l: list, patterns: list, kind='out') → list
Filter a list of strings.
Args:
- `l` (list): list of strings.
- `patterns` (list): list of regex patterns. The patterns are applied after stripping the whitespaces.
Returns: (list) list of filtered strings.
function tuple2str
tuple2str(tup, sep=' ')
Join tuple items.
Parameters:
- `tup` (tuple|list): input tuple/list.
- `sep` (str): separator between the items.
Returns:
- `s` (str): output string.
function linebreaker
linebreaker(text, width=None, break_pt=None, sep='\n', **kws)
Insert `newline`s within a string.
Parameters:
- `text` (str): string.
- `width` (int): insert `newline` at this interval.
- `sep` (str): separator to split the sub-strings.
Returns:
- `s` (str): output string.
References:
1. `textwrap`: https://docs.python.org/3/library/textwrap.html
function findall
findall(s, ss, outends=False, outstrs=False, suffixlen=0)
Find the substrings or their locations in a string.
Parameters:
- `s` (str): input string.
- `ss` (str): substring.
- `outends` (bool): output end positions.
- `outstrs` (bool): output strings.
- `suffixlen` (int): length of the suffix.
Returns:
- `l` (list): output list.
function get_marked_substrings
get_marked_substrings(
s,
leftmarker='{',
rightmarker='}',
leftoff=0,
rightoff=0
) → list
Get the substrings flanked with markers from a string.
Parameters:
- `s` (str): input string.
- `leftmarker` (str): marker on the left.
- `rightmarker` (str): marker on the right.
- `leftoff` (int): offset on the left.
- `rightoff` (int): offset on the right.
Returns:
- `l` (list): list of substrings.
function mark_substrings
mark_substrings(s, ss, leftmarker='(', rightmarker=')') → str
Mark sub-string/s in a string.
Parameters:
- `s` (str): input string.
- `ss` (str): substring.
- `leftmarker` (str): marker on the left.
- `rightmarker` (str): marker on the right.
Returns:
- `s` (str): string.
function get_bracket
get_bracket(s, leftmarker='(', righttmarker=')') → str
Get bracketed substrings.
Parameters:
- `s` (str): string.
- `leftmarker` (str): marker on the left.
- `rightmarker` (str): marker on the right.
Returns:
- `s` (str): string.
TODOs:
1. Use `get_marked_substrings`.
function align
align(
s1: str,
s2: str,
prefix: bool = False,
suffix: bool = False,
common: bool = True
) → list
Align strings.
Parameters:
- `s1` (str): string #1.
- `s2` (str): string #2.
- `prefix` (bool): align the prefix.
- `suffix` (bool): align the suffix.
- `common` (bool): align the common substring.
Returns:
- `l` (list): output list.
Notes:
- Code to test:
[get_prefix(source, target, common=False),
 get_prefix(source, target, common=True),
 get_suffix(source, target, common=False),
 get_suffix(source, target, common=True),]
function get_prefix
get_prefix(s1, s2: str = None, common: bool = True, clean: bool = True) → str
Get the prefix of the strings
Parameters:
- `s1` (str|list): 1st string.
- `s2` (str): 2nd string (default: None).
- `common` (bool): get the common prefix (default: True).
- `clean` (bool): clean the leading and trailing whitespaces (default: True).
Returns:
- `s` (str): prefix.
function get_suffix
get_suffix(s1, s2: str = None, common: bool = True, clean: bool = True) → str
Get the suffix of the strings
Parameters:
- `s1` (str|list): 1st string.
- `s2` (str): 2nd string (default: None).
- `common` (bool): get the common suffix (default: True).
- `clean` (bool): clean the leading and trailing whitespaces (default: True).
Returns:
- `s` (str): suffix.
function get_fix
get_fix(s1: str, s2: str, **kws: dict) → str
Infer common prefix or suffix.
Parameters:
- `s1` (str): 1st string.
- `s2` (str): 2nd string.
Keyword parameters:
- `kws`: parameters provided to the `get_prefix` and `get_suffix` functions.
Returns:
- `s` (str): prefix or suffix.
function removesuffix
removesuffix(s1: str, suffix: str) → str
Remove suffix.
Parameters:
- `s1` (str): input string.
- `suffix` (str): suffix.
Returns:
- `s1` (str): string without the suffix.
TODOs:
1. Deprecate in py>=3.9; use `str.removesuffix()` instead.
function str2dict
str2dict(
s: str,
reversible: bool = True,
sep: str = ';',
sep_equal: str = '='
) → dict
String to dictionary.
Parameters:
- `s` (str): string.
- `sep` (str): separator between entries (default: ';').
- `sep_equal` (str): separator between the keys and the values (default: '=').
Returns:
- `d` (dict): dictionary.
References:
1. https://stackoverflow.com/a/186873/3521099
function dict2str
dict2str(
d1: dict,
reversible: bool = True,
sep: str = ';',
sep_equal: str = '='
) → str
Dictionary to string.
Parameters:
- `d1` (dict): dictionary.
- `sep` (str): separator between entries (default: ';').
- `sep_equal` (str): separator between the keys and the values (default: '=').
- `reversible` (bool): use JSON (default: True).
Returns:
- `s` (str): string.
function str2num
str2num(s: str) → float
String to number.
Parameters:
- `s` (str): string.
Returns:
- `i` (int): number.
function num2str
num2str(
num: float,
magnitude: bool = False,
coff: float = 10000,
decimals: int = 0
) → str
Number to string.
Parameters:
- `num` (int): number.
- `magnitude` (bool): use magnitudes (default: False).
- `coff` (int): cutoff (default: 10000).
- `decimals` (int): decimal points (default: 0).
Returns:
- `s` (str): string.
TODOs:
1. ~ if magnitude else not.
function encode
encode(data, short: bool = False, method_short: str = 'sha256', **kws) → str
Encode the data as a string.
Parameters:
- `data` (str|dict|Series): input data.
- `short` (bool): output a short string, compatible with paths but non-reversible. Defaults to False.
- `method_short` (str): method used for encoding when `short=True`.
Keyword parameters:
- `kws`: parameters provided to the encoding function.
Returns:
- `s` (str): output string.
function decode
decode(s, out=None, **kws_out)
Decode data from a string.
Parameters:
- `s` (str): encoded string.
- `out` (str): output format (dict|df).
Keyword parameters:
- `kws_out`: parameters provided to `dict2df`.
Returns:
- `d` (dict|DataFrame): output data.
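A minimal round-trip sketch for `encode`/`decode` (the exact encoded string format is an implementation detail, not shown here):
```python
# Minimal sketch; assumes `pip install roux`.
from roux.lib.str import encode, decode

params = {'input_path': 'data/a.tsv', 'seed': 0}
s = encode(params)                 # reversible string encoding of the dict
print(decode(s, out='dict'))       # back to the dict
print(encode(params, short=True))  # short, path-safe, non-reversible hash
```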
function to_formula
to_formula(
replaces={' ': 'SPACE', '(': 'LEFTBRACKET', ')': 'RIGHTTBRACKET', '.': 'DOT', ',': 'COMMA', '%': 'PERCENT', "'": 'INVCOMMA', '+': 'PLUS', '-': 'MINUS'},
reverse=False
) → dict
Convert strings to the formula format, compatible with `patsy` for example.
module roux.workflow.monitor
For workflow monitors.
function plot_workflow_log
plot_workflow_log(dplot: DataFrame) → Axes
Plot workflow log.
Args:
- `dplot` (pd.DataFrame): input data (dparam).
Returns:
- `plt.Axes`: output.
TODOs:
1. Use the statistics tagged as `## stats`.
module roux.lib.text
For processing text files.
function get_header
get_header(path: str, comment='#', lineno=None)
Get the header of a file.
Args:
- `path` (str): path.
- `comment` (str): comment identifier.
- `lineno` (int): read lines up to this line number.
Returns:
- `lines` (list): header.
function cat
cat(ps, outp)
Concatenate text files.
Args:
- `ps` (list): list of paths.
- `outp` (str): output path.
Returns:
- `outp` (str): output path.
module roux.vizi.scatter
function plot_scatters_grouped
plot_scatters_grouped(
data: DataFrame,
cols_groupby: list,
aggfunc: dict,
orient='h',
**kws_encode
)
Scatters grouped by categories.
Args:
- `data` (pd.DataFrame): input data.
- `cols_groupby` (list): list of columns to group by.
- `aggfunc` (dict): columns mapped to the aggregation function.
Keyword Args:
- `kws_encode`: parameters provided to the `encode` attribute.
Returns: Altair figure.
module roux.stat.network
For network related stats.
function get_subgraphs
get_subgraphs(df1: DataFrame, source: str, target: str) → DataFrame
Subgraphs from the edge list.
Args:
- `df1` (pd.DataFrame): input dataframe containing the edge list.
- `source` (str): source node.
- `target` (str): target node.
Returns:
- `pd.DataFrame`: output.
module roux.lib.google
Processing files form google-cloud services.
function get_service
get_service(service_name='drive', access_limit=True, client_config=None)
Creates a google service object.
:param service_name: name of the service e.g. drive
:param access_limit: True if access is limited else False
:param client_config: custom client config
:return: google service object
Ref: https://developers.google.com/drive/api/v3/about-auth
function list_files_in_folder
list_files_in_folder(service, folderid, filetype=None, fileext=None, test=False)
Lists files in a google drive folder.
:param service: service object e.g. drive
:param folderid: folder id from google drive
:param filetype: specify file type
:param fileext: specify file extension
:param test: True if verbose else False
:return: list of files in the folder
function get_file_id
get_file_id(p)
function download_file
download_file(
p=None,
file_id=None,
service=None,
outd=None,
outp=None,
convert=False,
force=False,
test=False
)
Downloads a specified file.
:param service: google service object
:param file_id: file id as on google drive
:param filetypes: specify file type
:param outp: path to the output file
:param test: True if verbose else False
Ref: https://developers.google.com/drive/api/v3/ref-export-formats
function upload_file
upload_file(service, filep, folder_id, test=False)
Uploads a local file onto google drive.
:param service: google service object
:param filep: path of the file
:param folder_id: id of the folder on google drive where the file will be uploaded
:param test: True if verbose else False
:return: id of the uploaded file
function upload_files
upload_files(service, ps, folder_id, **kws)
function download_drawings
download_drawings(folderid, outd, service=None, test=False)
Download specific files: drawings
TODOs: 1. use download_file
function get_comments
get_comments(
fileid,
fields='comments/quotedFileContent/value,comments/content,comments/id',
service=None
)
Get comments.
fields:
    comments/
        kind:
        id:
        createdTime:
        modifiedTime:
        author:
            kind:
            displayName:
            photoLink:
            me: True
        htmlContent:
        content:
        deleted:
        quotedFileContent:
            mimeType:
            value:
        anchor:
        replies: []
function search
search(query, results=1, service=None, **kws_search)
Google search.
:param query: exact terms
:return: dict
function get_search_strings
get_search_strings(text, num=5, test=False)
Get search strings for a Google search.
:param text: string
:param num: number of results
:param test: True if verbose else False
:return lines: list
function get_metadata_of_paper
get_metadata_of_paper(
file_id,
service_drive,
service_search,
metadata=None,
force=False,
test=False
)
Get the metadata of a pdf document.
function share
share(
drive_service,
content_id,
share=False,
unshare=False,
user_permission=None,
permissionId='anyoneWithLink'
)
:param user_permission: e.g. user_permission = {'type': 'anyone', 'role': 'reader', 'email': '@'}
Ref: https://developers.google.com/drive/api/v3/manage-sharing
class slides
method create_image
create_image(service, presentation_id, page_id, image_id)
The image must be less than 1.5 MB.
method get_page_ids
get_page_ids(service, presentation_id)
module roux.viz
Global Variables
- ds
- theme
- ax_
- colors
- diagram
- io
module roux.viz.ax_
For setting up subplots.
function set_axes_minimal
set_axes_minimal(ax, xlabel=None, ylabel=None, off_axes_pad=0) → Axes
Set minimal axes labels, at the lower left corner.
function set_axes_arrows
set_axes_arrows(
ax: Axes,
length: float = 0.1,
pad: float = 0.2,
color: str = 'k',
head_width: float = 0.03,
head_length: float = 0.02,
length_includes_head: bool = True,
clip_on: bool = False,
**kws_arrow
)
Set arrows next to the axis labels.
Parameters:
- `ax` (plt.Axes): subplot.
- `color` (str, optional): arrow color. Defaults to 'k'.
function set_label
set_label(
s: str,
ax: Axes,
x: float = 0,
y: float = 0,
ha: str = 'left',
va: str = 'top',
loc=None,
off_loc=0.01,
title: bool = False,
**kws
) → Axes
Set label on a plot.
Args:
- `s` (str): label.
- `ax` (plt.Axes): `plt.Axes` object.
- `x` (float): x position.
- `y` (float): y position.
- `ha` (str, optional): horizontal alignment. Defaults to 'left'.
- `va` (str, optional): vertical alignment. Defaults to 'top'.
- `loc` (int, optional): location of the label. 1: 'upper right', 2: 'upper left', 3: 'lower left', 4: 'lower right'.
- `off_loc` (float, optional): x and y location offset. Defaults to 0.01.
- `title` (bool, optional): set as title. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_ylabel
set_ylabel(
ax: Axes,
s: str = None,
x: float = -0.1,
y: float = 1.02,
xoff: float = 0,
yoff: float = 0
) → Axes
Set ylabel horizontal.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `s` (str, optional): ylabel. Defaults to None.
- `x` (float, optional): x position. Defaults to -0.1.
- `y` (float, optional): y position. Defaults to 1.02.
- `xoff` (float, optional): x offset. Defaults to 0.
- `yoff` (float, optional): y offset. Defaults to 0.
Returns:
- `plt.Axes`: `plt.Axes` object.
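Example: a minimal sketch of the two labeling helpers:

import matplotlib.pyplot as plt
from roux.viz.ax_ import set_label, set_ylabel

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
set_label("panel A", ax=ax, loc=2)  # 2 = 'upper left'
set_ylabel(ax, s="score")           # horizontal ylabel above the axis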
function get_ax_labels
get_ax_labels(ax: Axes)
function format_labels
format_labels(
ax,
axes: list = ['x', 'y'],
fmt='cap1',
title_fontsize=15,
rename_labels=None,
rotate_ylabel=True,
y=1.05,
test=False
)
function rename_ticklabels
rename_ticklabels(
ax: Axes,
axis: str,
rename: dict = None,
replace: dict = None,
ignore: bool = False
) → Axes
Rename the ticklabels.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `axis` (str): axis (x|y).
- `rename` (dict, optional): replace strings. Defaults to None.
- `replace` (dict, optional): replace sub-strings. Defaults to None.
- `ignore` (bool, optional): ignore warnings. Defaults to False.
Raises:
- `ValueError`: either `rename` or `replace` should be provided.
Returns:
- `plt.Axes`: `plt.Axes` object.
function get_ticklabel_position
get_ticklabel_position(ax: Axes, axis: str) → Axes
Get positions of the ticklabels.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `axis` (str): axis (x|y).
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_ticklabels_color
set_ticklabels_color(ax: Axes, ticklabel2color: dict, axis: str = 'y') → Axes
Set colors to ticklabels.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `ticklabel2color` (dict): colors of the ticklabels.
- `axis` (str): axis (x|y).
Returns:
- `plt.Axes`: `plt.Axes` object.
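Example: a minimal sketch (labels and colors are arbitrary):

import matplotlib.pyplot as plt
from roux.viz.ax_ import set_ticklabels_color

fig, ax = plt.subplots()
ax.barh(["wt", "mutant"], [3, 5])
set_ticklabels_color(ax, {"mutant": "red"}, axis="y")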
function format_ticklabels
format_ticklabels(
ax: Axes,
axes: tuple = ['x', 'y'],
interval: float = None,
n: int = None,
fmt: str = None,
font: str = None
) → Axes
format_ticklabels
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `axes` (tuple, optional): axes. Defaults to ['x','y'].
- `interval` (float, optional): interval between the ticks. Defaults to None.
- `n` (int, optional): number of ticks. Defaults to None.
- `fmt` (str, optional): format, e.g. '.0f'. Defaults to None.
- `font` (str, optional): font. Defaults to 'DejaVu Sans Mono'.
Returns:
- `plt.Axes`: `plt.Axes` object.
TODOs: 1. include color_ticklabels
function split_ticklabels
split_ticklabels(
ax: Axes,
fmt: str,
axis='x',
group_x=-0.45,
group_y=-0.25,
group_prefix=None,
group_suffix=False,
group_loc='center',
group_colors=None,
group_alpha=0.2,
show_group_line=True,
group_line_off_x=0.15,
group_line_off_y=0.1,
show_group_span=False,
group_span_kws={},
sep: str = '-',
pad_major=6,
off: float = 0.2,
test: bool = False,
**kws
) → Axes
Split ticklabels into major and minor. Two minor ticks are created per major tick.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `fmt` (str): 'group'-wise or 'pair'-wise splitting of the ticklabels.
- `axis` (str): name of the axis: x or y.
- `sep` (str, optional): separator within the tick labels. Defaults to '-'.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
function get_axlimsby_data
get_axlimsby_data(
X: Series,
Y: Series,
off: float = 0.2,
equal: bool = False
) → Axes
Infer axis limits from data.
Args:
- `X` (pd.Series): x values.
- `Y` (pd.Series): y values.
- `off` (float, optional): offsets. Defaults to 0.2.
- `equal` (bool, optional): equal limits. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
function get_axlims
get_axlims(ax: Axes) → Axes
Get axis limits.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_equallim
set_equallim(
ax: Axes,
diagonal: bool = False,
difference: float = None,
format_ticks: bool = True,
**kws_format_ticklabels
) → Axes
Set equal axis limits.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `diagonal` (bool, optional): show the diagonal. Defaults to False.
- `difference` (float, optional): difference. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_axlims
set_axlims(
ax: Axes,
off: float,
axes: list = ['x', 'y'],
equal=False,
**kws_set_equallim
) → Axes
Set axis limits.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `off` (float): offset.
- `axes` (list, optional): axis name/s. Defaults to ['x','y'].
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_grids
set_grids(ax: Axes, axis: str = None) → Axes
Show grids based on the shape (aspect ratio) of the plot.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `axis` (str, optional): axis name. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function format_legends
format_legends(ax: Axes, **kws_legend) → Axes
Format legend text.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
Returns:
- `plt.Axes`: `plt.Axes` object.
function rename_legends
rename_legends(ax: Axes, replaces: dict, **kws_legend) → Axes
Rename legends.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `replaces` (dict): mapping of old to new legend labels.
Returns:
- `plt.Axes`: `plt.Axes` object.
function append_legends
append_legends(ax: Axes, labels: list, handles: list, **kws) → Axes
Append to legends.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `labels` (list): labels.
- `handles` (list): handles.
Returns:
- `plt.Axes`: `plt.Axes` object.
function sort_legends
sort_legends(ax: Axes, sort_order: list = None, **kws) → Axes
Sort or filter legends.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `sort_order` (list, optional): order of legends. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
Notes:
- Filter the legends by providing the indices of the legends to keep.
function drop_duplicate_legend
drop_duplicate_legend(ax, **kws)
function reset_legend_colors
reset_legend_colors(ax)
Reset legend colors.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_legends_merged
set_legends_merged(axs, **kws_legend)
Merge the legends of multiple subplots.
Args:
- `axs` (list): list of `plt.Axes` objects.
Returns:
- `plt.Axes`: first `plt.Axes` object in the list.
function set_legend_custom
set_legend_custom(
ax: Axes,
legend2param: dict,
param: str = 'color',
lw: float = 1,
marker: str = 'o',
markerfacecolor: bool = True,
size: float = 10,
color: str = 'k',
linestyle: str = '',
title_ha: str = 'center',
**kws
) → Axes
Set custom legends.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `legend2param` (dict): legend name to the parameter to change, e.g. name of the color.
- `param` (str, optional): parameter to change. Defaults to 'color'.
- `lw` (float, optional): line width. Defaults to 1.
- `marker` (str, optional): marker type. Defaults to 'o'.
- `markerfacecolor` (bool, optional): marker face color. Defaults to True.
- `size` (float, optional): size of the markers. Defaults to 10.
- `color` (str, optional): color of the markers. Defaults to 'k'.
- `linestyle` (str, optional): line style. Defaults to ''.
- `title_ha` (str, optional): title horizontal alignment. Defaults to 'center'.
- `frameon` (bool, optional): show frame. Defaults to True.
Returns:
- `plt.Axes`: `plt.Axes` object.
TODOs: 1. different number of points for each entry:

import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerTuple

l1, = plt.plot(-1, -1, lw=0, marker="o", markerfacecolor='k', markeredgecolor='k')
l2, = plt.plot(-0.5, -1, lw=0, marker="o", markerfacecolor="none", markeredgecolor='k')
plt.legend([(l1,), (l1, l2)], ["test 1", "test 2"],
           handler_map={tuple: HandlerTuple(2)})
References:
- https://matplotlib.org/stable/api/markers_api.html
- http://www.cis.jhu.edu/~shanest/mpt/js/mathjax/mathjax-dev/fonts/Tables/STIX/STIX/All/All.html
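Example: a minimal sketch of a hand-built categorical legend (names and colors are arbitrary):

import matplotlib.pyplot as plt
from roux.viz.ax_ import set_legend_custom

fig, ax = plt.subplots()
set_legend_custom(ax, legend2param={"up": "red", "down": "blue"}, param="color")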
function get_line_cap_length
get_line_cap_length(ax: Axes, linewidth: float) → Axes
Get the line cap length.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `linewidth` (float): width of the line.
Returns:
- `plt.Axes`: `plt.Axes` object.
function set_colorbar
set_colorbar(
fig: object,
ax: Axes,
ax_pc: Axes,
label: str,
bbox_to_anchor: tuple = (0.05, 0.5, 1, 0.45),
orientation: str = 'vertical'
)
Set colorbar.
Args:
- `fig` (object): figure object.
- `ax` (plt.Axes): `plt.Axes` object.
- `ax_pc` (plt.Axes): `plt.Axes` object for the colorbar.
- `label` (str): label.
- `bbox_to_anchor` (tuple, optional): location. Defaults to (0.05, 0.5, 1, 0.45).
- `orientation` (str, optional): orientation. Defaults to "vertical".
Returns: figure object.
function set_colorbar_label
set_colorbar_label(ax: Axes, label: str) → Axes
Find colorbar and set label for it.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `label` (str): label.
Returns:
- `plt.Axes`: `plt.Axes` object.
function format_ax
format_ax(
ax=None,
kws_fmt_ticklabels={},
kws_fmt_labels={},
kws_legend={},
rotate_ylabel=False
)
module roux.viz.io
For input/output of plots.
function to_plotp
to_plotp(
ax: Axes = None,
prefix: str = 'plot/plot_',
suffix: str = '',
fmts: list = ['png']
) → str
Infer output path for a plot.
Args:
- `ax` (plt.Axes): `plt.Axes` object.
- `prefix` (str, optional): prefix with directory path for the plot. Defaults to 'plot/plot_'.
- `suffix` (str, optional): suffix of the filename. Defaults to ''.
- `fmts` (list, optional): formats of the images. Defaults to ['png'].
Returns:
- `str`: output path for the plot.
function savefig
savefig(
plotp: str,
tight_layout: bool = True,
bbox_inches: list = None,
fmts: list = ['png'],
savepdf: bool = False,
normalise_path: bool = True,
replaces_plotp: dict = None,
dpi: int = 500,
force: bool = True,
kws_replace_many: dict = {},
kws_savefig: dict = {},
verbose: bool = False,
**kws
) → str
Wrapper around `plt.savefig`.
Args:
- `plotp` (str): output path or `plt.Axes` object.
- `tight_layout` (bool, optional): apply tight layout. Defaults to True.
- `bbox_inches` (list, optional): bbox_inches. Defaults to None.
- `savepdf` (bool, optional): also save as PDF. Defaults to False.
- `normalise_path` (bool, optional): normalise the path. Defaults to True.
- `replaces_plotp` (dict, optional): replacements within the path. Defaults to None.
- `dpi` (int, optional): dpi. Defaults to 500.
- `force` (bool, optional): overwrite output. Defaults to True.
- `kws_replace_many` (dict, optional): parameters provided to the `replace_many` function. Defaults to {}.
Keyword Args:
- `kws`: parameters provided to the `to_plotp` function.
- `kws_savefig`: parameters provided to the `to_savefig` function.
- `kws_replace_many`: parameters provided to the `replace_many` function.
Returns:
- `str`: output path.
function savelegend
savelegend(
plotp: str,
legend: object,
expand: list = [-5, -5, 5, 5],
**kws_savefig
) → str
Save only the legend of the plot/figure.
Args:
- `plotp` (str): output path.
- `legend` (object): legend object.
- `expand` (list, optional): expand. Defaults to [-5,-5,5,5].
Returns:
- `str`: output path.
References:
1. https://stackoverflow.com/a/47749903/3521099
function update_kws_plot
update_kws_plot(kws_plot: dict, kws_plotp: dict, test: bool = False) → dict
Update the input parameters.
Args:
- `kws_plot` (dict): input parameters.
- `kws_plotp` (dict): saved parameters.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `dict`: updated parameters.
function get_plot_inputs
get_plot_inputs(
plotp: str,
df1: DataFrame = None,
kws_plot: dict = {},
outd: str = None
) → tuple
Get plot inputs.
Args:
- `plotp` (str): path of the plot.
- `df1` (pd.DataFrame): data for the plot.
- `kws_plot` (dict): parameters of the plot.
- `outd` (str): output directory.
Returns:
- `tuple`: (path, dataframe, dict)
function log_code
log_code()
Log the code.
function get_lines
get_lines(
logp: str = 'log_notebook.log',
sep: str = 'begin_plot()',
test: bool = False
) → list
Get lines from the log.
Args:
- `logp` (str, optional): path to the log file. Defaults to 'log_notebook.log'.
- `sep` (str, optional): label marking the start of the plot's code. Defaults to 'begin_plot()'.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `list`: lines of code.
function to_script
to_script(
srcp: str,
plotp: str,
defn: str = 'plot_',
s4: str = ' ',
test: bool = False,
validate: bool = False,
**kws
) → str
Save the script with the code for the plot.
Args:
- `srcp` (str): path of the script.
- `plotp` (str): path of the plot.
- `defn` (str, optional): prefix of the function. Defaults to "plot_".
- `s4` (str, optional): indentation (a "tab" of spaces). Defaults to ' '.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: path of the script.
TODOs:
1. Compatible with names of the input dataframes other than `df1`.
2. Get the variable name of the dataframe:

def get_df_name(df):
    name = [x for x in globals() if globals()[x] is df and not x.startswith('-')][0]
    return name

   Then replace `df1` with the variable name of the dataframe.
function to_plot
to_plot(
plotp: str,
data: DataFrame = None,
df1: DataFrame = None,
kws_plot: dict = {},
logp: str = 'log_notebook.log',
sep: str = 'begin_plot()',
validate: bool = False,
show_path: bool = False,
show_path_offy: float = -0.2,
force: bool = True,
test: bool = False,
quiet: bool = True,
**kws
) → str
Save a plot.
Args:
- `plotp` (str): output path.
- `data` (pd.DataFrame, optional): dataframe with plotting data. Defaults to None.
- `df1` (pd.DataFrame, optional): dataframe with plotting data. Defaults to None.
- `kws_plot` (dict, optional): parameters for plotting. Defaults to dict().
- `logp` (str, optional): path to the log. Defaults to 'log_notebook.log'.
- `sep` (str, optional): separator marking the start of the plotting code in the jupyter notebook. Defaults to 'begin_plot()'.
- `validate` (bool, optional): validate the "readability" using the `read_plot` function. Defaults to False.
- `show_path` (bool, optional): show the path on the plot. Defaults to False.
- `show_path_offy` (float, optional): y-offset for the path label. Defaults to -0.2.
- `force` (bool, optional): overwrite output. Defaults to True.
- `test` (bool, optional): test mode. Defaults to False.
- `quiet` (bool, optional): quiet mode. Defaults to True.
Returns:
- `str`: output path.
Notes:
Requirement: start logging in the jupyter notebook first:

from IPython import get_ipython
log_notebookp = 'log_notebook.log'
open(log_notebookp, 'w').close()
get_ipython().run_line_magic('logstart', f'{log_notebookp} over')
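Example: a minimal sketch, intended to run inside a jupyter notebook with the logging set up as above (`begin_plot` is defined here as a no-op marker; the path is hypothetical):

import pandas as pd
from roux.viz.io import to_plot, read_plot

def begin_plot():  # marker that `sep` looks for in the notebook log
    pass

begin_plot()
df1 = pd.DataFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
ax = df1.plot.scatter(x="x", y="y")
to_plot("plot/scatter_xy.png", df1=df1)  # saves image + data + code

ax = read_plot("plot/scatter_xy.png")    # regenerates the plot later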
function read_plot
read_plot(p: str, safe: bool = False, test: bool = False, **kws) → Axes
Generate the plot from data, parameters and a script.
Args:
- `p` (str): path of the plot saved using the `to_plot` function.
- `safe` (bool, optional): read as an image. Defaults to False.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
function to_concat
to_concat(
ps: list,
how: str = 'h',
use_imagemagick: bool = False,
use_conda_env: bool = False,
test: bool = False,
**kws_outp
) → str
Concat images.
Args:
- `ps` (list): list of paths.
- `how` (str, optional): horizontal ('h') or vertical ('v'). Defaults to 'h'.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: path of the output.
function to_montage
to_montage(
ps: list,
layout: str,
source_path: str = None,
env_name: str = None,
hspace: float = 0,
vspace: float = 0,
output_path: str = None,
test: bool = False,
**kws_outp
) → str
To montage.
Args:
- `ps` (list): list of paths.
- `layout` (str): layout of the images.
- `hspace` (float, optional): horizontal space. Defaults to 0.
- `vspace` (float, optional): vertical space. Defaults to 0.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: path of the output.
function to_gif
to_gif(
ps: list,
outp: str,
duration: int = 200,
loop: int = 0,
optimize: bool = True
) → str
Convert to GIF.
Args:
- `ps` (list): list of paths.
- `outp` (str): output path.
- `duration` (int, optional): duration. Defaults to 200.
- `loop` (int, optional): loop or not. Defaults to 0.
- `optimize` (bool, optional): optimize the size. Defaults to True.
Returns:
- `str`: output path.
References:
1. https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html#gif
2. https://stackoverflow.com/a/57751793/3521099
function to_data
to_data(path: str) → str
Convert to base64 string.
Args:
- `path` (str): path of the input.
Returns: base64 string.
function to_convert
to_convert(filep: str, outd: str = None, fmt: str = 'JPEG') → str
Convert the format of an image using `PIL`.
Args:
- `filep` (str): input path.
- `outd` (str, optional): output directory. Defaults to None.
- `fmt` (str, optional): format of the output. Defaults to "JPEG".
Returns:
- `str`: output path.
function to_raster
to_raster(
plotp: str,
dpi: int = 500,
alpha: bool = False,
trim: bool = False,
force: bool = False,
test: bool = False
) → str
Convert a vector image to raster.
Args:
- `plotp` (str): input path.
- `dpi` (int, optional): DPI. Defaults to 500.
- `alpha` (bool, optional): transparency. Defaults to False.
- `trim` (bool, optional): trim margins. Defaults to False.
- `force` (bool, optional): overwrite output. Defaults to False.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `str`: output path.
Notes:
- Runs a bash command:
convert -density 300 -trim
.
function to_rasters
to_rasters(plotd, ext='svg')
Convert many images to raster. Uses inkscape.
Args:
- `plotd` (str): directory.
- `ext` (str, optional): extension of the output. Defaults to 'svg'.
module roux.stat.corr
For correlation stats.
function resampled
resampled(
x: <built-in function array>,
y: <built-in function array>,
method_fun: object,
method_kws: dict = {},
ci_type: str = 'max',
cv: int = 5,
random_state: int = 1,
verbose: bool = False
) → tuple
Get correlations after resampling.
Args:
- `x` (np.array): x vector.
- `y` (np.array): y vector.
- `method_fun` (object): method function.
- `ci_type` (str, optional): confidence interval type. Defaults to 'max'.
- `cv` (int, optional): number of resamples. Defaults to 5.
- `random_state` (int, optional): random state. Defaults to 1.
- `verbose` (bool): verbose.
Returns:
- `dict`: results containing the mean correlation coefficient, CI and CI type.
function get_corr
get_corr(
x: str,
y: str,
method: str,
df: DataFrame = None,
method_kws: dict = {},
pval: bool = True,
preprocess: bool = True,
n_min=10,
preprocess_kws: dict = {},
resample: bool = False,
cv=5,
resample_kws: dict = {},
verbose: bool = False,
test: bool = False
) → dict
Correlation between vectors. A unifying wrapper around `scipy`'s functions to calculate correlations and distances. Allows application of resampling on those functions.
Usage: 1. Linear table with paired values. For a matrix, use `pd.DataFrame.corr` instead.
Args:
- `x` (str): x column name, or a vector.
- `y` (str): y column name, or a vector.
- `method` (str): method name.
- `df` (pd.DataFrame): input table.
- `pval` (bool): calculate the p-value.
- `preprocess` (bool): preprocess the input.
- `preprocess_kws` (dict): parameters provided to the pre-processing function, i.e. `_pre`.
- `resample` (bool, optional): resampling. Defaults to False.
- `resample_kws` (dict): parameters provided to the resampling function, i.e. `resample`.
- `verbose` (bool): verbose.
Returns:
- `res` (dict): a dictionary containing the results.
Notes:
The `res` dictionary contains the following keys:
- `method`: method name
- `r`: correlation coefficient or distance
- `p`: p-value of the correlation
- `n`: sample size
- `rr`: resampled average 'r'
- `ci`: confidence interval
- `ci_type`: CI type
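Example: a minimal sketch (note that, per the signature, at least `n_min=10` paired values are required by default):

import pandas as pd
from roux.stat.corr import get_corr

df = pd.DataFrame({"x": list(range(10)),
                   "y": [0, 2, 1, 4, 3, 6, 5, 8, 7, 9]})
res = get_corr("x", "y", method="spearman", df=df)
print(res["r"], res["p"], res["n"])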
function get_corrs
get_corrs(
data: DataFrame,
method: str,
cols: list = None,
cols_with: list = None,
coff_inflation_min: float = None,
get_pairs_kws={},
fast: bool = False,
test: bool = False,
verbose: bool = False,
**kws_get_corr
) → DataFrame
Correlate many columns of a dataframes.
Parameters:
- `data` (DataFrame): input dataframe.
- `method` (str): method of correlation, `spearman` or `pearson`.
- `cols` (list): columns.
- `cols_with` (list): columns to correlate with, i.e. variable2.
- `fast` (bool): use parallel processing if True.
Keyword arguments:
- `kws_get_corr`: parameters provided to the `get_corr` function.
Returns:
- `DataFrame`: output dataframe.
Notes:
In the fast mode (fast=True), set the number of processes before executing the `get_corrs` command:

from pandarallel import pandarallel
pandarallel.initialize(nb_workers=4, progress_bar=True, use_memory_fs=False)  # nb_workers: number of processes
function check_collinearity
check_collinearity(
df1: DataFrame,
threshold: float = 0.7,
colvalue: str = 'r',
cols_variable: list = ['variable1', 'variable2'],
coff_pval: float = 0.05,
method: str = 'spearman',
coff_inflation_min: int = 50
) → Series
Check collinearity.
Args:
- `df1` (DataFrame): input dataframe.
- `threshold` (float): minimum threshold for the collinearity.
Returns:
- `DataFrame`: output dataframe with the minimum correlation among the correlated subnetwork of columns.
function pairwise_chi2
pairwise_chi2(df1: DataFrame, cols_values: list) → DataFrame
Pairwise chi2 test.
Args:
- `df1` (DataFrame): input dataframe.
- `cols_values` (list): list of columns.
Returns:
- `DataFrame`: output dataframe.
TODOs: 0. use `lib.set.get_pairs` to get the combinations.
module roux.viz.line
For line plots.
function plot_range
plot_range(
df00: DataFrame,
colvalue: str,
colindex: str,
k: str,
headsize: int = 15,
headcolor: str = 'lightgray',
ax: Axes = None,
**kws_area
) → Axes
Plot range/intervals e.g. genome coordinates as lines.
Args:
- `df00` (pd.DataFrame): input data.
- `colvalue` (str): column with values.
- `colindex` (str): column with ids.
- `k` (str): subset name.
- `headsize` (int, optional): margin at the top. Defaults to 15.
- `headcolor` (str, optional): color of the margin. Defaults to 'lightgray'.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword args:
- `kws_area`: keyword parameters provided to the `area` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_bezier
plot_bezier(
pt1,
pt2,
pt1_guide=None,
pt2_guide=None,
direction='h',
off_guide=0.25,
ax=None,
test=False,
**kws_line
)
function plot_kinetics
plot_kinetics(
df1: DataFrame,
x: str,
y: str,
hue: str,
cmap: str = 'Reds_r',
ax: Axes = None,
test: bool = False,
kws_legend: dict = {},
**kws_set
) → Axes
Plot time-dependent kinetic data.
Args:
- `df1` (pd.DataFrame): input data.
- `x` (str): x column.
- `y` (str): y column.
- `hue` (str): hue column.
- `cmap` (str, optional): colormap. Defaults to 'Reds_r'.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `test` (bool, optional): test mode. Defaults to False.
- `kws_legend` (dict, optional): legend parameters. Defaults to {}.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_steps
plot_steps(
df1: DataFrame,
col_step_name: str,
col_step_size: str,
ax: Axes = None,
test: bool = False
) → Axes
Plot step-wise changes in numbers, e.g. for a filtering process.
Args:
- `df1` (pd.DataFrame): input data.
- `col_step_name` (str): column containing the step names.
- `col_step_size` (str): column containing the numbers.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
- `test` (bool, optional): test mode. Defaults to False.
Returns:
- `plt.Axes`: `plt.Axes` object.
module roux.lib.df
For processing individual pandas DataFrames/Series. Mainly used in piped operations.
function get_name
get_name(df1: DataFrame, cols: list = None, coff: float = 2, out=None)
Gets the name of the dataframe.
Especially useful within a `groupby` + `pandarallel` context.
Parameters:
- `df1` (DataFrame): input dataframe.
- `cols` (list): list of groupby columns.
- `coff` (int): cutoff of unique values to infer the name.
- `out` (str): format of the output (list|not).
Returns:
- `name` (tuple|str|list): name of the dataframe.
function log_name
log_name(df1: DataFrame, **kws_get_name)
function get_groupby_columns
get_groupby_columns(df_)
Get the columns supplied to `groupby`.
Parameters:
- `df_` (DataFrame): input dataframe.
Returns:
- `columns` (list): list of columns.
function get_constants
get_constants(df1)
Get the columns with a single unique value.
Parameters:
- `df1` (DataFrame): input dataframe.
Returns:
- `columns` (list): list of columns.
function drop_unnamedcol
drop_unnamedcol(df)
Deletes the columns with "Unnamed" prefix.
Parameters:
- `df` (DataFrame): input dataframe.
Returns:
- `df` (DataFrame): output dataframe.
function drop_levelcol
drop_levelcol(df)
Deletes the potentially temporary columns with the "level" prefix.
Parameters:
- `df` (DataFrame): input dataframe.
Returns:
- `df` (DataFrame): output dataframe.
function drop_constants
drop_constants(df)
Deletes columns with a single unique value.
Parameters:
- `df` (DataFrame): input dataframe.
Returns:
- `df` (DataFrame): output dataframe.
function dropby_patterns
dropby_patterns(
df1,
patterns=None,
strict=False,
test=False,
verbose=True,
errors='raise'
)
Deletes columns containing substrings i.e. patterns.
Parameters:
- `df1` (DataFrame): input dataframe.
- `patterns` (list): list of substrings.
- `test` (bool): verbose.
Returns:
- `df1` (DataFrame): output dataframe.
function flatten_columns
flatten_columns(df: DataFrame, sep: str = ' ', **kws) → DataFrame
Multi-index columns to single-level.
Parameters:
- `df` (DataFrame): input dataframe.
- `sep` (str): separator within the joined tuples (' ').
Returns:
- `df` (DataFrame): output dataframe.
Keyword Arguments:
- `kws` (dict): parameters provided to the `coltuples2str` function.
function lower_columns
lower_columns(df)
Column names of the dataframe to lower-case letters.
Parameters:
- `df` (DataFrame): input dataframe.
Returns:
- `df` (DataFrame): output dataframe.
function renameby_replace
renameby_replace(
df: DataFrame,
replaces: dict,
ignore: bool = True,
**kws
) → DataFrame
Rename columns by replacing sub-strings.
Parameters:
- `df` (DataFrame): input dataframe.
- `replaces` (dict|list): from->to mapping, or a list containing substrings to remove.
- `ignore` (bool): if True, do not validate the successful replacements.
Returns:
- `df` (DataFrame): output dataframe.
Keyword Arguments:
- `kws` (dict): parameters provided to the `replacemany` function.
function clean_columns
clean_columns(df: DataFrame) → DataFrame
Standardise columns.
Steps: 1. Strip flanking white-spaces. 2. Lower-case letters.
Parameters:
- `df` (DataFrame): input dataframe.
Returns:
- `df` (DataFrame): output dataframe.
function clean
clean(
df: DataFrame,
cols: list = [],
drop_constants: bool = False,
drop_unnamed: bool = True,
verb: bool = False
) → DataFrame
Deletes potentially temporary columns.
Steps: 1. Strip flanking white-spaces. 2. Lower-case letters.
Parameters:
- `df` (DataFrame): input dataframe.
- `drop_constants` (bool): whether to delete the columns with a single unique value.
- `drop_unnamed` (bool): whether to delete the columns with the 'Unnamed' prefix.
- `verb` (bool): verbose.
Returns:
- `df` (DataFrame): output dataframe.
function compress
compress(df1: DataFrame, coff_categories: int = None, verbose: bool = True)
Compress the dataframe by converting columns containing strings/objects to categorical.
Parameters:
- `df1` (DataFrame): input dataframe.
- `coff_categories` (int): if the number of unique values is less than this cutoff, the column is converted to categorical.
- `verbose` (bool): verbose.
Returns:
- `df1` (DataFrame): output dataframe.
function clean_compress
clean_compress(df: DataFrame, kws_compress: dict = {}, **kws_clean)
`clean` and `compress` the dataframe.
Parameters:
- `df` (DataFrame): input dataframe.
- `kws_compress` (dict): keyword arguments for the `compress` function.
- `test` (bool): verbose.
Keyword Arguments:
- `kws_clean` (dict): parameters provided to the `clean` function.
Returns:
- `df1` (DataFrame): output dataframe.
See Also: `clean`, `compress`
function check_na
check_na(df, subset=None, out=True, perc=False, log=True)
Number of missing values in columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `out` (bool): return the output; otherwise not, which is applicable in chained operations.
Returns:
- `ds` (Series): output stats.
function validate_no_na
validate_no_na(df, subset=None)
Validate no missing values in columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
Returns:
- `ds` (Series): output stats.
function assert_no_na
assert_no_na(df, subset=None)
Assert that no missing values in columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
Returns:
- `ds` (Series): output stats.
function to_str
to_str(data, log=False)
function check_nunique
check_nunique(
df: DataFrame,
subset: list = None,
groupby: str = None,
perc: bool = False,
auto=False,
out=True,
log=True
) → Series
Number/percentage of unique values in columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `perc` (bool): output percentages.
Returns:
- `ds` (Series): output stats.
function check_inflation
check_inflation(df1, subset=None)
Occurrences of values in columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
Returns:
- `ds` (Series): output stats.
function check_dups
check_dups(df, subset=None, perc=False, out=True)
Check duplicates.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `perc` (bool): output percentages.
Returns:
- `ds` (Series): output stats.
function check_duplicated
check_duplicated(df, **kws)
Check duplicates (alias of `check_dups`).
function validate_no_dups
validate_no_dups(df, subset=None, log: bool = True)
Validate that there are no duplicates.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
function validate_no_duplicates
validate_no_duplicates(df, subset=None, **kws)
Validate that there are no duplicates (alias of `validate_no_dups`).
function assert_no_dups
assert_no_dups(df, subset=None)
Assert that there are no duplicates.
function validate_dense
validate_dense(
df01: DataFrame,
subset: list = None,
duplicates: bool = True,
na: bool = True,
message=None
) → DataFrame
Validate no missing values and no duplicates in the dataframe.
Parameters:
- `df01` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `duplicates` (bool): whether to check duplicates.
- `na` (bool): whether to check NAs.
- `message` (str): error message.
function assert_dense
assert_dense(
df01: DataFrame,
subset: list = None,
duplicates: bool = True,
na: bool = True,
message=None
) → DataFrame
Alias of validate_dense
.
Notes:
to be deprecated in future releases.
function assert_len
assert_len(df: DataFrame, count: int) → DataFrame
Validate length in pipe'd operations.
Example: (df.rd.assert_len(10))
function assert_nunique
assert_nunique(df: DataFrame, col: str, count: int) → DataFrame
Validate unique counts in pipe'd operations.
Example: (df.rd.assert_nunique('id', 10))
function classify_mappings
classify_mappings(df1: DataFrame, subset, clean: bool = False) → DataFrame
Classify mappings between items in two columns.
Parameters:
- `df1` (DataFrame): input dataframe.
- `subset` (list): the two columns to classify mappings between.
- `clean` (bool): drop the columns with the counts.
Returns:
- `pd.DataFrame`: output.
function check_mappings
check_mappings(df: DataFrame, subset: list = None, out=True) → DataFrame
Mapping between items in two columns.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `out` (str): format of the output.
Returns:
- `ds` (Series): output stats.
function assert_1_1_mappings
assert_1_1_mappings(df: DataFrame, subset: list = None) → DataFrame
Validate that the mapping between items in two columns is 1:1.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): list of columns.
function get_mappings
get_mappings(
df1: DataFrame,
subset=None,
keep='all',
clean=False,
cols=None
) → DataFrame
Classify the mapping between items in two columns.
Parameters:
- `df1` (DataFrame): input dataframe.
- `subset` (list): list of columns.
- `keep` (str): type of mapping to keep (1:1|1:m|m:1).
- `clean` (bool): whether to remove temporary columns.
- `cols` (list): alias of `subset`.
Returns:
- `df` (DataFrame): output dataframe.
function to_map_binary
to_map_binary(df: DataFrame, colgroupby=None, colvalue=None) → DataFrame
Convert linear mappings to a binary map.
Parameters:
- `df` (DataFrame): input dataframe.
- `colgroupby` (str): name of the column to group by.
- `colvalue` (str): name of the column containing values.
Returns:
- `df1` (DataFrame): output dataframe.
function check_intersections
check_intersections(
df: DataFrame,
colindex=None,
colgroupby=None,
plot=False,
**kws_plot
) → DataFrame
Check intersections. The linear dataframe is converted to a binary map and then to a series using `groupby`.
Parameters:
- `df` (DataFrame): input dataframe.
- `colindex` (str): name of the index column.
- `colgroupby` (str): name of the groupby column.
- `plot` (bool): plot or not.
Returns:
- `ds1` (Series): output Series.
Keyword Arguments:
- `kws_plot` (dict): parameters provided to the plotting function.
function get_totals
get_totals(ds1)
Get totals from the output of `check_intersections`.
Parameters:
- `ds1` (Series): input Series.
Returns:
- `d` (dict): output dictionary.
function filter_rows
filter_rows(
df,
d,
sign='==',
logic='and',
drop_constants=False,
test=False,
verbose=True
)
Filter rows using a dictionary.
Parameters:
- `df` (DataFrame): input dataframe.
- `d` (dict): dictionary of column-to-value mappings.
- `sign` (str): condition within mappings ('==').
- `logic` (str): condition between mappings ('and').
- `drop_constants` (bool): drop the columns with a single unique value (False).
- `test` (bool): testing (False).
- `verbose` (bool): more verbose (True).
Returns:
- `df` (DataFrame): output dataframe.
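Example: a minimal sketch of dictionary-based filtering:

import pandas as pd
from roux.lib.df import filter_rows

df = pd.DataFrame({"type": ["a", "a", "b"], "n": [1, 2, 2]})
print(filter_rows(df, {"type": "a", "n": 2}))  # rows where type=='a' AND n==2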
function agg_bools
agg_bools(df1, cols)
Bools to columns. Reverse of one-hot encoding (`get_dummies`).
Parameters:
- `df1` (DataFrame): input dataframe.
- `cols` (list): columns.
Returns:
- `ds` (Series): output series.
function melt_paired
melt_paired(
df: DataFrame,
cols_index: list = None,
suffixes: list = None,
cols_value: list = None,
clean: bool = False
) → DataFrame
Melt a paired dataframe.
Parameters:
- `df` (DataFrame): input dataframe.
- `cols_index` (list): paired index columns (None).
- `suffixes` (list): paired suffixes (None).
- `cols_value` (list): names of the columns containing the values (None).
Notes:
A partial melt melts only the selected columns, `cols_value`.
Examples: Paired parameters: cols_value=['value1','value2'], suffixes=['gene1','gene2']
function get_bin_labels
get_bin_labels(bins: list, dtype: str = 'int')
function get_bins
get_bins(
df: DataFrame,
col: str,
bins: list,
dtype: str = 'int',
labels: list = None,
**kws_cut
)
function get_qbins
get_qbins(df: DataFrame, col: str, bins: list, labels: list = None, **kws_qcut)
function get_chunks
get_chunks(
df1: DataFrame,
colindex: str,
colvalue: str,
bins: int = None,
value: str = 'right'
) → DataFrame
Get chunks of a dataframe.
Parameters:
- `df1` (DataFrame): input dataframe.
- `colindex` (str): name of the index column.
- `colvalue` (str): name of the column containing values [0-100].
- `bins` (int): number of bins.
- `value` (str): value to use as the name of the chunk ('right').
Returns:
- `ds` (Series): output series.
function sample_near_quantiles
sample_near_quantiles(data: DataFrame, col: str, n: int, clean: bool = False)
Get rows with values closest to the quantiles.
function get_group
get_group(groups, i: int = None, verbose: bool = True) → DataFrame
Get a dataframe for a group out of the `groupby` object.
Parameters:
- `groups` (object): groupby object.
- `i` (int): index of the group; the default None returns the largest group.
- `verbose` (bool): verbose (True).
Returns:
- `df` (DataFrame): output dataframe.
Notes:
Useful for testing `groupby`.
function groupby_sample
groupby_sample(
df: DataFrame,
groupby: list,
i: int = None,
**kws_get_group
) → DataFrame
Samples a group (similar to .sample)
Parameters:
- `df` (pd.DataFrame): input dataframe.
- `groupby` (list): columns to group by.
- `i` (int): index of the group; the default None returns the largest group.
Keyword arguments: keyword parameters provided to the `get_group` function.
Returns: pd.DataFrame
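Example: a minimal sketch; with i=None the largest group is returned:

import pandas as pd
from roux.lib.df import groupby_sample

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})
print(groupby_sample(df, groupby=["g"]))  # the largest group, 'a'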
function groupby_sort_values
groupby_sort_values(
df: DataFrame,
groupby: str,
col: str,
func: str,
col_temp: str = 'temp',
ascending=True,
**kws_sort_values
) → DataFrame
Groupby and sort.
Parameters:
- `df` (pd.DataFrame): input dataframe.
- `groupby` (list): columns to group by.
Keyword arguments: keyword parameters provided to the `.sort_values` attribute.
Returns: pd.DataFrame
function groupby_agg_nested
groupby_agg_nested(
df1: DataFrame,
groupby: list,
subset: list,
func: dict = None,
cols_value: list = None,
verbose: bool = False,
**kws_agg
) → DataFrame
Aggregate serially from the lower level subsets to upper level ones.
Parameters:
- `df1` (pd.DataFrame): input dataframe.
- `groupby` (list): groupby columns, i.e. the list of columns to be used as ids in the output.
- `subset` (list): nested groups, i.e. subsets.
- `func` (dict): mapping between the columns with values to aggregate and the functions for aggregation.
- `cols_value` (list): columns with values to aggregate (optional).
- `verbose` (bool): verbose.
Keyword arguments:
- `kws_agg`: keyword arguments provided to pandas's `.agg` function.
Returns: output dataframe with the aggregated values.
function groupby_filter_fast
groupby_filter_fast(
df1: DataFrame,
col_groupby,
fun_agg,
expr,
col_agg: str = 'temporary',
**kws_query
) → DataFrame
Groupby and filter fast.
Parameters:
- `df1` (DataFrame): input dataframe.
- `col_groupby` (str|list): column name/s to group by.
- `fun_agg` (object): aggregation function to filter with.
- `expr` (str): query expression, e.g. a greater-than or less-than cut-off.
Returns:
- `df1` (DataFrame): output dataframe.
Todo:
Deprecate if `pandas.core.groupby.DataFrameGroupBy.filter` is faster.
function infer_index
infer_index(
data: DataFrame,
cols_drop=[],
include=<class 'object'>,
exclude=None
) → list
Infer the index (id) of the table.
function to_multiindex_columns
to_multiindex_columns(df, suffixes, test=False)
Single level columns to multiindex.
Parameters:
- `df` (DataFrame): input dataframe.
- `suffixes` (list): list of suffixes.
- `test` (bool): verbose (False).
Returns:
- `df` (DataFrame): output dataframe.
function to_ranges
to_ranges(df1, colindex, colbool, sort=True)
Ranges from boolean columns.
Parameters:
- `df1` (DataFrame): input dataframe.
- `colindex` (str): column containing index items.
- `colbool` (str): column containing boolean values.
- `sort` (bool): sort the dataframe (True).
Returns:
- `df1` (DataFrame): output dataframe.
TODO: compare with io_sets.bools2intervals.
function to_boolean
to_boolean(df1)
Boolean from ranges.
Parameters:
- `df1` (DataFrame): input dataframe.
Returns:
- `ds` (Series): output series.
TODO: compare with io_sets.bools2intervals.
function to_cat
to_cat(ds1: Series, cats: list, ordered: bool = True)
To series containing categories.
Parameters:
- `ds1` (Series): input series.
- `cats` (list): categories.
- `ordered` (bool): whether the categories are ordered (True).
Returns:
- `ds1` (Series): output series.
function astype_cat
astype_cat(df1: DataFrame, col: str, cats: list)
function sort_valuesby_list
sort_valuesby_list(
df1: DataFrame,
by: str,
cats: list,
by_more: list = [],
**kws
)
Sort dataframe by custom order of items in a column.
Parameters:
- `df1` (DataFrame): input dataframe.
- `by` (str): column.
- `cats` (list): ordered list of items.
Keyword parameters:
- `kws` (dict): parameters provided to `sort_values`.
Returns:
- `df` (DataFrame): output dataframe.
function agg_by_order
agg_by_order(x, order)
Get first item in the order.
Parameters:
- `x` (list): list.
- `order` (list): desired order of the items.
Returns:
- `k`: first item.
Notes:
Used for sorting strings. e.g.
damaging > other non-conserving > other conserving
TODO: Convert categories to numbers and take min
function agg_by_order_counts
agg_by_order_counts(x, order)
Get the aggregated counts by order*.
Parameters:
- `x` (list): list.
- `order` (list): desired order of the items.
Returns:
- `df` (DataFrame): output dataframe.
Examples:

df = pd.DataFrame({'a1': ['a','b','c','a','b','c','d'],
                   'b1': ['a1','a1','a1','b1','b1','b1','b1']})
df.groupby('b1').apply(lambda df: agg_by_order_counts(x=df['a1'], order=['b','c','a']))
function swap_paired_cols
swap_paired_cols(df_, suffixes=['gene1', 'gene2'])
Swap suffixes of paired columns.
Parameters:
- `df_` (DataFrame): input dataframe.
- `suffixes` (list): suffixes.
Returns:
- `df` (DataFrame): output dataframe.
function sort_columns_by_values
sort_columns_by_values(
df: DataFrame,
subset: list,
suffixes: list = None,
order: list = None,
clean=False
) → DataFrame
Sort the values in columns in ascending order.
Parameters:
- `df` (DataFrame): input dataframe.
- `subset` (list): columns.
- `suffixes` (list): suffixes.
- `order` (list): ordered list.
Returns:
- `df` (DataFrame): output dataframe.
Notes:
In the output dataframe, `sorted` means the values were sorted (e.g. because gene1 > gene2).
function make_ids
make_ids(
df: DataFrame,
cols: list,
ids_have_equal_length: bool,
sep: str = '--',
sort: bool = False
) → Series
Make ids by joining string ids from more than one column.
Parameters:
- `df` (DataFrame): input dataframe.
- `cols` (list): columns.
- `ids_have_equal_length` (bool): ids have equal length; if True, faster processing.
- `sep` (str): separator between the ids ('--').
- `sort` (bool): sort the ids before joining (False).
Returns:
- `ds` (Series): output series.
function make_ids_sorted
make_ids_sorted(
df: DataFrame,
cols: list,
ids_have_equal_length: bool,
sep: str = '--',
sort: bool = False
) → Series
Make sorted ids by joining string ids from more than one column.
Parameters:
- `df` (DataFrame): input dataframe.
- `cols` (list): columns.
- `ids_have_equal_length` (bool): ids have equal length; if True, faster processing.
- `sep` (str): separator between the ids ('--').
Returns:
- `ds` (Series): output series.
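Example: a minimal sketch of building sorted paired ids:

import pandas as pd
from roux.lib.df import make_ids_sorted

df = pd.DataFrame({"gene1": ["b", "a"], "gene2": ["a", "c"]})
ids = make_ids_sorted(df, cols=["gene1", "gene2"], ids_have_equal_length=True)
print(ids.tolist())  # ['a--b', 'a--c']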
function get_alt_id
get_alt_id(s1: str, s2: str, sep: str = '--')
Get alternate/partner id from a paired id.
Parameters:
- `s1` (str): joined id.
- `s2` (str): query id.
Returns:
- `s` (str): partner id.
function split_ids
split_ids(df1, col, sep='--', prefix=None)
Split joined ids to individual ones.
Parameters:
- `df1` (DataFrame): input dataframe.
- `col` (str): column containing the joined ids.
- `sep` (str): separator within the joined ids ('--').
- `prefix` (str): prefix of the individual ids (None).
Returns:
- `df1` (DataFrame): output dataframe.
function dict2df
dict2df(d, colkey='key', colvalue='value')
Dictionary to DataFrame.
Parameters:
- `d` (dict): dictionary.
- `colkey` (str): name of the column containing the keys.
- `colvalue` (str): name of the column containing the values.
Returns:
- `df` (DataFrame): output dataframe.
function log_shape_change
log_shape_change(d1, fun='')
Report the changes in the shapes of a DataFrame.
Parameters:
- `d1` (dict): dictionary containing the shapes.
- `fun` (str): name of the function.
function log_apply
log_apply(
df,
fun,
validate_equal_length=False,
validate_equal_width=False,
validate_equal_shape=False,
validate_no_decrease_length=False,
validate_no_decrease_width=False,
validate_no_increase_length=False,
validate_no_increase_width=False,
*args,
**kwargs
)
Report (log) the changes in the shapes of the dataframe before and after an operation/s.
Parameters:
- `df` (DataFrame): input dataframe.
- `fun` (object): function to apply on the dataframe.
- `validate_equal_length` (bool): validate that the number of rows, i.e. the length of the dataframe, remains the same before and after the operation.
- `validate_equal_width` (bool): validate that the number of columns, i.e. the width of the dataframe, remains the same before and after the operation.
- `validate_equal_shape` (bool): validate that the number of rows and columns, i.e. the shape of the dataframe, remains the same before and after the operation.
Keyword parameters:
- `args` (tuple): provided to `fun`.
- `kwargs` (dict): provided to `fun`.
Returns:
- `df` (DataFrame): output dataframe.
class log
Report (log) the changes in the shapes of the dataframe before and after an operation/s.
TODO:
Create the attributes (`attr`) using strings, e.g. via setattr:

import inspect
fun = inspect.currentframe().f_code.co_name
method __init__
__init__(pandas_obj)
method check_dups
check_dups(**kws)
method check_na
check_na(**kws)
method clean
clean(**kws)
method drop
drop(**kws)
method drop_duplicates
drop_duplicates(**kws)
method dropna
dropna(**kws)
method explode
explode(**kws)
method filter_
filter_(**kws)
method filter_rows
filter_rows(**kws)
method groupby
groupby(**kws)
method join
join(**kws)
method melt
melt(**kws)
method melt_paired
melt_paired(**kws)
method merge
merge(**kws)
method pivot
pivot(**kws)
method pivot_table
pivot_table(**kws)
method query
query(**kws)
method stack
stack(**kws)
method unstack
unstack(**kws)
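Example: a minimal sketch; it is assumed that importing `roux.lib.df` registers the `log` accessor on DataFrames:

import pandas as pd
import roux.lib.df  # assumption: registers the accessor

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})
df1 = df.log.query(expr="g == 'a'")  # logs the change in shape alongside the result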
module roux.stat.binary
For processing binary data.
function compare_bools_jaccard
compare_bools_jaccard(x, y)
Compare bools in terms of the jaccard index.
Args:
- `x` (list): list of bools.
- `y` (list): list of bools.
Returns:
- `float`: jaccard index.
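Example: a minimal sketch; the Jaccard index here is intersection over union of the True positions:

from roux.stat.binary import compare_bools_jaccard

x = [True, True, False]
y = [True, False, False]
print(compare_bools_jaccard(x, y))  # |{0}| / |{0, 1}| = 0.5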
function compare_bools_jaccard_df
compare_bools_jaccard_df(df: DataFrame) → DataFrame
Pairwise compare bools in terms of the jaccard index.
Args:
- `df` (DataFrame): dataframe with boolean columns.
Returns:
- `DataFrame`: matrix with comparisons between the columns.
function classify_bools
classify_bools(l: list) → str
Classify bools.
Args:
- `l` (list): list of bools.
Returns:
- `str`: classification.
function frac
frac(x: list) → float
Fraction.
Args:
- `x` (list): list of bools.
Returns:
- `float`: fraction of True values.
function perc
perc(x: list) → float
Percentage.
Args:
- `x` (list): list of bools.
Returns:
- `float`: percentage of True values.
function get_stats_confusion_matrix
get_stats_confusion_matrix(df_: DataFrame) → DataFrame
Get stats confusion matrix.
Args:
- `df_` (DataFrame): confusion matrix.
Returns:
- `DataFrame`: stats.
function get_cutoff
get_cutoff(
y_true,
y_score,
method,
show_diagonal=True,
show_area=True,
kws_area: dict = {},
show_cutoff=True,
plot_pr=True,
color='k',
returns=['ax'],
ax=None
)
Obtain threshold based on ROC or PR curve.
Returns:
Table:
- `method`: ROC, PR
- `variable`: threshold (index), TPR, FPR, TP counts, precision, recall
Plots: AUC ROC; TPR vs TP counts; PR; specificity vs TP counts.
Dictionary: thresholds from AUC, PR.
TODOs: 1. Separate the plotting functions.
module roux.viz.bar
For bar plots.
function plot_barh
plot_barh(
df1: DataFrame,
colx: str,
coly: str,
colannnotside: str = None,
x1: float = None,
offx: float = 0,
ax: Axes = None,
**kws
) → Axes
Plot horizontal bar plot with text on them.
Args:
- `df1` (pd.DataFrame): input data.
- `colx` (str): x column.
- `coly` (str): y column.
- `colannnotside` (str): column with annotations to show on the right side of the plot.
- `x1` (float): x position of the text.
- `offx` (float): x-offset of x1, multiplier.
- `color` (str): color of the bars.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Keyword Args:
- `kws`: parameters provided to the `barh` function.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_value_counts
plot_value_counts(
df: DataFrame,
col: str,
logx: bool = False,
kws_hist: dict = {'bins': 10},
kws_bar: dict = {},
grid: bool = False,
axes: list = None,
fig: object = None,
hist: bool = True
)
Plot pandas's `value_counts`.
Args:
- `df` (pd.DataFrame): input data.
- `col` (str): column with the counts.
- `logx` (bool, optional): x-axis on log-scale. Defaults to False.
- `kws_hist` (dict, optional): parameters provided to the `hist` function. Defaults to {'bins': 10}.
- `kws_bar` (dict, optional): parameters provided to the `bar` function. Defaults to {}.
- `grid` (bool, optional): show grids or not. Defaults to False.
- `axes` (list, optional): list of `plt.axes`. Defaults to None.
- `fig` (object, optional): figure object. Defaults to None.
- `hist` (bool, optional): show histogram. Defaults to True.
function plot_barh_stacked_percentage
plot_barh_stacked_percentage(
df1: DataFrame,
coly: str,
colannot: str,
color: str = None,
yoff: float = 0,
ax: Axes = None
) → Axes
Plot horizontal stacked bar plot with percentages.
Args:
- `df1` (pd.DataFrame): input data; values in rows sum to 100%.
- `coly` (str): y column, i.e. the yticklabels, e.g. retained and dropped.
- `colannot` (str): column with annotations.
- `color` (str, optional): color. Defaults to None.
- `yoff` (float, optional): y-offset. Defaults to 0.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_bar_serial
plot_bar_serial(
d1: dict,
polygon: bool = False,
polygon_x2i: float = 0,
labelis: list = [],
y: float = 0,
ylabel: str = None,
off_arrowy: float = 0.15,
kws_rectangle={'height': 0.5, 'linewidth': 1},
ax: Axes = None
) → Axes
Barplots with serial increase in resolution.
Args:
- `d1` (dict): dictionary with the data.
- `polygon` (bool, optional): show the polygon. Defaults to False.
- `polygon_x2i` (float, optional): connect the polygon to this subset. Defaults to 0.
- `labelis` (list, optional): label these subsets. Defaults to [].
- `y` (float, optional): y position. Defaults to 0.
- `ylabel` (str, optional): y label. Defaults to None.
- `off_arrowy` (float, optional): offset for the arrow. Defaults to 0.15.
- `kws_rectangle` (dict, optional): parameters provided to the `rectangle` function. Defaults to dict(height=0.5, linewidth=1).
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
function plot_barh_stacked_percentage_intersections
plot_barh_stacked_percentage_intersections(
df0: DataFrame,
colxbool: str,
colybool: str,
colvalue: str,
colid: str,
colalt: str,
colgroupby: str,
coffgroup: float = 0.95,
ax: Axes = None
) → Axes
Plot horizontal stacked bar plot with percentages and intersections.
Args:
- `df0` (pd.DataFrame): input data.
- `colxbool` (str): x column.
- `colybool` (str): y column.
- `colvalue` (str): column with the values.
- `colid` (str): column with ids.
- `colalt` (str): column with the alternative subset.
- `colgroupby` (str): column with groups.
- `coffgroup` (float, optional): cut-off between the groups. Defaults to 0.95.
- `ax` (plt.Axes, optional): `plt.Axes` object. Defaults to None.
Returns:
- `plt.Axes`: `plt.Axes` object.
Examples:
Parameters: colxbool='paralog', colybool='essential', colvalue='value', colid='gene id', colalt='singleton', coffgroup=0.95, colgroupby='tissue'
function to_input_data_sankey
to_input_data_sankey(
df0,
colid,
cols_groupby=None,
colall='all',
remove_all=False
)
function plot_sankey
plot_sankey(
df1,
cols_groupby=None,
hues=None,
node_color=None,
link_color=None,
info=None,
x=None,
y=None,
colors=None,
hovertemplate=None,
text_width=20,
convert=True,
width=400,
height=400,
outp=None,
validate=True,
test=False,
**kws
)