Post-process conifer output for downstream statistical analysis.
conifer-analysis uses dask to analyze conifer results in a distributed, out-of-core fashion, that is, without having to load all results into memory at once. This is especially helpful when processing many result files.
Example
Say that you have many conifer result files in a directory. You can generate a histogram of the confidence values per file (sample) and per taxon using the provided pipeline confidence_hist. Even when you work locally, it can be helpful to explicitly create a distributed client, which lets you control the number of workers.
from dask.distributed import Client
from conifer_analysis import confidence_hist
client = Client(n_workers=8)
You can then open the dask dashboard in your browser to observe tasks live; its address is available as client.dashboard_link (by default on port 8787). Next, we run the pipeline, which returns a pandas.DataFrame.
hist = confidence_hist("data/*.tsv")
hist.info()
The returned table has the following structure (example output of hist.info()):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7700 entries, 0 to 7699
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   path         7700 non-null   category
 1   name         7700 non-null   category
 2   taxonomy_id  7700 non-null   category
 3   bin          7700 non-null   interval[float64, right]
 4   midpoints    7700 non-null   float64
 5   read1_hist   7700 non-null   int64
 6   read2_hist   7700 non-null   int64
 7   avg_hist     7700 non-null   int64
dtypes: category(3), float64(1), int64(3), interval(1)
memory usage: 385.3 KB
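To make the layout above concrete, here is a minimal pandas/NumPy sketch of how per-file, per-taxon histograms with these output columns could be computed. It is not the package's implementation: the input columns read1, read2 and avg (per-read confidence scores in [0, 1]), the ten equal-width bins, and the helper names are assumptions; only the output column names and dtypes are taken from the listing above.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from input score columns to output histogram columns.
SCORE_COLUMNS = {"read1": "read1_hist", "read2": "read2_hist", "avg": "avg_hist"}


def right_closed_counts(values: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Count values into right-closed bins (e0, e1], (e1, e2], ...; e0 itself goes into the first bin."""
    idx = np.searchsorted(edges, values, side="left") - 1
    # Clipping puts the lowest edge (and, assuming values lie in [0, 1], nothing else) into bin 0.
    idx = np.clip(idx, 0, len(edges) - 2)
    return np.bincount(idx, minlength=len(edges) - 1)


def sketch_confidence_hist(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Hypothetical re-creation of a per (path, name, taxonomy_id) confidence histogram table."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    intervals = pd.IntervalIndex.from_breaks(edges, closed="right")
    frames = []
    for (path, name, taxonomy_id), group in df.groupby(
        ["path", "name", "taxonomy_id"], observed=True
    ):
        # One row per bin; scalar identifiers are broadcast to every row.
        frame = pd.DataFrame(
            {
                "path": path,
                "name": name,
                "taxonomy_id": taxonomy_id,
                "bin": intervals,
                "midpoints": intervals.mid,
            }
        )
        for score_col, hist_col in SCORE_COLUMNS.items():
            counts = right_closed_counts(group[score_col].to_numpy(), edges)
            frame[hist_col] = counts.astype("int64")
        frames.append(frame)
    result = pd.concat(frames, ignore_index=True)
    # Repeated identifiers are stored as categoricals, as in the listing above.
    for col in ("path", "name", "taxonomy_id"):
        result[col] = result[col].astype("category")
    return result
```

Because the result is an ordinary pandas DataFrame, downstream summaries are plain pandas; in this sketch, for example, hist.groupby("path", observed=True)["avg_hist"].sum() gives the number of scored reads per sample.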
Install
It’s as simple as:
pip install conifer-analysis
If you want to observe tasks in the dask dashboard, you need the optional dashboard dependencies. Quote the argument in shells such as zsh that treat square brackets specially:
pip install "conifer-analysis[dashboard]"
Copyright
Copyright © 2022, Moritz E. Beber.
Free software distributed under the Apache Software License 2.0.