Post-process conifer output for downstream statistical analysis.
conifer-analysis uses dask to analyze conifer results in a distributed, out-of-core fashion, so the combined data does not need to fit into memory. This is helpful when processing many result files.
Example
Say that you have a directory full of conifer results. You can generate a histogram of the confidence values per file (sample) and per taxon using the provided pipeline confidence_hist. Even when you work locally, it can help to create a distributed client explicitly so that you control the number of workers.
from dask.distributed import Client
from conifer_analysis import confidence_hist
client = Client(n_workers=8)
You can then open the dask dashboard in your browser (by default at http://localhost:8787/status) to observe tasks live. Next, run the pipeline, which returns a pandas.DataFrame.
hist = confidence_hist("data/*.tsv")
hist.info()
As an example of the returned shape:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7700 entries, 0 to 7699
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   path         7700 non-null   category
 1   name         7700 non-null   category
 2   taxonomy_id  7700 non-null   category
 3   bin          7700 non-null   interval[float64, right]
 4   midpoints    7700 non-null   float64
 5   read1_hist   7700 non-null   int64
 6   read2_hist   7700 non-null   int64
 7   avg_hist     7700 non-null   int64
dtypes: category(3), float64(1), int64(3), interval(1)
memory usage: 385.3 KB
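As a rough sketch of what downstream analysis on such a frame might look like, the snippet below approximates the mean confidence per sample and taxon from the binned counts. It assumes, based only on the column names above, that midpoints holds the bin centers and that the *_hist columns hold counts per bin. The helper mean_confidence and the small synthetic frame are hypothetical and not part of conifer-analysis; the numbers are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the frame returned by confidence_hist.
# All values are invented for illustration; they are not real conifer results.
bins = pd.interval_range(start=0.0, end=1.0, periods=4)  # right-closed bins
counts = {
    ("a.tsv", "Escherichia coli", 562): [0, 2, 6, 12],
    ("a.tsv", "Homo sapiens", 9606): [10, 6, 4, 0],
    ("b.tsv", "Escherichia coli", 562): [0, 0, 8, 12],
}
rows = [
    {
        "path": path,
        "name": name,
        "taxonomy_id": taxonomy_id,
        "bin": interval,
        "midpoints": interval.mid,
        "read1_hist": count,
        "read2_hist": count,
        "avg_hist": count,
    }
    for (path, name, taxonomy_id), hist_counts in counts.items()
    for interval, count in zip(bins, hist_counts)
]
hist = pd.DataFrame(rows)


def mean_confidence(hist: pd.DataFrame, column: str = "avg_hist") -> pd.Series:
    """Approximate the mean confidence per (path, name) from binned counts.

    Hypothetical helper: weights each bin midpoint by its count, which is
    only an approximation of the mean of the underlying values.
    """
    weighted = hist.assign(weighted=hist["midpoints"] * hist[column])
    # observed=True skips unused category combinations in the real frame.
    grouped = weighted.groupby(["path", "name"], observed=True)[
        ["weighted", column]
    ].sum()
    return (grouped["weighted"] / grouped[column]).rename("mean_confidence")


result = mean_confidence(hist)
print(result)
```

Groups whose counts sum to zero yield NaN here, so you may want to drop or filter them before further statistics.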
Install
It’s as simple as:
pip install conifer-analysis
If you want to observe tasks in the dask dashboard, you will need additional dependencies.
pip install "conifer-analysis[dashboard]"
Copyright
Copyright © 2022, Moritz E. Beber.
Free software distributed under the Apache Software License 2.0.