Post-process conifer output for downstream statistical analysis.

conifer-analysis uses dask to analyze conifer results in a distributed and out-of-core fashion, so the combined results do not have to fit into memory at once. This can be helpful when processing many such results.

Example

Say that you have a bunch of conifer results in a directory. You can generate a histogram of the confidence values per file (sample) and per taxon using the provided pipeline confidence_hist. Even when you work locally, it can be helpful to explicitly create a distributed client so that you control the number of workers.

from dask.distributed import Client
from conifer_analysis import confidence_hist

client = Client(n_workers=8)

You can then visit the dashboard in your browser to observe tasks live (by default, dask serves it at http://localhost:8787/status). Next, we run the pipeline, which returns a pandas.DataFrame.

hist = confidence_hist("data/*.tsv")
hist.info()

As an example, the returned data frame has the following structure:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7700 entries, 0 to 7699
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   path         7700 non-null   category
 1   name         7700 non-null   category
 2   taxonomy_id  7700 non-null   category
 3   bin          7700 non-null   interval[float64, right]
 4   midpoints    7700 non-null   float64
 5   read1_hist   7700 non-null   int64
 6   read2_hist   7700 non-null   int64
 7   avg_hist     7700 non-null   int64
dtypes: category(3), float64(1), int64(3), interval(1)
memory usage: 385.3 KB
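
Because the result is a plain pandas.DataFrame, downstream statistics can be computed with ordinary pandas operations. The following is a minimal sketch, not part of conifer-analysis itself: it builds a small synthetic frame that reuses some of the column names shown above, and a hypothetical helper `mean_confidence` that approximates the mean confidence per sample and taxon by weighting each bin midpoint with its count. It assumes that `avg_hist` holds per-bin read counts; the sample paths, taxon, and counts are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the pipeline output: two samples, one taxon, four bins.
# Paths, taxon, and counts are invented for illustration only.
bins = pd.interval_range(start=0.0, end=1.0, periods=4, closed="right")
hist = pd.DataFrame(
    {
        "path": pd.Categorical(["a.tsv"] * 4 + ["b.tsv"] * 4),
        "name": pd.Categorical(["Escherichia coli"] * 8),
        "taxonomy_id": pd.Categorical(["562"] * 8),
        "bin": list(bins) * 2,
        "midpoints": list(bins.mid) * 2,
        "avg_hist": [0, 10, 30, 60, 50, 30, 20, 0],
    }
)


def mean_confidence(hist: pd.DataFrame, count_column: str = "avg_hist") -> pd.Series:
    """Approximate the mean confidence per (path, taxonomy_id) from binned counts.

    Hypothetical helper: each bin contributes its midpoint weighted by its count,
    so the result is only as precise as the bin width allows.
    """
    weighted = hist.assign(weighted=hist["midpoints"] * hist[count_column])
    # observed=True skips category combinations that never occur in the data.
    sums = weighted.groupby(["path", "taxonomy_id"], observed=True)[
        ["weighted", count_column]
    ].sum()
    return (sums["weighted"] / sums[count_column]).rename("mean_confidence")


summary = mean_confidence(hist).reset_index()
print(summary)
```

For the synthetic data, the sample a.tsv has most of its counts in the upper bins and yields a mean of 0.75, while b.tsv is skewed toward low confidence and yields 0.30. Groups with a total count of zero would produce NaN and may need to be filtered beforehand.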

Install

It’s as simple as:

pip install conifer-analysis

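If you want to observe tasks in the dask dashboard, you will need additional dependencies. Install them with the dashboard extra (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "conifer-analysis[dashboard]"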
