Post-process conifer output for downstream statistical analysis.

conifer-analysis uses dask to analyze conifer results in a distributed, out-of-core fashion. This is helpful when processing many such results at once.

Example

Say that you have a directory full of conifer results. You can generate a histogram of the confidence values per file (sample) and per taxon using the provided pipeline confidence_hist. Even when you work locally, it can help to create a distributed client explicitly so that you control the number of workers.

from dask.distributed import Client
from conifer_analysis import confidence_hist

client = Client(n_workers=8)
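
As a quick local check of the client setup, the following minimal sketch starts a small client, prints the dashboard address, and runs a trivial task. The worker count, the processes=False setting (workers run as threads in the current process), and the placeholder sum task are illustrative assumptions, not settings that conifer-analysis requires.

```python
from dask.distributed import Client

# Illustrative settings: two single-threaded workers inside this process.
with Client(n_workers=2, threads_per_worker=1, processes=False) as client:
    client.wait_for_workers(2)
    # Number of workers the scheduler currently knows about.
    n_workers = len(client.scheduler_info()["workers"])
    # URL of the dashboard for this client.
    print(client.dashboard_link)
    # Placeholder task to confirm that the cluster executes work.
    total = client.submit(sum, [1, 2, 3]).result()

print(n_workers, total)
```

Using the client as a context manager shuts the workers down again once the block exits.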

You can then open the dask dashboard in your browser (by default at http://localhost:8787/status) to watch tasks as they run. Next, we run the pipeline, which returns a pandas.DataFrame.

hist = confidence_hist("data/*.tsv")
hist.info()

For example, hist.info() prints a summary like the following:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7700 entries, 0 to 7699
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   path         7700 non-null   category
 1   name         7700 non-null   category
 2   taxonomy_id  7700 non-null   category
 3   bin          7700 non-null   interval[float64, right]
 4   midpoints    7700 non-null   float64
 5   read1_hist   7700 non-null   int64
 6   read2_hist   7700 non-null   int64
 7   avg_hist     7700 non-null   int64
dtypes: category(3), float64(1), int64(3), interval(1)
memory usage: 385.3 KB
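
For downstream analysis, a frame with these columns can be summarized with ordinary pandas operations. The sketch below approximates the mean confidence per file and taxon by weighting each bin midpoint with its count. It runs on a small made-up frame that only mimics the column names shown above; the values and the helper name mean_confidence are hypothetical and not part of conifer-analysis.

```python
import pandas as pd

# Toy stand-in for the frame returned by confidence_hist; all values are made up.
bins = pd.interval_range(start=0.0, end=1.0, periods=4)  # right-closed bins
toy = pd.DataFrame(
    {
        "path": ["a.tsv"] * 4 + ["b.tsv"] * 4,
        "name": ["Escherichia coli"] * 8,
        "taxonomy_id": ["562"] * 8,
        "bin": list(bins) * 2,
        "midpoints": list(bins.mid) * 2,
        "avg_hist": [0, 1, 3, 4, 4, 3, 1, 0],
    }
)


def mean_confidence(hist: pd.DataFrame) -> pd.Series:
    """Approximate the mean confidence per (path, taxonomy_id) from binned counts."""
    weighted = hist.assign(weighted=hist["midpoints"] * hist["avg_hist"])
    sums = weighted.groupby(["path", "taxonomy_id"], observed=True)[
        ["weighted", "avg_hist"]
    ].sum()
    return (sums["weighted"] / sums["avg_hist"]).rename("mean_confidence")


result = mean_confidence(toy)
print(result)
```

Because the histogram only stores counts per bin, the result is an approximation whose accuracy depends on the bin width.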

Install

It’s as simple as:

pip install conifer-analysis

If you want to observe tasks in the dask dashboard, you will need additional dependencies, which the dashboard extra installs:

pip install conifer-analysis[dashboard]
