Project description

PyPPL - A Python PiPeLine framework

Documentation | API | Change log

Features

  • Process caching.
  • Process reusability.
  • Process error handling.
  • Runner customization.
  • Running profile switching.
  • Plugin system.
  • Pipeline flowchart (using plugin pyppl_flowchart).
  • Pipeline report (using plugin pyppl_report).

Installation

pip install PyPPL

Writing pipelines with predefined processes

Let's say we are implementing the TCGA DNA-Seq Re-alignment Workflow (the left-most part of the figure below). For demonstration, we will skip the QC and co-cleaning steps here.

[Figure: DNA-Seq Variant Calling Pipeline]

demo.py:

from pyppl import PyPPL, Channel
# import predefined processes
from TCGAprocs import pBamToFastq, pAlignment, pBamSort, pBamMerge, pMarkDups

# Load the bam files
pBamToFastq.input = Channel.fromPattern('/path/to/*.bam')
# Align the reads to reference genome
pAlignment.depends = pBamToFastq
# Sort bam files
pBamSort.depends = pAlignment
# Merge bam files
pBamMerge.depends = pBamSort
# Mark duplicates
pMarkDups.depends = pBamMerge
# Export the results
pMarkDups.exdir = '/path/to/realigned_Bams'
# Specify the start process and run the pipeline
PyPPL().start(pBamToFastq).run()
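
The run() call above uses the default running profile. Running profile switching (listed under Features) lets the same pipeline be dispatched through a different runner; below is a minimal sketch, assuming a profile named sge has been defined in your PyPPL configuration (the profile name is only an illustration, not part of this example pipeline):

# Run the same pipeline, but dispatch jobs through the (assumed) 'sge' profile
# instead of the default one.
PyPPL().start(pBamToFastq).run('sge')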


Implementing individual processes

TCGAprocs.py:

from pyppl import Proc
pBamToFastq = Proc(desc = 'Convert bam files to fastq files.')
pBamToFastq.input = 'infile:file'
pBamToFastq.output = [
    'fq1:file:{{i.infile | stem}}_1.fq.gz',
    'fq2:file:{{i.infile | stem}}_2.fq.gz']
pBamToFastq.script = '''
bamtofastq collate=1 exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
    filename={{i.infile}} gz=1 inputformat=bam level=5 \
    outputdir={{job.outdir}} outputperreadgroup=1 tryoq=1 \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \
    outputperreadgroupsuffixS=_s.fq.gz
'''

pAlignment = Proc(desc = 'Align reads to reference genome.')
pAlignment.input = 'fq1:file, fq2:file'
#                             name_1.fq.gz => name.bam
pAlignment.output = 'bam:file:{{i.fq1 | stem | stem | [:-2]}}.bam'
pAlignment.script = '''
bwa mem -t 8 -T 0 -R <read_group> <reference> {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -
'''

pBamSort = Proc(desc = 'Sort bam files.')
pBamSort.input = 'inbam:file'
pBamSort.output = 'outbam:file:{{i.inbam | basename}}'
pBamSort.script = '''
java -jar picard.jar SortSam CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} SORT_ORDER=coordinate VALIDATION_STRINGENCY=STRICT
'''

pBamMerge = Proc(desc = 'Merge bam files.')
pBamMerge.input = 'inbam:file'
pBamMerge.output = 'outbam:file:{{i.inbam | basename}}'
pBamMerge.script = '''
java -jar picard.jar MergeSamFiles ASSUME_SORTED=false CREATE_INDEX=true \
    INPUT={{i.inbam}} MERGE_SEQUENCE_DICTIONARIES=false OUTPUT={{o.outbam}} \
    SORT_ORDER=coordinate USE_THREADING=true VALIDATION_STRINGENCY=STRICT
'''

pMarkDups = Proc(desc = 'Mark duplicates.')
pMarkDups.input = 'inbam:file'
pMarkDups.output = 'outbam:file:{{i.inbam | basename}}'
pMarkDups.script = '''
java -jar picard.jar MarkDuplicates CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} VALIDATION_STRINGENCY=STRICT
'''
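
A side note on the output templates above: the stem filter keeps the file name without its last extension, and [:-2] slices off the trailing _1. The following plain-Python snippet illustrates what those templates evaluate to; it is only a sketch of the template logic, not PyPPL's actual filter implementation:

from pathlib import Path

def stem(path):
    # Keep the file name without its last extension,
    # e.g. '/path/to/sample_1.fq.gz' -> 'sample_1.fq'
    return Path(path).stem

fq1 = '/path/to/sample_1.fq.gz'
# pBamToFastq output: '{{i.infile | stem}}_1.fq.gz'
print(stem('/path/to/sample.bam') + '_1.fq.gz')   # sample_1.fq.gz
# pAlignment output: '{{i.fq1 | stem | stem | [:-2]}}.bam'
print(stem(stem(fq1))[:-2] + '.bam')              # sample.bam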

Each process is independent, so you may also reuse the processes in other pipelines.
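
For example, a single predefined process can be dropped into another pipeline on its own. A minimal sketch reusing pBamSort by itself (the input file pattern below is hypothetical):

from pyppl import PyPPL, Channel
from TCGAprocs import pBamSort

# Reuse pBamSort as a standalone, single-process pipeline
# (the bam file pattern is a hypothetical example).
pBamSort.input = Channel.fromPattern('/path/to/other_project/*.bam')
PyPPL().start(pBamSort).run()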

Pipeline flowchart

# When running your pipeline, instead of:
#   PyPPL().start(pBamToFastq).run()
# do:
PyPPL().start(pBamToFastq).flowchart().run()

Then an SVG file ending with .pyppl.svg will be generated in the current directory. Note that this feature requires Graphviz and the graphviz package for Python.

See plugin details.

[Figure: generated pipeline flowchart]

Pipeline report

See plugin details.

pPyClone.report = """
## {{title}}

PyClone[1] is a tool using a probabilistic model to infer clonal population structure from deep NGS sequencing data.

![Similarity matrix]({{path.join(job.o.outdir, "plots/loci/similarity_matrix.svg")}})

```table
caption: Clusters
file: "{{path.join(job.o.outdir, "tables/cluster.tsv")}}"
rows: 10
```

[1]: Roth, Andrew, et al. "PyClone: statistical inference of clonal population structure in cancer." Nature methods 11.4 (2014): 396.
"""

# or use a template file

pPyClone.report = "file:/path/to/template.md"
PyPPL().start(pPyClone).run().report('/path/to/report', title = 'Clonality analysis using PyClone')

[Figure: example generated report]

Full documentation

ReadTheDocs

Project details


Download files


Source Distribution

PyPPL-2.3.2.tar.gz (67.9 kB)

Uploaded Source

Built Distribution

PyPPL-2.3.2-py3-none-any.whl (70.6 kB)

Uploaded Python 3

File details

Details for the file PyPPL-2.3.2.tar.gz.

File metadata

  • Download URL: PyPPL-2.3.2.tar.gz
  • Upload date:
  • Size: 67.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Linux/2.6.32-754.15.3.el6.x86_64

File hashes

Hashes for PyPPL-2.3.2.tar.gz

  • SHA256: 3b990311d2de5143a22f2eb1daad41e139d0acbb83aa99ce52f1b017ab66a806
  • MD5: 19ead1c192b85ebbb731c12fb8c6dc66
  • BLAKE2b-256: bad5eb9431813b59f8bf4e4ca1a99030aa44c65b9c95ef493888fdb1837d3072


File details

Details for the file PyPPL-2.3.2-py3-none-any.whl.

File metadata

  • Download URL: PyPPL-2.3.2-py3-none-any.whl
  • Upload date:
  • Size: 70.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Linux/2.6.32-754.15.3.el6.x86_64

File hashes

Hashes for PyPPL-2.3.2-py3-none-any.whl

  • SHA256: ba8412c5b291acd8da9cd0d9c949e06c91280aa81ca45185c798feb6b3170c48
  • MD5: d49d375ed44fe00e562ecea84b9570db
  • BLAKE2b-256: 8fc6934ec33440bf01cfa6fc2e8ff5cafee3ca73d3f1705a21f9b8578ab1fc71

