Skip to main content

A Python PiPeLine framework

Project description

PyPPL - A Python PiPeLine framework

Pypi Github PythonVers docs Travis building Codacy Codacy coverage

Documentation | API | Change log

Features

  • Process caching.
  • Process reusability.
  • Process error handling.
  • Runner customization.
  • Easy running profile switching.
  • Plugin system.

Plugin gallery

Installation

pip install PyPPL

Writing pipelines with predefined processes

Let's say we are implementing the TCGA DNA-Seq Re-alignment Workflow (The very left part of following figure). For demonstration, we will skip the QC and the co-clean parts here.

DNA_Seq_Variant_Calling_Pipeline

demo.py:

from pyppl import PyPPL, Channel
# import predefined processes
from TCGAprocs import pBamToFastq, pAlignment, pBamSort, pBamMerge, pMarkDups

# Load the bam files
pBamToFastq.input = Channel.fromPattern('/path/to/*.bam')
# Align the reads to reference genome
pAlignment.depends = pBamToFastq
# Sort bam files
pBamSort.depends = pAlignment
# Merge bam files
pBamMerge.depends = pBamSort
# Mark duplicates
pMarkDups.depends = pBamMerge
# Export the results
pMarkDups.exdir = '/path/to/realigned_Bams'
# Specify the start process and run the pipeline
PyPPL().start(pBamToFastq).run()

asciicast

Implementing individual processes

TCGAprocs.py:

from pyppl import Proc
pBamToFastq = Proc(desc = 'Convert bam files to fastq files.')
pBamToFastq.input = 'infile:file'
pBamToFastq.output = [
    'fq1:file:{{i.infile | stem}}_1.fq.gz',
    'fq2:file:{{i.infile | stem}}_2.fq.gz']
pBamToFastq.script = '''
bamtofastq collate=1 exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
    filename= {{i.infile}} gz=1 inputformat=bam level=5 \
    outputdir= {{job.outdir}} outputperreadgroup=1 tryoq=1 \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \
    outputperreadgroupsuffixS=_s.fq.gz
'''

pAlignment = Proc(desc = 'Align reads to reference genome.')
pAlignment.input = 'fq1:file, fq2:file'
#                             name_1.fq.gz => name.bam
pAlignment.output = 'bam:file:{{i.fq1 | stem | stem | [:-2]}}.bam'
pAlignment.script = '''
bwa mem -t 8 -T 0 -R <read_group> <reference> {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -
'''

pBamSort = Proc(desc = 'Sort bam files.')
pBamSort.input = 'inbam:file'
pBamSort.output = 'outbam:file:{{i.inbam | basename}}'
pBamSort.script = '''
java -jar picard.jar SortSam CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} SORT_ORDER=coordinate VALIDATION_STRINGENCY=STRICT
'''

pBamMerge = Proc(desc = 'Merge bam files.')
pBamMerge.input = 'inbam:file'
pBamMerge.output = 'outbam:file:{{i.inbam | basename}}'
pBamMerge.script = '''
java -jar picard.jar MergeSamFiles ASSUME_SORTED=false CREATE_INDEX=true \
    INPUT={{i.inbam}} MERGE_SEQUENCE_DICTIONARIES=false OUTPUT={{o.outbam}} \
    SORT_ORDER=coordinate USE_THREADING=true VALIDATION_STRINGENCY=STRICT
'''

pMarkDups = Proc(desc = 'Mark duplicates.')
pMarkDups.input = 'inbam:file'
pMarkDups.output = 'outbam:file:{{i.inbam | basename}}'
pMarkDups.script = '''
java -jar picard.jar MarkDuplicates CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} VALIDATION_STRINGENCY=STRICT
'''

Each process is indenpendent so that you may also reuse the processes in other pipelines.

Pipeline flowchart

# When try to run your pipline, instead of:
#   PyPPL().start(pBamToFastq).run()
# do:
PyPPL().start(pBamToFastq).flowchart().run()

Then an SVG file endswith .pyppl.svg will be generated under current directory. Note that this function requires Graphviz and graphviz for python.

See plugin details.

flowchart

Pipeline report

See plugin details

pPyClone.report = """
## {{title}}

PyClone[1] is a tool using Probabilistic model for inferring clonal population structure from deep NGS sequencing.

![Similarity matrix]({{path.join(job.o.outdir, "plots/loci/similarity_matrix.svg")}})

```table
caption: Clusters
file: "{{path.join(job.o.outdir, "tables/cluster.tsv")}}"
rows: 10
```

[1]: Roth, Andrew, et al. "PyClone: statistical inference of clonal population structure in cancer." Nature methods 11.4 (2014): 396.
"""

# or use a template file

pPyClone.report = "file:/path/to/template.md"
PyPPL().start(pPyClone).run().report('/path/to/report', title = 'Clonality analysis using PyClone')

report

Full documentation

ReadTheDocs

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PyPPL-3.0.0rc3.tar.gz (56.3 kB view details)

Uploaded Source

Built Distribution

PyPPL-3.0.0rc3-py3-none-any.whl (61.8 kB view details)

Uploaded Python 3

File details

Details for the file PyPPL-3.0.0rc3.tar.gz.

File metadata

  • Download URL: PyPPL-3.0.0rc3.tar.gz
  • Upload date:
  • Size: 56.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.0 CPython/3.7.4 Linux/4.4.0-19041-Microsoft

File hashes

Hashes for PyPPL-3.0.0rc3.tar.gz
Algorithm Hash digest
SHA256 b0241f378e5876c283ebc233a90c4ddd8002078988d373d6060dd410da423fdb
MD5 12276635d9a7180a1a0cf97d561a8ba5
BLAKE2b-256 29abdac9b4f714ffdf9147b165d581852638f23d7af004fd0eedcb5a2028923d

See more details on using hashes here.

File details

Details for the file PyPPL-3.0.0rc3-py3-none-any.whl.

File metadata

  • Download URL: PyPPL-3.0.0rc3-py3-none-any.whl
  • Upload date:
  • Size: 61.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.0 CPython/3.7.4 Linux/4.4.0-19041-Microsoft

File hashes

Hashes for PyPPL-3.0.0rc3-py3-none-any.whl
Algorithm Hash digest
SHA256 c8535093c495304dccc3ac419309a084522249baf28a0dd437a9cbbf42165efb
MD5 6b1bec0f3e6ec9699597a9a49d0b801a
BLAKE2b-256 925e9dc702682faa9855be781ed7054759337674fc3cec44ee809fa31c773aab

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page