Project description

PyPPL - A Python PiPeLine framework


Documentation | API | Change log

Features

  • Process caching.
  • Process reusability.
  • Process error handling.
  • Runner customization.
  • Easy running profile switching (see the sketch after this list).
  • Plugin system.
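
For example, running profiles let the same pipeline run locally during development and on a cluster scheduler in production. Below is a minimal sketch, assuming a profile named sge (runner, queue, etc.) has been defined in your PyPPL configuration file; the exact configuration file name and format depend on your PyPPL version:

from pyppl import PyPPL
# pMyProc is a hypothetical process defined elsewhere;
# the sge profile is assumed to be declared in your PyPPL configuration
PyPPL().start(pMyProc).run()        # run with the default profile (local runner)
PyPPL().start(pMyProc).run('sge')   # switch to the sge running profile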

Installation

pip install PyPPL

Plugin gallery

(*) shipped with PyPPL

Writing pipelines with predefined processes

Let's say we are implementing the TCGA DNA-Seq Re-alignment Workflow (the left-most part of the figure below). For demonstration, we will skip the QC and co-cleaning steps here.

[Figure: DNA-Seq Variant Calling Pipeline]

demo.py:

from pyppl import PyPPL, Channel
# import predefined processes
from TCGAprocs import pBamToFastq, pAlignment, pBamSort, pBamMerge, pMarkDups

# Load the bam files
pBamToFastq.input = Channel.fromPattern('/path/to/*.bam')
# Align the reads to reference genome
pAlignment.depends = pBamToFastq
# Sort bam files
pBamSort.depends = pAlignment
# Merge bam files
pBamMerge.depends = pBamSort
# Mark duplicates
pMarkDups.depends = pBamMerge
# Export the results
pMarkDups.config.export_dir = '/path/to/realigned_Bams'
# Specify the start process and run the pipeline
PyPPL().start(pBamToFastq).run()

[asciicast]
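
Channel.fromPattern builds the input channel from a glob pattern. If your bam files do not all match one pattern, the channel could also be built from an explicit list; a minimal sketch, assuming Channel.create accepts a plain list as described in the PyPPL documentation:

from pyppl import Channel
# hypothetical explicit file list instead of a glob pattern
pBamToFastq.input = Channel.create([
    '/path/to/sample1.bam',
    '/path/to/sample2.bam',
])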

Implementing individual processes

TCGAprocs.py:

from pyppl import Proc
pBamToFastq = Proc(desc = 'Convert bam files to fastq files.')
pBamToFastq.input = 'infile:file'
pBamToFastq.output = [
    'fq1:file:{{i.infile | stem}}_1.fq.gz',
    'fq2:file:{{i.infile | stem}}_2.fq.gz']
pBamToFastq.script = '''
bamtofastq collate=1 exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
    filename={{i.infile}} gz=1 inputformat=bam level=5 \
    outputdir={{job.outdir}} outputperreadgroup=1 tryoq=1 \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \
    outputperreadgroupsuffixS=_s.fq.gz
'''

pAlignment = Proc(desc = 'Align reads to reference genome.')
pAlignment.input = 'fq1:file, fq2:file'
#                             name_1.fq.gz => name.bam
pAlignment.output = 'bam:file:{{i.fq1 | stem | stem | [:-2]}}.bam'
pAlignment.script = '''
bwa mem -t 8 -T 0 -R <read_group> <reference> {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -
'''

pBamSort = Proc(desc = 'Sort bam files.')
pBamSort.input = 'inbam:file'
pBamSort.output = 'outbam:file:{{i.inbam | basename}}'
pBamSort.script = '''
java -jar picard.jar SortSam CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} SORT_ORDER=coordinate VALIDATION_STRINGENCY=STRICT
'''

pBamMerge = Proc(desc = 'Merge bam files.')
pBamMerge.input = 'inbam:file'
pBamMerge.output = 'outbam:file:{{i.inbam | basename}}'
pBamMerge.script = '''
java -jar picard.jar MergeSamFiles ASSUME_SORTED=false CREATE_INDEX=true \
    INPUT={{i.inbam}} MERGE_SEQUENCE_DICTIONARIES=false OUTPUT={{o.outbam}} \
    SORT_ORDER=coordinate USE_THREADING=true VALIDATION_STRINGENCY=STRICT
'''

pMarkDups = Proc(desc = 'Mark duplicates.')
pMarkDups.input = 'inbam:file'
pMarkDups.output = 'outbam:file:{{i.inbam | basename}}'
pMarkDups.script = '''
java -jar picard.jar MarkDuplicates CREATE_INDEX=true INPUT={{i.inbam}} \
    OUTPUT={{o.outbam}} VALIDATION_STRINGENCY=STRICT
'''

Each process is independent, so you can also reuse the processes in other pipelines.
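
For instance, hard-coded values such as the reference path and thread count in pAlignment could be lifted into process arguments; a minimal sketch, assuming the Proc args attribute and the {{args.*}} template namespace behave as described in the PyPPL documentation:

# hypothetical defaults; override them per pipeline
pAlignment.args.ref = '/path/to/reference.fa'
pAlignment.args.nthread = 8
pAlignment.script = '''
bwa mem -t {{args.nthread}} -T 0 -R <read_group> {{args.ref}} {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -
'''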

Pipeline flowchart

# When you want to run your pipeline, instead of:
#   PyPPL().start(pBamToFastq).run()
# do:
PyPPL().start(pBamToFastq).flowchart().run()

Then an SVG file ending with .pyppl.svg will be generated in the current directory. Note that this feature requires Graphviz and the graphviz package for Python.
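
If the graphviz package for Python is not already installed, it can typically be added with pip (the Graphviz system binaries must be installed separately, e.g. via your OS package manager):

pip install graphviz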

See plugin details.

[Figure: flowchart]

Pipeline report

See plugin details.

pPyClone.report = """
## {{title}}

PyClone[1] is a tool that uses a probabilistic model to infer clonal population structure from deep NGS sequencing data.

![Similarity matrix]({{path.join(job.o.outdir, "plots/loci/similarity_matrix.svg")}})

```table
caption: Clusters
file: "{{path.join(job.o.outdir, "tables/cluster.tsv")}}"
rows: 10
```

[1]: Roth, Andrew, et al. "PyClone: statistical inference of clonal population structure in cancer." Nature methods 11.4 (2014): 396.
"""

# or use a template file

pPyClone.report = "file:/path/to/template.md"
PyPPL().start(pPyClone).run().report('/path/to/report', title = 'Clonality analysis using PyClone')

[Figure: report]
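
A report could likewise be attached to one of the processes defined earlier; a minimal sketch that reuses only the report attribute and the {{title}} variable shown above (the report text itself is illustrative):

pMarkDups.report = """
## {{title}}

Duplicates were marked with Picard MarkDuplicates; the de-duplicated bam files
are exported to /path/to/realigned_Bams.
"""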

Full documentation

ReadTheDocs

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PyPPL-3.0.2.tar.gz (59.6 kB)

Uploaded Source

Built Distribution

PyPPL-3.0.2-py3-none-any.whl (65.3 kB)

Uploaded Python 3

File details

Details for the file PyPPL-3.0.2.tar.gz.

File metadata

  • Download URL: PyPPL-3.0.2.tar.gz
  • Upload date:
  • Size: 59.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.2 CPython/3.7.1 Linux/4.15.0-1028-gcp

File hashes

Hashes for PyPPL-3.0.2.tar.gz:

  • SHA256: c2a81d5b522e505d4efcbcd50511d97b655c743aa89a6a67db7660bc31d02a7e
  • MD5: 3b0bc33304e78e10588469490f8a4dea
  • BLAKE2b-256: d2fbfbb6bbf5154ef1bd5a6b188f217ea25ec8181f64a0dded73a637b364f31c

See more details on using hashes here.

File details

Details for the file PyPPL-3.0.2-py3-none-any.whl.

File metadata

  • Download URL: PyPPL-3.0.2-py3-none-any.whl
  • Upload date:
  • Size: 65.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.2 CPython/3.7.1 Linux/4.15.0-1028-gcp

File hashes

Hashes for PyPPL-3.0.2-py3-none-any.whl:

  • SHA256: bb420ca4b53180da510bdfe414a064e1d8458fa0d83f0b17283dbf50d2b4b1a1
  • MD5: 65bb703791be596abaf01353b6a21a11
  • BLAKE2b-256: 29e29b8fa91f171702b8909c375fe784407787c7dbdfd5cc6b9a05ff7f0434c3

See more details on using hashes here.
