Skip to main content

A Python PiPeLine framework

Project description

PyPPL - A Python PiPeLine framework

Pypi Github PythonVers docs Travis building Codacy Codacy coverage

Documentation | API | Change log


  • Process caching.
  • Process reusability.
  • Process error handling.
  • Runner customization.
  • Easy running profile switching.
  • Plugin system.


pip install PyPPL

Plugin gallery

(*) shipped with PyPPL

Writing pipelines with predefined processes

Let's say we are implementing the TCGA DNA-Seq Re-alignment Workflow (The very left part of following figure). For demonstration, we will skip the QC and the co-clean parts here.


from pyppl import PyPPL, Channel
# import predefined processes
from TCGAprocs import pBamToFastq, pAlignment, pBamSort, pBamMerge, pMarkDups

# Load the bam files
pBamToFastq.input = Channel.fromPattern('/path/to/*.bam')
# Align the reads to reference genome
pAlignment.depends = pBamToFastq
# Sort bam files
pBamSort.depends = pAlignment
# Merge bam files
pBamMerge.depends = pBamSort
# Mark duplicates
pMarkDups.depends = pBamMerge
# Export the results
pMarkDups.config.export_dir = '/path/to/realigned_Bams'
# Specify the start process and run the pipeline


Implementing individual processes

from pyppl import Proc
pBamToFastq = Proc(desc = 'Convert bam files to fastq files.')
pBamToFastq.input = 'infile:file'
pBamToFastq.output = [
    'fq1:file:{{i.infile | stem}}_1.fq.gz',
    'fq2:file:{{i.infile | stem}}_2.fq.gz']
pBamToFastq.script = '''
bamtofastq collate=1 exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
    filename= {{i.infile}} gz=1 inputformat=bam level=5 \
    outputdir= {{job.outdir}} outputperreadgroup=1 tryoq=1 \
    outputperreadgroupsuffixF=_1.fq.gz \
    outputperreadgroupsuffixF2=_2.fq.gz \
    outputperreadgroupsuffixO=_o1.fq.gz \
    outputperreadgroupsuffixO2=_o2.fq.gz \

pAlignment = Proc(desc = 'Align reads to reference genome.')
pAlignment.input = 'fq1:file, fq2:file'
#                             name_1.fq.gz => name.bam
pAlignment.output = 'bam:file:{{i.fq1 | stem | stem | [:-2]}}.bam'
pAlignment.script = '''
bwa mem -t 8 -T 0 -R <read_group> <reference> {{i.fq1}} {{i.fq2}} | \
    samtools view -Shb -o {{o.bam}} -

pBamSort = Proc(desc = 'Sort bam files.')
pBamSort.input = 'inbam:file'
pBamSort.output = 'outbam:file:{{i.inbam | basename}}'
pBamSort.script = '''
java -jar picard.jar SortSam CREATE_INDEX=true INPUT={{i.inbam}} \

pBamMerge = Proc(desc = 'Merge bam files.')
pBamMerge.input = 'inbam:file'
pBamMerge.output = 'outbam:file:{{i.inbam | basename}}'
pBamMerge.script = '''
java -jar picard.jar MergeSamFiles ASSUME_SORTED=false CREATE_INDEX=true \
    INPUT={{i.inbam}} MERGE_SEQUENCE_DICTIONARIES=false OUTPUT={{o.outbam}} \

pMarkDups = Proc(desc = 'Mark duplicates.')
pMarkDups.input = 'inbam:file'
pMarkDups.output = 'outbam:file:{{i.inbam | basename}}'
pMarkDups.script = '''
java -jar picard.jar MarkDuplicates CREATE_INDEX=true INPUT={{i.inbam}} \

Each process is indenpendent so that you may also reuse the processes in other pipelines.

Pipeline flowchart

# When try to run your pipline, instead of:
#   PyPPL().start(pBamToFastq).run()
# do:

Then an SVG file endswith .pyppl.svg will be generated under current directory. Note that this function requires Graphviz and graphviz for python.

See plugin details.


Pipeline report

See plugin details = """
## {{title}}

PyClone[1] is a tool using Probabilistic model for inferring clonal population structure from deep NGS sequencing.

![Similarity matrix]({{path.join(job.o.outdir, "plots/loci/similarity_matrix.svg")}})

caption: Clusters
file: "{{path.join(job.o.outdir, "tables/cluster.tsv")}}"
rows: 10

[1]: Roth, Andrew, et al. "PyClone: statistical inference of clonal population structure in cancer." Nature methods 11.4 (2014): 396.

# or use a template file = "file:/path/to/"
PyPPL().start(pPyClone).run().report('/path/to/report', title = 'Clonality analysis using PyClone')


Full documentation


Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for pyppl, version 3.2.1
Filename, size File type Python version Upload date Hashes
Filename, size PyPPL-3.2.1-py3-none-any.whl (67.8 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size PyPPL-3.2.1.tar.gz (62.0 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page