Pipeline tool for NGS data

Pigeon

Introduction

Pigeon is a tool for piping the inputs and outputs of multiple CLI tools. It takes only a config file as input; everything required to run the pipeline is specified in that file. The config file follows the format of Python's configparser.

Quick Install

Linux & Mac

sudo pip3 install --index-url https://test.pypi.org/simple/ pigeon

Windows

pip install --index-url https://test.pypi.org/simple/ pigeon

Resources for NGS

Pigeon does not supply any of the tools or data files, so they need to be downloaded separately. For example configuration file, exome sequencing pipeline,

How to use

Create a configuration file:

pigeon createconfig

Modify it for your analysis (see the Config File section below), then try it with the -d option first:

pigeon -c my_config.conf -d

If everything looks all right, run it for real:

pigeon -c my_config.conf

Config File

The config file consists of three parts:

  • General
  • Pipeline
  • Individual tool blocks

General

This section defines the project name, output directory, input files, and resource files such as a reference genome or target file. The following variables are required for a run.

Required:

  • project_name : name of your project

  • output_dir : where to write output files

  • input_files : input files for analysis, space separated; paired files should be next to each other. e.g.

    input_files = A.txt B.txt C.txt

    or for paired

    input_files = A1.txt A2.txt B1.txt B2.txt C1.txt C2.txt

Optional variables can also be declared here, based on your own or a tool's requirements. These variables can later be referenced elsewhere in the config file using ${GENERAL:optional_variable}.

Optional (example):

reference_genome = /path/to/my/reference_genome.fa

bed_file = /path/to/my/target.bed

known_snp = /path/to/my/snp.vcf

my_database = /path/to/my/favorite.db
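The ${GENERAL:optional_variable} syntax matches the ExtendedInterpolation mode of Python's configparser. The sketch below shows how such a reference resolves, assuming that mode is what Pigeon uses; the [align] block and its --ref flag are hypothetical and only serve as an illustration.

```python
import configparser

# Hypothetical config: the [align] block and the --ref flag are made up.
CONFIG_TEXT = """
[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt
reference_genome = /path/to/my/reference_genome.fa

[align]
args = --ref ${GENERAL:reference_genome}
"""

# ExtendedInterpolation is the configparser mode that understands ${section:option}.
parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)

# Reading the value triggers the substitution.
resolved = parser["align"]["args"]
print(resolved)
```

Defining a path once under GENERAL and referencing it from the tool blocks keeps the config easier to move between machines.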

Pipeline

This section should contain the paths to the tools, in a form your shell understands, as well as the run order of the tools. e.g.

pipeline = job1 job2 job3

A = path/to/A

B = path/to/B

C = path/to/C
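As a rough illustration of how this section could be read, the sketch below parses it with configparser and separates the run order from the tool paths. It assumes the section is named PIPELINE (as in the ${PIPELINE:A} references used elsewhere in this document) and is not Pigeon's actual code.

```python
import configparser

CONFIG_TEXT = """
[PIPELINE]
pipeline = job1 job2 job3
A = path/to/A
B = path/to/B
C = path/to/C
"""

parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
# configparser lowercases option names by default; keep "A" as "A" here.
parser.optionxform = str
parser.read_string(CONFIG_TEXT)

section = parser["PIPELINE"]
# The run order is a space-separated list of block names.
run_order = section["pipeline"].split()
# Every other option in the section is treated as a tool path.
tools = {name: path for name, path in section.items() if name != "pipeline"}

print(run_order)
print(tools)
```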

Tool Blocks

The name of each block should be the same as in pipeline. Continuing the example above:

[job1]

[job2]

[job3]
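Because block names have to match the names listed in pipeline, a quick check before running can catch typos. The helper below, missing_blocks, is a hypothetical name and only a sketch of such a check; it is not part of Pigeon.

```python
import configparser

def missing_blocks(parser):
    """Return the names listed in pipeline that have no block of their own."""
    run_order = parser["PIPELINE"]["pipeline"].split()
    return [job for job in run_order if not parser.has_section(job)]

# Deliberately incomplete config: job3 is listed but has no block.
CONFIG_TEXT = """
[PIPELINE]
pipeline = job1 job2 job3

[job1]

[job2]
"""

parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)

missing = missing_blocks(parser)
print(missing)
```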

The following arguments can be used in these blocks:

Run Args

  • tool: tool variable from pipeline block. e.g.

    tool = ${PIPELINE:A}

  • sub_tool: sub-command, if the tool has one, as in 'bwa mem'. e.g.

    sub_tool = mem

  • args: arguments of the tool

  • java: if the tool is a jar file, adds java -jar before it.

  • pass: if True, the block is not run, but it still remains part of the pipeline. This option is helpful for resuming an interrupted pipeline.
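One way to picture how the run args might combine into a command line is sketched below. The function build_command is hypothetical, and treating java and pass as the string True is an assumption; Pigeon's real behaviour may differ.

```python
import shlex

def build_command(block):
    """Assemble an argument list from a tool block given as a dict.

    Returns None when the block is marked with pass = True.
    """
    if str(block.get("pass", "")).lower() == "true":
        return None  # the block stays in the pipeline but is not run
    command = []
    if str(block.get("java", "")).lower() == "true":
        command += ["java", "-jar"]  # assumed handling of jar files
    command.append(block["tool"])
    if block.get("sub_tool"):
        command.append(block["sub_tool"])
    # shlex.split keeps quoted arguments together.
    command += shlex.split(block.get("args", ""))
    return command

bwa = build_command({"tool": "/path/to/bwa", "sub_tool": "mem", "args": "-t 4"})
jar = build_command({"tool": "/path/to/tool.jar", "java": "True", "args": "--help"})
skipped = build_command({"tool": "/path/to/A", "pass": "True"})
print(bwa, jar, skipped)
```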

Input Args

  • input_from: name of the block whose output is this job's input. The first job's input_from should be input_files.

  • input_multi: can be 'paired' or 'all'. The 'paired' option splits the stream of input files into groups of two. The 'all' option uses all of the input files.

  • input_flag_repeat: if the tool requires an input flag for each input, this option adds the given flag before each input.

  • secondary_in_placeholder
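The grouping and flag options above can be sketched in a few lines. The names group_stream and repeat_flag are hypothetical, and treating an unset input_multi as one file per run is an assumption.

```python
def group_stream(files, input_multi=None):
    """Group a list of input files according to input_multi."""
    if input_multi == "paired":
        # Pairs sit next to each other, so take the files two at a time.
        return [files[i:i + 2] for i in range(0, len(files), 2)]
    if input_multi == "all":
        # A single group holding every file.
        return [list(files)]
    # Assumed default: each file is processed on its own.
    return [[f] for f in files]

def repeat_flag(files, flag):
    """Put the given flag before each input file."""
    result = []
    for f in files:
        result += [flag, f]
    return result

paired = group_stream(["A1.txt", "A2.txt", "B1.txt", "B2.txt"], "paired")
everything = group_stream(["A.txt", "B.txt", "C.txt"], "all")
flagged = repeat_flag(["A.txt", "B.txt"], "-I")
print(paired, everything, flagged)
```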

Output Args

  • suffix: add suffix to output file name

  • ext: file extension of the output

  • dump_dir: creates a directory and writes the output there.

  • paired_output: this option will pair the input and the output of the tool.

  • secondary_out_placeholder

  • secondary_suffix

  • secondary_ext

  • secondary_dump_dir
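To make the output args more concrete, here is a sketch of how an output path could be derived from an input file. The naming scheme (input stem, underscore, suffix, dot, ext) and the function name output_path are assumptions for illustration; the names Pigeon actually produces may differ. The sketch only computes the path and does not create the directory.

```python
from pathlib import PurePosixPath

def output_path(input_file, output_dir, suffix="", ext="", dump_dir=""):
    """Build an output path from an input file name and the output args."""
    stem = PurePosixPath(input_file).stem  # "A.txt" -> "A"
    name = f"{stem}_{suffix}" if suffix else stem
    if ext:
        name = f"{name}.{ext}"
    folder = PurePosixPath(output_dir)
    if dump_dir:
        folder = folder / dump_dir  # the output goes into its own sub-directory
    return str(folder / name)

plain = output_path("A.txt", "/path/to/output_directory", suffix="job1_A", ext="txt")
dumped = output_path("A.txt", "/path/to/output_directory", suffix="job1_A",
                     ext="txt", dump_dir="job1")
print(plain)
print(dumped)
```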

Placeholders

These are wildcard words that can be used in args.

  • input_placeholder

  • secondary_input_placeholder

  • output_placeholder

  • secondary_output_placeholder
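The substitution can be pictured as a whole-word search and replace over the args string. The sketch below uses a hypothetical fill_placeholders helper. It matches whole words only, because input_placeholder is also a substring of secondary_input_placeholder and a plain string replace would corrupt the longer name.

```python
import re

def fill_placeholders(args, values):
    """Replace each placeholder word in args with its value from the dict."""
    # \b treats "_" as a word character, so "input_placeholder" does not
    # match inside "secondary_input_placeholder".
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in values) + r")\b")
    return pattern.sub(lambda m: values[m.group(1)], args)

values = {"input_placeholder": "A.txt", "output_placeholder": "A_job1_A.txt"}
filled = fill_placeholders("-i input_placeholder -o output_placeholder", values)
untouched = fill_placeholders("-x secondary_input_placeholder", values)
print(filled)
print(untouched)
```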

Example Config

[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt
my_db = /path/to/my.db

[PIPELINE]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C

[job1]
tool = ${PIPELINE:A}
input_from = input_files
args = -i input_placeholder -o output_placeholder
suffix = job1_A
ext = txt

[job2]
tool = ${PIPELINE:B}
input_from = job1
args = -i input_placeholder -o output_placeholder
suffix = job2_B
ext = txt

[job3]
tool = ${PIPELINE:C}
input_from = job2
args = -i input_placeholder -o output_placeholder
suffix = job3_C
ext = txt
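A config like this one can be sanity-checked before running by walking the pipeline and confirming that every input_from points at input_files or at an earlier job. The sketch below uses a trimmed copy of the example, with single-bracket section headers and ${PIPELINE:...} tool references. The helper check_chain is hypothetical and assumes configparser's ExtendedInterpolation.

```python
import configparser

CONFIG_TEXT = """
[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt

[PIPELINE]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C

[job1]
tool = ${PIPELINE:A}
input_from = input_files

[job2]
tool = ${PIPELINE:B}
input_from = job1

[job3]
tool = ${PIPELINE:C}
input_from = job2
"""

def check_chain(parser):
    """Return (job, tool path, input source) per step, in run order."""
    seen = {"input_files"}
    plan = []
    for job in parser["PIPELINE"]["pipeline"].split():
        source = parser[job]["input_from"]
        if source not in seen:
            raise ValueError(f"{job}: input_from '{source}' is not an earlier step")
        plan.append((job, parser[job]["tool"], source))
        seen.add(job)
    return plan

parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)
plan = check_chain(parser)
for step in plan:
    print(step)
```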

Built Distribution

seq_pigeon-0.3.4-py3-none-any.whl (13.5 kB)
