Pipeline tool for NGS data
Pigeon
Introduction
A tool for piping the inputs and outputs of multiple CLI tools. Pigeon takes only a config file as input; everything required to run the pipeline is specified in that file. The config file follows the format read by Python's configparser module.
Quick Install
Linux & Mac
sudo pip3 install --index-url https://test.pypi.org/simple/ pigeon
Windows
pip install --index-url https://test.pypi.org/simple/ pigeon
Resources for NGS
Pigeon does not supply any tools or data files, so they must be downloaded separately. For example, the example configuration file (an exome sequencing pipeline) requires:
- Tools
- Reference files
  - Reference genome and known SNPs & INDELs
    - hg19 or
    - hg38
  - BED file
    - See the website of the capture kit used in sequencing
How to use
Create a configuration file:
pigeon createconfig
Modify it for your analysis (see Config File below), then run it with the -d option first to check it:
pigeon -c my_config.conf -d
If everything looks all right, run it for real:
pigeon -c my_config.conf
Config File
The config file consists of three parts:
- General
- Pipeline
- Individual tool blocks
General
This section defines the project name, the output directory, the input files, and resource files such as the reference genome or target file. The following variables are required to run:
Required:
- project_name: name of your project
- output_dir: where to write output files
- input_files: input files for the analysis, space-separated; paired files should be next to each other, e.g.
  input_files = A.txt B.txt C.txt
  or, for paired files,
  input_files = A1.txt A2.txt B1.txt B2.txt C1.txt C2.txt
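As a rough illustration of these requirements, the sketch below uses Python's standard configparser to check that the three required GENERAL variables are set and to split input_files into a list. The helper names (missing_general_keys, input_file_list) are hypothetical and not part of Pigeon; Pigeon's own validation may work differently.

```python
import configparser

# Hypothetical helpers for illustration only; not part of Pigeon's API.
REQUIRED_GENERAL = ("project_name", "output_dir", "input_files")


def missing_general_keys(config):
    """Return the required GENERAL variables that are missing or empty."""
    if not config.has_section("GENERAL"):
        return list(REQUIRED_GENERAL)
    section = config["GENERAL"]
    return [key for key in REQUIRED_GENERAL if not section.get(key, "").strip()]


def input_file_list(config):
    """Split the space-separated input_files value into a list of file names."""
    return config["GENERAL"]["input_files"].split()


example = configparser.ConfigParser()
example.read_string("""
[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A1.txt A2.txt B1.txt B2.txt
""")

print(missing_general_keys(example))  # []
print(input_file_list(example))       # ['A1.txt', 'A2.txt', 'B1.txt', 'B2.txt']
```

Because pairs are simply adjacent entries in this list, keeping each pair next to each other in input_files is what makes later pairing possible.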
Optional variables can also be declared here, based on your own or your tools' requirements. These variables can then be referenced elsewhere in the config file using ${GENERAL:optional_variable}.
Optional (example):
reference_genome = /path/to/my/reference_genome.fa
bed_file = /path/to/my/target.bed
known_snp = /path/to/my/snp.vcf
my_database = /path/to/my/favorite.db
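References like ${GENERAL:reference_genome} match the syntax of configparser's ExtendedInterpolation. A minimal sketch, assuming Pigeon reads the file with that interpolation enabled (an assumption, not confirmed here); the job1 block and its -R flag are illustrative only:

```python
import configparser

# ExtendedInterpolation enables the ${SECTION:option} syntax.
# Assumption: Pigeon parses its config in an equivalent way.
config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
config.read_string("""
[GENERAL]
project_name = my_project
reference_genome = /path/to/my/reference_genome.fa

[job1]
args = -R ${GENERAL:reference_genome} -i input_placeholder
""")

# The reference path is substituted when the value is read.
print(config["job1"]["args"])
# -R /path/to/my/reference_genome.fa -i input_placeholder
```

One consequence of this syntax: a literal $ elsewhere in a value has to be written as $$.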
Pipeline
This section contains the run order of the tools, as well as the paths to the tools in a form your shell understands, e.g.
pipeline = job1 job2 job3
A = path/to/A
B = path/to/B
C = path/to/C
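The sketch below reads the run order and the tool paths from a PIPELINE section with configparser, then uses shutil.which to warn about tools that cannot be found. The warning step is an illustration only; whether Pigeon checks tool paths this way is not documented here.

```python
import configparser
import shutil

config = configparser.ConfigParser()
config.read_string("""
[PIPELINE]
pipeline = job1 job2 job3
A = path/to/A
B = path/to/B
C = path/to/C
""")

pipeline = config["PIPELINE"]
run_order = pipeline["pipeline"].split()
# configparser lower-cases option names by default, so A is stored as "a".
tools = {name: value for name, value in pipeline.items() if name != "pipeline"}

print(run_order)  # ['job1', 'job2', 'job3']
print(tools)      # {'a': 'path/to/A', 'b': 'path/to/B', 'c': 'path/to/C'}

# shutil.which returns None if the command is not found on PATH or,
# for a path that contains a directory, if it is not an executable file.
for name, path in tools.items():
    if shutil.which(path) is None:
        print(f"warning: tool {name!r} -> {path!r} not found or not executable")
```

Lookups stay case-insensitive, so ${PIPELINE:A} still resolves even though the key is stored as a.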
Tool Blocks
The name of each block should be the same as in pipeline. Continuing the example above:
[job1]
[job2]
[job3]
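A quick consistency check, sketched with configparser under the assumption that block names are ordinary section names: every job listed in pipeline should have a block, and a block not listed in pipeline would never run.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[PIPELINE]
pipeline = job1 job2 job3

[job1]
[job2]
[job3]
""")

jobs = config["PIPELINE"]["pipeline"].split()

# Every job named in the pipeline variable needs a block with the same name.
missing_blocks = [job for job in jobs if not config.has_section(job)]

# Blocks that are not listed in the pipeline would never be run.
special_sections = {"GENERAL", "PIPELINE"}
unused_blocks = [s for s in config.sections()
                 if s not in special_sections and s not in jobs]

print(missing_blocks)  # []
print(unused_blocks)   # []
```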
The following arguments can be used in these blocks:
Run Args
- tool: the tool variable from the pipeline block, e.g.
  tool = ${PIPELINE:A}
- sub_tool: the sub-tool, if the tool has one (as in 'bwa mem'), e.g.
  sub_tool = mem
- args: arguments passed to the tool
- java: if the tool is a jar file, adds java -jar before it
- pass: if True, the block is not run, but it remains part of the pipeline. This is helpful for resuming an interrupted pipeline.
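To show how these run arguments could combine into one command line, here is a hypothetical build_command helper. It assumes java and pass take boolean values such as True/False (read with configparser's getboolean) and that args is split the way a shell would split it; the /usr/bin/bwa path is only an example value.

```python
import configparser
import shlex

config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
config.read_string("""
[PIPELINE]
pipeline = align
bwa = /usr/bin/bwa

[align]
tool = ${PIPELINE:bwa}
sub_tool = mem
args = -t 4 input_placeholder
pass = False
""")


def build_command(block):
    """Hypothetical sketch: assemble one job's command from its run args.

    Returns None when pass is true: the block stays in the pipeline but is skipped.
    """
    if block.getboolean("pass", fallback=False):
        return None
    command = []
    if block.getboolean("java", fallback=False):
        command += ["java", "-jar"]          # run the tool as a jar file
    command.append(block["tool"])
    if "sub_tool" in block:
        command.append(block["sub_tool"])    # e.g. the 'mem' in 'bwa mem'
    command += shlex.split(block.get("args", ""))
    return command


print(build_command(config["align"]))
# ['/usr/bin/bwa', 'mem', '-t', '4', 'input_placeholder']
```

Returning a list rather than a single string keeps arguments with spaces intact if the command is later passed to subprocess.run.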
Input Args
- input_from: name of the block whose output is this job's input. The first job's input_from should be input_files.
- input_multi: either 'paired' or 'all'. 'paired' splits the stream of input files into groups of two; 'all' uses all of the input files together.
- input_flag_repeat: if the tool requires an input flag for each input, this option adds the given flag before each input.
- secondary_in_placeholder
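The grouping and flag behaviour described above can be sketched as two small functions. Both are hypothetical illustrations; in particular, treating each file as its own run when input_multi is not set is an assumption, not documented Pigeon behaviour.

```python
def group_inputs(files, input_multi=None):
    """Sketch of input_multi: one group per file by default (assumption),
    groups of two for 'paired', a single group for 'all'."""
    if input_multi is None:
        return [[f] for f in files]
    if input_multi == "paired":
        if len(files) % 2:
            raise ValueError("paired input needs an even number of files")
        return [files[i:i + 2] for i in range(0, len(files), 2)]
    if input_multi == "all":
        return [list(files)]
    raise ValueError(f"unknown input_multi value: {input_multi!r}")


def flag_inputs(group, flag=None):
    """Sketch of input_flag_repeat: put the flag before every input file."""
    if flag is None:
        return list(group)
    flagged = []
    for f in group:
        flagged += [flag, f]
    return flagged


files = ["A1.txt", "A2.txt", "B1.txt", "B2.txt"]
print(group_inputs(files, "paired"))  # [['A1.txt', 'A2.txt'], ['B1.txt', 'B2.txt']]
print(flag_inputs(group_inputs(files, "all")[0], "-I"))
# ['-I', 'A1.txt', '-I', 'A2.txt', '-I', 'B1.txt', '-I', 'B2.txt']
```

Pairing relies only on position, which is why pairs must sit next to each other in input_files.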
Output Args
- suffix: adds a suffix to the output file name
- ext: file extension of the output
- dump_dir: creates a directory and writes the output there
- paired_output: pairs the input and the output of the tool
- secondary_out_placeholder
- secondary_suffix
- secondary_ext
- secondary_dump_dir
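Pigeon's exact output naming rule is not described here, so the following is a hypothetical scheme that only illustrates how suffix, ext and dump_dir could combine into an output path. It is side-effect free; actually creating dump_dir would be a separate step.

```python
from pathlib import PurePosixPath


def output_path(input_file, output_dir, suffix="", ext="", dump_dir=""):
    """Hypothetical naming scheme: <output_dir>/[<dump_dir>/]<stem>[_<suffix>].<ext>

    If ext is empty, the input file's own extension is kept.
    """
    source = PurePosixPath(input_file)
    name = f"{source.stem}_{suffix}" if suffix else source.stem
    name += f".{ext}" if ext else source.suffix
    directory = PurePosixPath(output_dir)
    if dump_dir:
        # dump_dir puts the output in its own sub-directory; a real run would
        # create it first, e.g. with Path(directory).mkdir(parents=True, exist_ok=True).
        directory = directory / dump_dir
    return directory / name


print(output_path("/data/A.txt", "/out", suffix="job1_A", ext="txt"))  # /out/A_job1_A.txt
print(output_path("/data/A.txt", "/out", dump_dir="qc"))               # /out/qc/A.txt
```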
Placeholders
These are keywords that can be used in args; each one stands for the corresponding file name(s) when the command is built.
- input_placeholder
- secondary_input_placeholder
- output_placeholder
- secondary_output_placeholder
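A sketch of placeholder substitution, assuming each placeholder appears as a separate whitespace-delimited token in args. Matching whole tokens instead of using str.replace matters here, because input_placeholder is a substring of secondary_input_placeholder. The fill_placeholders name is hypothetical.

```python
import shlex

PLACEHOLDERS = ("input_placeholder", "secondary_input_placeholder",
                "output_placeholder", "secondary_output_placeholder")


def fill_placeholders(args, values):
    """Replace whole placeholder tokens in args with file names.

    values maps a placeholder to a list of file names, since one placeholder
    may stand for several inputs (e.g. a pair).
    """
    filled = []
    for token in shlex.split(args):
        if token in values:
            filled += values[token]
        elif token in PLACEHOLDERS:
            raise KeyError(f"no value given for {token}")
        else:
            filled.append(token)
    return filled


cmd = fill_placeholders(
    "-i input_placeholder -o output_placeholder",
    {"input_placeholder": ["A.txt"], "output_placeholder": ["A_job1_A.txt"]},
)
print(cmd)  # ['-i', 'A.txt', '-o', 'A_job1_A.txt']
```

A placeholder glued to a flag, such as --in=input_placeholder, is not handled by this token-level sketch.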
Example Config
[[GENERAL]]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt
my_db = /path/to/my.db
[[PIPELINE]]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C
[job1]
tool = A
input_from = input_files
args = -i input_placeholder -o output_placeholder
suffix = job1_A
ext = txt
[job2]
tool = B
input_from = job1
args = -i input_placeholder -o output_placeholder
suffix = job2_B
ext = txt
[job3]
tool = C
input_from = job2
args = -i input_placeholder -o output_placeholder
suffix = job3_C
ext = txt
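The sketch below walks the example config as a dry run: each job receives the outputs of the block named in its input_from, and the commands are collected and printed without being run. Several assumptions are made: the section headers are written as plain [GENERAL] and [PIPELINE] so that standard configparser reads them under those names; a bare tool value such as A is looked up in the PIPELINE block; and output names follow a hypothetical <stem>_<suffix>.<ext> scheme rather than Pigeon's own rules.

```python
import configparser
import shlex
from pathlib import PurePosixPath

# The example config, with plain [GENERAL] / [PIPELINE] headers (assumption).
EXAMPLE = """
[GENERAL]
project_name = my_project
output_dir = /path/to/output_directory
input_files = A.txt B.txt C.txt

[PIPELINE]
pipeline = job1 job2 job3
A = /path/to/A
B = /path/to/B
C = /path/to/C

[job1]
tool = A
input_from = input_files
args = -i input_placeholder -o output_placeholder
suffix = job1_A
ext = txt

[job2]
tool = B
input_from = job1
args = -i input_placeholder -o output_placeholder
suffix = job2_B
ext = txt

[job3]
tool = C
input_from = job2
args = -i input_placeholder -o output_placeholder
suffix = job3_C
ext = txt
"""


def dry_run(text):
    """Return the commands the pipeline would run, one per input file, without running them."""
    config = configparser.ConfigParser()
    config.read_string(text)
    general, pipeline = config["GENERAL"], config["PIPELINE"]
    out_dir = PurePosixPath(general["output_dir"])
    # Files produced so far, keyed by block name; the raw inputs come first.
    produced = {"input_files": general["input_files"].split()}
    commands = []
    for job in pipeline["pipeline"].split():
        block = config[job]
        # Assumption: a bare name like "A" is looked up in PIPELINE.
        tool = pipeline.get(block["tool"], block["tool"])
        outputs = []
        for infile in produced[block["input_from"]]:
            stem = PurePosixPath(infile).stem
            outfile = str(out_dir / f"{stem}_{block['suffix']}.{block['ext']}")
            values = {"input_placeholder": infile, "output_placeholder": outfile}
            args = [values.get(token, token) for token in shlex.split(block["args"])]
            commands.append([tool] + args)
            outputs.append(outfile)
        produced[job] = outputs
    return commands


for command in dry_run(EXAMPLE):
    print(" ".join(command))
```

With this config the sketch prints nine commands, three per job; the first is /path/to/A -i A.txt -o /path/to/output_directory/A_job1_A.txt, and job2 then reads that file as its input.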
Built Distribution
File details
Details for the file seq_pigeon-0.4.3-py3-none-any.whl.
File metadata
- Download URL: seq_pigeon-0.4.3-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1 requests-toolbelt/0.8.0 tqdm/4.48.2 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2d5ed483e64662474e784b7f54d536d0e173d7d273ca81a6cafc55d033a49e65
MD5 | c27239f980cb8c5aca002b8664da25b2
BLAKE2b-256 | a4454eaf6f4a5630c61feceed58180de29d4636c1e973e40a5634064c773f71a