Genomics pipelines
Project description
- Integrated with LabxDB: all required annotations (labels, strand, paired etc) are retrieved from LabxDB. This is optional.
- Based on existing robust technologies. No new language.
- LabxPipe pipelines are defined in JSON text files.
- LabxPipe is written in Python. Using norms, such as input and output filenames, insures compatibility between tasks.
- Simple and complex pipelines.
- By default, pipelines are linear (one step after the other).
- Branching is easily achieved be defining a previous step (using
step_input
parameter) allowing users to create any dependency between tasks.
- Parallelized using robust asynchronous threads from the Python standard library.
Examples
See JSON files in config/pipelines
of this repository.
Pipeline JSON file | |
---|---|
mrna_seq.json |
mRNA-seq |
mrna_seq_no_db.json |
mRNA-seq. No LabxDB |
mrna_seq_with_plotting.json |
mRNA-seq. Plotting non-mapped reads. Demonstrate step_input |
mrna_seq_cufflinks.json |
mRNA-seq. Replaces GeneAbacus by Cufflinks |
chip_seq.json |
ChIP-seq using Bowtie2 and Samtools to uniquify reads. |
Following demonstrates how to apply mrna_seq.json
pipeline. It requires:
- LabxDB
- FASTQ files for sample named
AGR000850
andAGR000912
/plus/data/seq/by_run/AGR000850 ├── 23_009_R1.fastq.zst └── 23_009_R2.fastq.zst /plus/data/seq/by_run/AGR000912 ├── 65_009_R1.fastq.zst └── 65_009_R2.fastq.zst
Note: mrna_seq_no_db.json
demonstrates how to use LabxPipe without LabxDB: it only requires FASTQ files (in path_seq_run
directory, see above).
Requirements:
- LabxDB. Alternatively,
mrna_seq_no_db.json
doesn't require LabxDB. - ReadKnead to trim reads.
- STAR and genome index in directory defined
path_star_index
. - GeneAbacus to count reads and generate genomic profile for tracks.
-
Start pipeline:
lxpipe run --pipeline mrna_seq.json \ --worker 2 \ --processor 16
Output is written in
path_output
directory. -
Create report:
lxpipe report --pipeline mrna_seq.json
Report file
mrna_seq.xlsx
should be created in same directory asmrna_seq.json
. -
Merge gene/mRNA counts generated by GeneAbacus in
counting
directory:lxpipe merge-count --pipeline mrna_seq.json \ --step counting
-
Trackhub. Requirements:
- ChromosomeMappings file (to map chromosome names from Ensembl/NCBI to UCSC)
- Tabulated file (with chromosome name and length)
Execute in a separate directory:
lxpipe trackhub --runs AGR000850,AGR000912 \ --species_ucsc danRer11 \ --path_genome /plus/scratch/sai/annots/danrer_genome_all_ensembl_grcz11_ucsc_chroms_chrom_length.tab \ --path_mapping /plus/scratch/sai/annots/ChromosomeMappings/GRCz11_ensembl2UCSC.txt \ --input_sam \ --bam_names accepted_hits.sam.zst \ --make_config \ --make_trackhub \ --make_bigwig \ --processor 16
Directory is ready to be shared by a web server for display in the UCSC genome browser.
Configuration
Parameters can be defined globally. See in config
directory of this repository for examples.
Writing pipelines
Parameters are defined first globally (see above), then per pipeline, then per replicate/run, and then per step/function. The latest definition takes precedence: path_seq_run
defined in /etc/hts/labxpipe.json
is used by default, but if path_seq_run
is defined in the pipeline file, it will be used instead.
Main parameters
Parameter | Type |
---|---|
name | string |
path_output | string |
path_seq_run | string |
path_annots | string |
path_bowtie2_index | string |
path_star_index | string |
fastq_exts | []strings |
adaptors | {} |
logging_level | string |
run_refs | []strings |
replicate_refs | []strings |
ref_info_source | []strings |
ref_infos | {} |
analysis | [{}, {}, ...] |
Parameters for all functions
Parameter | Type |
---|---|
step_name | string |
step_function | string |
step_desc | string |
force | boolean |
Function-specific parameters
Function | Synonym | Parameter | Type |
---|---|---|---|
readknead | preparing | options | []strings |
ops_r1 | [{}, {}, ...] | ||
ops_r2 | [{}, {}, ...] | ||
plot_fastq_in | boolean | ||
plot_fastq | boolean | ||
fastq_out | boolean | ||
zip_fastq_out | string | ||
bowtie2 | genomic_aligning | options | []strings |
index | string | ||
output | string | ||
output_unfiltered | string | ||
compress_sam | boolean | ||
compress_sam_cmd | string | ||
create_bam | boolean | ||
index_bam | boolean | ||
star | aligning | options | []strings |
index | string | ||
output_type | []strings | ||
compress_sam | boolean | ||
compress_sam_cmd | string | ||
compress_unmapped | boolean | ||
compress_unmapped_cmd | string | ||
cufflinks | options | []strings | |
inputs | [{}, {}, ...] | ||
features | [{}, {}, ...] | ||
geneabacus | counting | options | []strings |
inputs | [{}, {}, ...] | ||
features | [{}, {}, ...] | ||
uniquify | options | []strings | |
sort_by_name_bam | boolean | ||
index_bam | boolean | ||
cleaning | steps | [{}, {}, ...] |
Sample-specific parameters. Automatically populated if using LabxDB or sourced from ref_infos
. These parameters can be changed manually in any function (for example setting paired
to False
will ignore second reads in that step).
Parameter | Type |
---|---|
label_short | string |
paired | boolean |
directional | boolean |
r1_strand | string |
quality_scores | string |
License
LabxPipe is distributed under the Mozilla Public License Version 2.0 (see /LICENSE).
Copyright (C) 2013-2022 Charles E. Vejnar
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.