FASTQ-to-analysis-ready-CRAM Workflow Executor for Human Genome Sequencing
ftarc
Installation
$ pip install -U ftarc
Dependent commands:
- pigz
- pbzip2
- bgzip
- tabix
- samtools (and plot-bamstats)
- gnuplot
- java
- gatk
- cutadapt
- fastqc
- trim_galore
- bwa or bwa-mem2
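
Before running the pipeline, it can help to confirm that the dependent commands are actually on `PATH`. The following is a minimal sketch using only the Python standard library; `find_missing_commands` is a hypothetical helper and not part of ftarc, and the command names are copied from the list above.

```python
import shutil

# Command names taken from the "Dependent commands" list above.
REQUIRED_COMMANDS = [
    "pigz", "pbzip2", "bgzip", "tabix", "samtools", "plot-bamstats",
    "gnuplot", "java", "gatk", "cutadapt", "fastqc", "trim_galore",
]
# Either aligner is acceptable according to the list above.
ALIGNER_ALTERNATIVES = ["bwa", "bwa-mem2"]


def find_missing_commands(required=REQUIRED_COMMANDS,
                          alternatives=ALIGNER_ALTERNATIVES,
                          which=shutil.which):
    """Return names not found on PATH (hypothetical helper, not part of ftarc)."""
    missing = [name for name in required if which(name) is None]
    # Only report the aligner when neither alternative is available.
    if not any(which(name) is not None for name in alternatives):
        missing.append(" or ".join(alternatives))
    return missing


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Missing commands: " + ", ".join(missing))
    else:
        print("All dependent commands were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.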
Docker image
Pull the image from Docker Hub.
$ docker image pull dceoy/ftarc
Usage
Create analysis-ready CRAM files from FASTQ files
input files | output files |
---|---|
read1/read2 FASTQ (Illumina) | analysis-ready CRAM |
- Download hg38 resource data.
  $ ftarc download --dest-dir=/path/to/download/dir
- Write input file paths and configurations into ftarc.yml.
  $ ftarc init
  $ vi ftarc.yml  # => edit

  Example of ftarc.yml:

  ---
  reference_name: hs38DH
  adapter_removal: true
  metrics_collectors:
    fastqc: true
    picard: true
    samtools: true
  resources:
    ref_fa: /path/to/GRCh38_full_analysis_set_plus_decoy_hla.fa
    known_sites_vcf:
      - /path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz
      - /path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
      - /path/to/Homo_sapiens_assembly38.known_indels.vcf.gz
  runs:
    - fq:
        - /path/to/sample01.WGS.R1.fq.gz
        - /path/to/sample01.WGS.R2.fq.gz
    - fq:
        - /path/to/sample02.WGS.R1.fq.gz
        - /path/to/sample02.WGS.R2.fq.gz
    - fq:
        - /path/to/sample03.WGS.R1.fq.gz
        - /path/to/sample03.WGS.R2.fq.gz
      read_group:
        ID: FLOWCELL-1
        PU: UNIT-1
        SM: sample03
        PL: ILLUMINA
        LB: LIBRARY-1
- Create analysis-ready CRAM files from FASTQ files.
  $ ftarc pipeline --yml=ftarc.yml --workers=2
Standard workflow:
- Trim adapters
  - trim_galore
- Map reads to a human reference genome
  - bwa mem (or bwa-mem2 mem)
- Mark duplicates
  - gatk MarkDuplicates
  - gatk SetNmMdAndUqTags
- Apply BQSR (Base Quality Score Recalibration)
  - gatk BaseRecalibrator
  - gatk ApplyBQSR
- Remove duplicates
  - samtools view
- Validate output CRAM files
  - gatk ValidateSamFile
- Collect QC metrics
  - fastqc
  - samtools
  - gatk
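
For projects with many samples, writing the `runs` section of ftarc.yml by hand becomes tedious. The sketch below pairs read1/read2 FASTQ paths and renders a `runs:` fragment with the same keys as the example configuration. It assumes the `<prefix>.R1.fq.gz` / `<prefix>.R2.fq.gz` naming used in the example paths; `pair_fastqs` and `render_runs` are hypothetical helpers, not part of ftarc.

```python
import re
from collections import defaultdict

# Assumed naming convention, matching the example paths in ftarc.yml:
# <prefix>.R1.fq.gz and <prefix>.R2.fq.gz
READ_PATTERN = re.compile(r"^(?P<prefix>.+)\.R(?P<read>[12])\.fq\.gz$")


def pair_fastqs(paths):
    """Group FASTQ paths into (read1, read2) pairs, sorted by shared prefix."""
    groups = defaultdict(dict)
    for path in paths:
        match = READ_PATTERN.match(path)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {path}")
        groups[match["prefix"]][match["read"]] = path
    pairs = []
    for prefix in sorted(groups):
        reads = groups[prefix]
        if set(reads) != {"1", "2"}:
            raise ValueError(f"incomplete read pair for: {prefix}")
        pairs.append((reads["1"], reads["2"]))
    return pairs


def render_runs(pairs):
    """Render a `runs:` YAML fragment (paths are written unquoted)."""
    lines = ["runs:"]
    for read1, read2 in pairs:
        lines.append("  - fq:")
        lines.append(f"      - {read1}")
        lines.append(f"      - {read2}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        "/path/to/sample02.WGS.R2.fq.gz",
        "/path/to/sample01.WGS.R1.fq.gz",
        "/path/to/sample01.WGS.R2.fq.gz",
        "/path/to/sample02.WGS.R1.fq.gz",
    ]
    print(render_runs(pair_fastqs(example)))
```

The fragment can be pasted over the `runs:` block of a file created by `ftarc init`. Paths containing characters that are special in YAML would need quoting, which this sketch does not handle.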
Preprocessing and QC-check
- Validate BAM or CRAM files using Picard
  $ ftarc validate /path/to/genome.fa /path/to/aligned.cram
- Collect metrics from FASTQ files using FastQC
  $ ftarc fastqc read1.fq.gz read2.fq.gz
- Collect metrics from BAM or CRAM files using Samtools and Picard
  $ ftarc samqc /path/to/genome.fa /path/to/aligned.cram
- Apply BQSR to BAM or CRAM files using GATK
  $ ftarc bqsr \
      --known-sites-vcf=/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
      --known-sites-vcf=/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
      --known-sites-vcf=/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz \
      /path/to/genome.fa /path/to/markdup.cram
- Remove duplicates in marked BAM or CRAM files
  $ ftarc dedup /path/to/genome.fa /path/to/markdup.cram
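
Marking and removing duplicates are separate steps: Picard's MarkDuplicates sets bit 0x400 (1024) in the SAM FLAG field of duplicate reads, and removal filters those reads out afterwards. The sketch below shows that filter on plain SAM text lines, which has the same effect as `samtools view -F 1024`. Whether `ftarc dedup` uses exactly this filter internally is an assumption; the function names are hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(flag):
    """Return True when the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Yield header lines and alignment lines whose FLAG lacks 0x400."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        # FLAG is the second tab-separated column of an alignment line.
        flag = int(line.split("\t")[1])
        if not is_duplicate(flag):
            yield line
```

This operates on decoded SAM text only; for BAM or CRAM input, the records would first have to be decoded by a tool such as samtools.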
Run ftarc --help for more information.
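
The subcommands above can also be driven from a script. As one example, the `ftarc bqsr` call repeats `--known-sites-vcf` once per VCF, which is easy to get wrong when the list is long. The sketch below builds that argument list from Python values; `build_bqsr_args` is a hypothetical helper, and it uses only the option and positional arguments shown in this README.

```python
import shlex


def build_bqsr_args(ref_fa, cram, known_sites_vcfs):
    """Build the argv for `ftarc bqsr` (hypothetical helper, not part of ftarc)."""
    args = ["ftarc", "bqsr"]
    # One --known-sites-vcf option per VCF, in the order given.
    args += [f"--known-sites-vcf={vcf}" for vcf in known_sites_vcfs]
    # Positional arguments: reference FASTA, then the marked BAM/CRAM.
    args += [ref_fa, cram]
    return args


if __name__ == "__main__":
    args = build_bqsr_args(
        "/path/to/genome.fa",
        "/path/to/markdup.cram",
        [
            "/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz",
            "/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
            "/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz",
        ],
    )
    # Print a shell-quoted command line for inspection.
    print(shlex.join(args))
```

To execute the command instead of printing it, the list can be passed to `subprocess.run(args, check=True)`.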