FASTQ-to-analysis-ready-CRAM Workflow Executor for Human Genome Sequencing
ftarc
Installation
$ pip install -U ftarc
Dependent commands:
- pigz
- pbzip2
- bgzip
- tabix
- samtools (and plot-bamstats)
- gnuplot
- java
- gatk
- cutadapt
- fastqc
- trim_galore
- bwa or bwa-mem2
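Before running the workflow, it can help to confirm that the dependent commands are available on `PATH`. The following is a minimal sketch and not part of ftarc: the helper name `find_missing_commands` is hypothetical, the command names are copied from the list above, and treating either `bwa` or `bwa-mem2` as sufficient is an assumption based on the "or" in that list.

```python
import shutil

# Command names copied from the "Dependent commands" list.
REQUIRED = [
    "pigz", "pbzip2", "bgzip", "tabix", "samtools", "plot-bamstats",
    "gnuplot", "java", "gatk", "cutadapt", "fastqc", "trim_galore",
]
# Assumption: one of these aligners is enough.
ALIGNERS = ["bwa", "bwa-mem2"]


def find_missing_commands(required=REQUIRED, alternatives=ALIGNERS,
                          which=shutil.which):
    """Return the names that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the tools.
    """
    missing = [cmd for cmd in required if which(cmd) is None]
    # The alternatives count as present if at least one of them is found.
    if not any(which(cmd) is not None for cmd in alternatives):
        missing.append(" or ".join(alternatives))
    return missing


if __name__ == "__main__":
    for name in find_missing_commands():
        print(f"missing: {name}")
```

An empty result means every listed command was found; it does not check versions.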
Docker image
Pull the image from Docker Hub.
$ docker image pull dceoy/ftarc
Usage
Create analysis-ready CRAM files from FASTQ files
input files | output files |
---|---|
read1/read2 FASTQ (Illumina) | analysis-ready CRAM |
- Download hg38 resource data.
$ ftarc download --dest-dir=/path/to/download/dir
- Write input file paths and configurations into ftarc.yml.
$ ftarc init
$ vi ftarc.yml  # => edit
Example of ftarc.yml:
---
reference_name: hs38DH
adapter_removal: true
metrics_collectors:
  fastqc: true
  picard: true
  samtools: true
resources:
  reference_fa: /path/to/GRCh38_full_analysis_set_plus_decoy_hla.fa
  known_sites_vcf:
    - /path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz
    - /path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
    - /path/to/Homo_sapiens_assembly38.known_indels.vcf.gz
runs:
  - fq:
      - /path/to/sample01.WGS.R1.fq.gz
      - /path/to/sample01.WGS.R2.fq.gz
  - fq:
      - /path/to/sample02.WGS.R1.fq.gz
      - /path/to/sample02.WGS.R2.fq.gz
  - fq:
      - /path/to/sample03.WGS.R1.fq.gz
      - /path/to/sample03.WGS.R2.fq.gz
    read_group:
      ID: FLOWCELL-1
      PU: UNIT-1
      SM: sample03
      PL: ILLUMINA
      LB: LIBRARY-1
- Create analysis-ready CRAM files from FASTQ files
$ ftarc pipeline --yml=ftarc.yml --workers=2
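When many samples are involved, the `runs` section of ftarc.yml can be generated rather than typed by hand. The sketch below is not part of ftarc: `pair_fastqs` and `render_runs` are hypothetical helpers, and the pairing rule assumes file names that differ only by `.R1.` and `.R2.`, as in the example configuration. It emits only `fq` entries; a `read_group` block would still be added manually where needed.

```python
def pair_fastqs(paths):
    """Pair read1/read2 paths whose file names differ only by '.R1.'/'.R2.'."""
    names = set(paths)
    pairs = []
    for r1 in sorted(names):
        head, sep, name = r1.rpartition("/")
        if ".R1." not in name:
            continue
        # Derive the expected read2 path from the read1 file name.
        r2 = head + sep + name.replace(".R1.", ".R2.", 1)
        if r2 not in names:
            raise ValueError(f"no read2 file found for {r1}")
        pairs.append((r1, r2))
    return pairs


def render_runs(pairs):
    """Render a YAML `runs:` section following the example ftarc.yml."""
    lines = ["runs:"]
    for r1, r2 in pairs:
        lines += ["  - fq:", f"      - {r1}", f"      - {r2}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        "/path/to/sample01.WGS.R1.fq.gz",
        "/path/to/sample01.WGS.R2.fq.gz",
    ]
    print(render_runs(pair_fastqs(example)), end="")
```

The printed text is meant to replace the `runs` section of the file created by `ftarc init`.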
Standard workflow:
- Trim adapters
  - trim_galore
- Map reads to a human reference genome
  - bwa mem (or bwa-mem2 mem)
- Mark duplicates
  - gatk MarkDuplicates
  - gatk SetNmMdAndUqTags
- Apply BQSR (Base Quality Score Recalibration)
  - gatk BaseRecalibrator
  - gatk ApplyBQSR
- Remove duplicates
  - samtools view
- Validate output CRAM files
  - gatk ValidateSamFile
- Collect QC metrics
  - fastqc
  - samtools
  - gatk
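The "Remove duplicates" step depends on the flag set during "Mark duplicates". In the SAM format, a read marked as a PCR or optical duplicate has bit 0x400 (1024) set in its FLAG field, so a filter such as `samtools view -F 1024` excludes those reads. Whether ftarc passes exactly this option is an assumption; the sketch below only illustrates the flag test on SAM text lines, and its function names are hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(flag):
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Yield header lines and alignment records that are not marked duplicates."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field and pass through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if not is_duplicate(flag):
            yield line
```

For example, a record with FLAG 99 is kept, while the same record with FLAG 1123 (99 + 1024) is dropped.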
Preprocessing and QC-check
- Validate BAM or CRAM files using Picard
$ ftarc validate /path/to/genome.fa /path/to/aligned.cram
- Collect metrics from FASTQ files using FastQC
$ ftarc fastqc read1.fq.gz read2.fq.gz
- Collect metrics from BAM or CRAM files using Picard or Samtools
$ ftarc samqc /path/to/genome.fa /path/to/aligned.cram
- Apply BQSR to BAM or CRAM files using GATK
$ ftarc bqsr \
    --known-sites-vcf=/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
    --known-sites-vcf=/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
    --known-sites-vcf=/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz \
    /path/to/genome.fa /path/to/markdup.cram
- Remove duplicates in marked BAM or CRAM files
$ ftarc dedup /path/to/genome.fa /path/to/markdup.cram
Run ftarc --help for more information.
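The `ftarc bqsr` command shown above repeats `--known-sites-vcf` once per VCF file. When the command is launched from a script, the argument list can be assembled from a list of paths. This is a minimal sketch using only the option and argument order from that example; `build_bqsr_command` is a hypothetical helper, not an ftarc API.

```python
import shlex


def build_bqsr_command(reference_fa, cram, known_sites_vcfs):
    """Build the argument list for `ftarc bqsr` as shown in the usage example."""
    args = ["ftarc", "bqsr"]
    # One --known-sites-vcf option per VCF, in the given order.
    args += [f"--known-sites-vcf={vcf}" for vcf in known_sites_vcfs]
    args += [reference_fa, cram]
    return args


if __name__ == "__main__":
    cmd = build_bqsr_command(
        "/path/to/genome.fa",
        "/path/to/markdup.cram",
        [
            "/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz",
            "/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
            "/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz",
        ],
    )
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to execute the command.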