
FASTQ-to-analysis-ready-CRAM Workflow Executor for Human Genome Sequencing

ftarc

Installation

$ pip install -U ftarc

Required external commands (must be available on PATH):

  • pigz
  • pbzip2
  • bgzip
  • tabix
  • samtools (and plot-bamstats)
  • gnuplot
  • java
  • gatk
  • cutadapt
  • fastqc
  • trim_galore
  • bwa or bwa-mem2
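
Before running the workflow, it can help to confirm that these commands are actually installed. The following is a minimal sketch, not part of ftarc itself: the function name `find_missing_commands` is hypothetical, and the command names are copied from the list above (with `plot-bamstats` checked separately from `samtools`).

```python
import shutil

# Command names taken from the dependency list above.
REQUIRED_COMMANDS = [
    "pigz", "pbzip2", "bgzip", "tabix", "samtools", "plot-bamstats",
    "gnuplot", "java", "gatk", "cutadapt", "fastqc", "trim_galore",
]
# Either aligner is enough, so these two are checked as a group.
ALIGNERS = ["bwa", "bwa-mem2"]


def find_missing_commands(which=shutil.which):
    """Return the names of required commands that are not found on PATH."""
    missing = [c for c in REQUIRED_COMMANDS if which(c) is None]
    if not any(which(a) is not None for a in ALIGNERS):
        missing.append("bwa or bwa-mem2")
    return missing


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Missing commands: " + ", ".join(missing))
    else:
        print("All dependent commands were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default the check uses `shutil.which` from the standard library.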

Docker image

Pull the image from Docker Hub.

$ docker image pull dceoy/ftarc

Usage

Create analysis-ready CRAM files from FASTQ files

input files                     output files
read1/read2 FASTQ (Illumina)    analysis-ready CRAM

  1. Download hg38 resource data.

    $ ftarc download --dest-dir=/path/to/download/dir
    
  2. Write input file paths and configurations into ftarc.yml.

    $ ftarc init
    $ vi ftarc.yml  # => edit
    

    Example of ftarc.yml:

    ---
    reference_name: hs38DH
    adapter_removal: true
    metrics_collectors:
      fastqc: true
      picard: true
      samtools: true
    resources:
      reference_fa: /path/to/GRCh38_full_analysis_set_plus_decoy_hla.fa
      known_sites_vcf:
        - /path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz
        - /path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
        - /path/to/Homo_sapiens_assembly38.known_indels.vcf.gz
    runs:
      - fq:
          - /path/to/sample01.WGS.R1.fq.gz
          - /path/to/sample01.WGS.R2.fq.gz
      - fq:
          - /path/to/sample02.WGS.R1.fq.gz
          - /path/to/sample02.WGS.R2.fq.gz
      - fq:
          - /path/to/sample03.WGS.R1.fq.gz
          - /path/to/sample03.WGS.R2.fq.gz
        read_group:
          ID: FLOWCELL-1
          PU: UNIT-1
          SM: sample03
          PL: ILLUMINA
          LB: LIBRARY-1
    
  3. Run the pipeline to create analysis-ready CRAM files from the FASTQ files.

    $ ftarc pipeline --yml=ftarc.yml --workers=2
    

    Standard workflow:

    1. Trim adapters
      • trim_galore
    2. Map reads to a human reference genome
      • bwa mem (or bwa-mem2 mem)
    3. Mark duplicates
      • gatk MarkDuplicates
      • gatk SetNmMdAndUqTags
    4. Apply BQSR (Base Quality Score Recalibration)
      • gatk BaseRecalibrator
      • gatk ApplyBQSR
    5. Remove duplicates
      • samtools view
    6. Validate output CRAM files
      • gatk ValidateSamFile
    7. Collect QC metrics
      • fastqc
      • samtools
      • gatk
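
For projects with many samples, the `runs` block of ftarc.yml in step 2 can be tedious to write by hand. The sketch below groups read1/read2 FASTQ paths into entries with the same shape as the example above. It is an illustration only: the helper names `build_runs` and `render_runs_yaml` are hypothetical, and the file naming convention (`<sample>.R1.fq.gz` and `<sample>.R2.fq.gz`, with unique sample names) is an assumption based on the example paths.

```python
import re
from pathlib import Path

# Assumed naming convention, as in the example paths:
# <sample>.R1.fq.gz and <sample>.R2.fq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)\.R(?P<read>[12])\.fq\.gz$")


def build_runs(fastq_paths):
    """Group read1/read2 FASTQ paths into entries shaped like `runs` in ftarc.yml."""
    pairs = {}
    for path in sorted(str(p) for p in fastq_paths):
        match = READ_PATTERN.match(Path(path).name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        pairs.setdefault(match["sample"], {})[match["read"]] = path
    runs = []
    for sample, reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"incomplete read pair for {sample}")
        runs.append({"fq": [reads["1"], reads["2"]]})
    return runs


def render_runs_yaml(runs):
    """Render the entries as text in the same layout as the `runs` block above."""
    lines = ["runs:"]
    for run in runs:
        lines.append("  - fq:")
        lines.extend(f"      - {fq}" for fq in run["fq"])
    return "\n".join(lines)
```

The rendered text can be pasted into ftarc.yml in place of the `runs` block; optional `read_group` entries, as shown for sample03, still need to be added manually.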

Preprocessing and QC-check

  • Validate BAM or CRAM files using Picard

    $ ftarc validate /path/to/genome.fa /path/to/aligned.cram
    
  • Collect metrics from FASTQ files using FastQC

    $ ftarc fastqc read1.fq.gz read2.fq.gz
    
  • Collect metrics from BAM or CRAM files using Samtools

    $ ftarc samqc /path/to/genome.fa /path/to/aligned.cram
    
  • Apply BQSR to BAM or CRAM files using GATK

    $ ftarc bqsr \
        --known-sites-vcf=/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
        --known-sites-vcf=/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
        --known-sites-vcf=/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz \
        /path/to/genome.fa /path/to/markdup.cram
    
  • Remove duplicates in marked BAM or CRAM files

    $ ftarc dedup /path/to/genome.fa /path/to/markdup.cram
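
When these subcommands are called from a script, building the argument list programmatically avoids quoting mistakes, especially for the repeated `--known-sites-vcf` option of `ftarc bqsr`. The following is a minimal sketch that mirrors the `ftarc bqsr` invocation above. The helper names `build_bqsr_command` and `run_bqsr` are hypothetical, and the code only wraps the command line; it does not use any Python API of ftarc.

```python
import subprocess


def build_bqsr_command(reference_fa, cram, known_sites_vcfs):
    """Build the `ftarc bqsr` argument list in the form shown above."""
    cmd = ["ftarc", "bqsr"]
    # The option is repeated once per known-sites VCF, as in the example.
    cmd.extend(f"--known-sites-vcf={vcf}" for vcf in known_sites_vcfs)
    cmd.extend([reference_fa, cram])
    return cmd


def run_bqsr(reference_fa, cram, known_sites_vcfs):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(
        build_bqsr_command(reference_fa, cram, known_sites_vcfs),
        check=True,
    )
```

Passing the arguments as a list (rather than a single shell string) means paths containing spaces need no extra escaping.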
    

Run ftarc --help for more information.
