FASTQ-to-analysis-ready-CRAM Workflow Executor for Human Genome Sequencing

ftarc

Installation

$ pip install -U ftarc

Required external commands:

  • pigz
  • pbzip2
  • bgzip
  • tabix
  • samtools (and plot-bamstats)
  • gnuplot
  • java
  • gatk
  • cutadapt
  • fastqc
  • trim_galore
  • bwa or bwa-mem2
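
Before running a workflow, it can help to confirm that the commands listed above are on PATH. The following is a minimal sketch, not part of ftarc: the helper name `find_missing_commands` is hypothetical, and treating `plot-bamstats` as a separately resolvable command is an assumption based on the list above.

```python
import shutil

# Command names copied from the list above. Each inner tuple is a group of
# alternatives, of which one is enough (e.g. bwa or bwa-mem2).
REQUIRED_COMMANDS = [
    ("pigz",), ("pbzip2",), ("bgzip",), ("tabix",), ("samtools",),
    ("plot-bamstats",), ("gnuplot",), ("java",), ("gatk",), ("cutadapt",),
    ("fastqc",), ("trim_galore",), ("bwa", "bwa-mem2"),
]


def find_missing_commands(required=REQUIRED_COMMANDS, which=shutil.which):
    """Return the groups of alternatives for which no command is on PATH.

    `which` is injectable so the check can be exercised without touching
    the real PATH (hypothetical helper, not an ftarc API).
    """
    return [
        " or ".join(group)
        for group in required
        if not any(which(name) for name in group)
    ]


if __name__ == "__main__":
    missing = find_missing_commands()
    if missing:
        print("Missing commands: " + ", ".join(missing))
    else:
        print("All required commands were found on PATH.")
```

This only checks that each name resolves on PATH; it does not check versions.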

Docker image

Pull the image from Docker Hub.

$ docker image pull dceoy/ftarc

Usage

Create analysis-ready CRAM files from FASTQ files

Input files                     Output files
read1/read2 FASTQ (Illumina)    analysis-ready CRAM

  1. Download hg38 resource data.

    $ ftarc download --dest-dir=/path/to/download/dir
    
  2. Write input file paths and configurations into ftarc.yml.

    $ ftarc init
    $ vi ftarc.yml  # => edit
    

    Example of ftarc.yml:

    ---
    reference_name: hs38DH
    adapter_removal: true
    metrics_collectors:
      fastqc: true
      picard: true
      samtools: true
    resources:
      ref_fa: /path/to/GRCh38_full_analysis_set_plus_decoy_hla.fa
      known_sites_vcf:
        - /path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz
        - /path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
        - /path/to/Homo_sapiens_assembly38.known_indels.vcf.gz
    runs:
      - fq:
          - /path/to/sample01.WGS.R1.fq.gz
          - /path/to/sample01.WGS.R2.fq.gz
      - fq:
          - /path/to/sample02.WGS.R1.fq.gz
          - /path/to/sample02.WGS.R2.fq.gz
      - fq:
          - /path/to/sample03.WGS.R1.fq.gz
          - /path/to/sample03.WGS.R2.fq.gz
        read_group:
          ID: FLOWCELL-1
          PU: UNIT-1
          SM: sample03
          PL: ILLUMINA
          LB: LIBRARY-1
    
  3. Create analysis-ready CRAM files from the FASTQ files.

    $ ftarc pipeline --yml=ftarc.yml --workers=2
    

    Standard workflow:

    1. Trim adapters
      • trim_galore
    2. Map reads to a human reference genome
      • bwa mem (or bwa-mem2 mem)
    3. Mark duplicates
      • gatk MarkDuplicates
      • gatk SetNmMdAndUqTags
    4. Apply BQSR (Base Quality Score Recalibration)
      • gatk BaseRecalibrator
      • gatk ApplyBQSR
    5. Remove duplicates
      • samtools view
    6. Validate output CRAM files
      • gatk ValidateSamFile
    7. Collect QC metrics
      • fastqc
      • samtools
      • gatk
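
When many samples are involved, writing the `runs` section of ftarc.yml (step 2) by hand becomes tedious. The sketch below pairs read1/read2 files and renders the section as text. It is an illustration, not an ftarc feature: the helper names `pair_fastqs` and `render_runs` are hypothetical, and the `.R1.fq.gz` / `.R2.fq.gz` naming convention is an assumption taken from the example paths above.

```python
import re
from collections import defaultdict

# Assumed naming convention, taken from the example paths in ftarc.yml:
# <prefix>.R1.fq.gz and <prefix>.R2.fq.gz
READ_PATTERN = re.compile(r"^(?P<prefix>.+)\.R(?P<read>[12])\.fq\.gz$")


def pair_fastqs(paths):
    """Group FASTQ paths into (read1, read2) pairs, sorted by shared prefix."""
    reads = defaultdict(dict)
    for path in paths:
        match = READ_PATTERN.match(path)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {path}")
        reads[match["prefix"]][match["read"]] = path
    pairs = []
    for prefix in sorted(reads):
        found = reads[prefix]
        if set(found) != {"1", "2"}:
            raise ValueError(f"incomplete read pair for: {prefix}")
        pairs.append((found["1"], found["2"]))
    return pairs


def render_runs(pairs):
    """Render pairs as `runs` section text, indented like the example above."""
    lines = ["runs:"]
    for read1, read2 in pairs:
        lines += ["  - fq:", f"      - {read1}", f"      - {read2}"]
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        "/path/to/sample01.WGS.R1.fq.gz",
        "/path/to/sample01.WGS.R2.fq.gz",
    ]
    print(render_runs(pair_fastqs(example)))
```

The rendered text can be pasted over the `runs` section generated by `ftarc init`. Paths are written unquoted, so paths containing characters that are special in YAML would need quoting by hand.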

Preprocessing and QC-check

  • Validate BAM or CRAM files using Picard

    $ ftarc validate /path/to/genome.fa /path/to/aligned.cram
    
  • Collect metrics from FASTQ files using FastQC

    $ ftarc fastqc read1.fq.gz read2.fq.gz
    
  • Collect metrics from BAM or CRAM files using Samtools and Picard

    $ ftarc samqc /path/to/genome.fa /path/to/aligned.cram
    
  • Apply BQSR to BAM or CRAM files using GATK

    $ ftarc bqsr \
        --known-sites=/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz \
        --known-sites=/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
        --known-sites=/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz \
        /path/to/genome.fa /path/to/markdup.cram
    
  • Remove duplicates in marked BAM or CRAM files

    $ ftarc dedup /path/to/genome.fa /path/to/markdup.cram
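
The `ftarc bqsr` call above repeats `--known-sites` once per VCF. When the list of VCFs already exists in a script, the argument vector can be assembled programmatically. This is a minimal sketch: the helper name `build_bqsr_command` is hypothetical, and only the subcommand, flag, and argument order shown above are used.

```python
import shlex


def build_bqsr_command(ref_fa, cram, known_sites):
    """Build the argv for `ftarc bqsr`, with one --known-sites flag per VCF.

    The argument order (flags, then reference FASTA, then BAM/CRAM) mirrors
    the command line shown above.
    """
    argv = ["ftarc", "bqsr"]
    argv += [f"--known-sites={vcf}" for vcf in known_sites]
    argv += [ref_fa, cram]
    return argv


if __name__ == "__main__":
    argv = build_bqsr_command(
        ref_fa="/path/to/genome.fa",
        cram="/path/to/markdup.cram",
        known_sites=[
            "/path/to/Homo_sapiens_assembly38.dbsnp138.vcf.gz",
            "/path/to/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
            "/path/to/Homo_sapiens_assembly38.known_indels.vcf.gz",
        ],
    )
    # Print the shell-quoted command for inspection. To execute it instead,
    # pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with unusual paths.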
    

Run ftarc --help for more information.
