Correct misassemblies using linked reads

Cut sequences at positions with few spanning molecules.

Written by Shaun Jackman, Lauren Coombe, and Justin Chu.

bioRxiv doi:10.1101/304253 · Slides · Poster

Description

Tigmint identifies and corrects misassemblies using linked reads from 10x Genomics Chromium. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than the coverage of the short-read sequencing data. The sequences are cut at positions with too few spanning molecules. Tigmint outputs a BED file of these cut points and a FASTA file of the cut sequences.
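The molecule-inference step can be sketched in Python, the language Tigmint itself is distributed in. This is a minimal illustration under stated assumptions, not the tigmint-molecule implementation: alignments are hypothetical (barcode, contig, start, end) tuples grouped by barcode and ordered by position, as samtools sort -tBX produces; consecutive reads with the same barcode on the same contig are joined into one molecule while the gap between them is at most dist; and molecules shorter than minsize are discarded. The defaults mirror the dist=50000 and minsize=2000 parameters listed under Parameters of Tigmint. Read-level filters and any other criteria tigmint-molecule applies are omitted.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Molecule(NamedTuple):
    """Hypothetical inferred molecule: half-open [start, end) on one contig."""
    contig: str
    start: int
    end: int
    barcode: str
    reads: int


def infer_molecules(alignments: Iterable[Tuple[str, str, int, int]],
                    dist: int = 50000, minsize: int = 2000) -> List[Molecule]:
    """Join (barcode, contig, start, end) alignments into molecules.

    Assumes records are grouped by barcode and, within a barcode,
    ordered by contig and start position.
    """
    molecules: List[Molecule] = []
    current: Optional[Molecule] = None  # molecule currently being extended

    for barcode, contig, start, end in alignments:
        same_molecule = (
            current is not None
            and barcode == current.barcode
            and contig == current.contig
            and start - current.end <= dist  # gap to the previous read is small enough
        )
        if same_molecule:
            current = current._replace(end=max(current.end, end),
                                       reads=current.reads + 1)
        else:
            # Close the previous molecule, keeping it only if it is long enough.
            if current is not None and current.end - current.start >= minsize:
                molecules.append(current)
            current = Molecule(contig, start, end, barcode, 1)

    # Emit the final molecule if it is long enough.
    if current is not None and current.end - current.start >= minsize:
        molecules.append(current)
    return molecules
```

In this sketch a read more than dist bp past the previous read of the same barcode starts a new molecule, so one barcode can yield several molecules on the same contig.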

Each window of a fixed, specified size is checked for a minimum number of spanning molecules. A sequence is cut where a window with sufficient coverage is followed by one or more windows with insufficient coverage, which are in turn followed by another window with sufficient coverage.
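The windowed check can be sketched the same way. The assumptions here are illustrative rather than taken from tigmint-cut: molecules are (start, end) half-open intervals on one contig; a molecule spans a window only if it covers the whole window; the last window may be shorter than window when the contig length is not a multiple of it; and each reported region is the run of insufficient windows between two sufficient ones, rather than the exact cut coordinate tigmint-cut writes. The function names are hypothetical; window=1000 and span=20 mirror the defaults listed under Parameters of Tigmint.

```python
from typing import List, Optional, Sequence, Tuple


def spanning_counts(molecules: Sequence[Tuple[int, int]],
                    length: int, window: int) -> List[int]:
    """Count the molecules that completely cover each window of a contig."""
    counts = []
    for win_start in range(0, length, window):
        win_end = min(win_start + window, length)  # final window may be shorter
        counts.append(sum(1 for start, end in molecules
                          if start <= win_start and end >= win_end))
    return counts


def find_cut_regions(molecules: Sequence[Tuple[int, int]], length: int,
                     window: int = 1000, span: int = 20) -> List[Tuple[int, int]]:
    """Return [start, end) runs of under-covered windows flanked by well-covered ones."""
    counts = spanning_counts(molecules, length, window)
    regions = []
    last_good: Optional[int] = None  # index of the latest window with >= span molecules
    for i, count in enumerate(counts):
        if count < span:
            continue
        if last_good is not None and i - last_good > 1:
            # Windows last_good+1 .. i-1 are under-covered and flanked on both sides.
            regions.append(((last_good + 1) * window, i * window))
        last_good = i
    return regions
```

For example, with window=1000 and span=2, a 10 kbp contig covered by two molecules over 0 to 4000 and two over 6000 to 10000 yields one region, (4000, 6000). Low coverage at a contig end is not flanked on both sides, so it is not reported. Counting by brute force costs windows times molecules; a real tool would more likely use an interval index such as the intervaltree package listed under Dependencies.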

Installation

Install Tigmint using Brew

Install Linuxbrew on Linux or Windows Subsystem for Linux (WSL), or install Homebrew on macOS, and then run the command

brew install tigmint

Install Tigmint from the source code

Download and extract the source code. Compiling is not needed.

git clone https://github.com/bcgsc/tigmint && cd tigmint

or

curl -L https://github.com/bcgsc/tigmint/archive/master.tar.gz | tar xz && mv tigmint-master tigmint && cd tigmint

Dependencies

Install Python package dependencies

pip3 install intervaltree pybedtools pysam statistics

Tigmint uses Bedtools, BWA and Samtools. These dependencies may be installed using Homebrew on macOS or Linuxbrew on Linux.

Install the dependencies of Tigmint

brew install bedtools bwa samtools

Install the dependencies of ARCS (optional)

brew tap brewsci/bio
brew install arcs links-scaffolder

Install the dependencies for calculating assembly metrics (optional)

brew install abyss seqtk

Usage

To run Tigmint on the draft assembly draft.fa with the reads reads.fq.gz, which have been run through longranger basic:

samtools faidx draft.fa
bwa index draft.fa
bwa mem -t8 -p -C draft.fa reads.fq.gz | samtools sort -@8 -tBX -o draft.reads.sortbx.bam
tigmint-molecule draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.molecule.bed
tigmint-cut -p8 -o draft.tigmint.fa draft.fa draft.reads.molecule.bed
  • bwa mem -C is used to copy the BX tag from the FASTQ header to the SAM tags.
  • samtools sort -tBX is used to sort first by barcode and then position.
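To confirm that the -C and -tBX steps did what the two notes above describe, one can inspect the text output of samtools view, which omits the header unless -h is given. The sketch below is a hypothetical checker, not part of Tigmint: it extracts the BX:Z: tag from each SAM record, counts records with and without a barcode, and checks that barcodes appear in non-decreasing order. It compares barcodes as Python strings, which assumes ASCII barcodes, and it skips records lacking BX instead of assuming where samtools sort places them.

```python
from typing import Iterable, Optional, Tuple


def bx_tag(sam_line: str) -> Optional[str]:
    """Return the value of the BX:Z: tag of a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at the 12th column
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


def check_bx_sorted(sam_lines: Iterable[str]) -> Tuple[int, int, bool]:
    """Return (records with BX, records without BX, barcodes non-decreasing)."""
    with_bx = without_bx = 0
    previous: Optional[str] = None
    in_order = True
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        barcode = bx_tag(line)
        if barcode is None:
            without_bx += 1
            continue
        with_bx += 1
        if previous is not None and barcode < previous:
            in_order = False
        previous = barcode
    return with_bx, without_bx, in_order
```

To use it, pipe samtools view draft.reads.sortbx.bam into a script that calls check_bx_sorted(sys.stdin). If most records lack BX, the -C option was probably omitted or the reads were not run through longranger basic.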

Alternatively, you can run the Tigmint pipeline using the Makefile driver script tigmint-make. To run Tigmint on the draft assembly myassembly.fa with the reads myreads.fq.gz, which have been run through longranger basic:

tigmint-make tigmint draft=myassembly reads=myreads

To run both Tigmint and scaffold the corrected assembly with ARCS:

tigmint-make arcs draft=myassembly reads=myreads

To run Tigmint and ARCS and calculate assembly metrics using the reference genome GRCh38.fa:

tigmint-make metrics draft=myassembly reads=myreads ref=GRCh38 G=3088269832

Note

  • tigmint-make is a Makefile script, and so any make options may also be used with tigmint-make, such as -n (--dry-run).
  • The assembly file must have the extension .fa and the reads file the extension .fq.gz, and these extensions are omitted from the draft and reads parameters. These file name requirements result from implementing the pipeline in GNU Make.
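Because of these naming rules, a wrapper that calls tigmint-make from Python has to strip the extensions itself. The helper below is hypothetical, not part of Tigmint: it checks the extensions, builds the argument list for one of the targets described under tigmint-make commands, and appends extra parameters as name=value pairs. The parameters are passed as a dict because as, one of the Tigmint parameters, is a reserved word in Python and cannot be used as a keyword argument.

```python
from typing import Dict, List, Optional

TARGETS = ("tigmint", "arcs", "metrics")  # targets described in this README


def tigmint_make_args(target: str, assembly: str, reads: str,
                      params: Optional[Dict[str, object]] = None) -> List[str]:
    """Build a tigmint-make argument list (hypothetical helper)."""
    if target not in TARGETS:
        raise ValueError(f"unknown tigmint-make target: {target}")
    if not assembly.endswith(".fa"):
        raise ValueError("the assembly file name must end in .fa")
    if not reads.endswith(".fq.gz"):
        raise ValueError("the reads file name must end in .fq.gz")
    args = [
        "tigmint-make",
        target,
        f"draft={assembly[:-len('.fa')]}",  # extension stripped, as tigmint-make requires
        f"reads={reads[:-len('.fq.gz')]}",
    ]
    # Extra parameters such as span, window, or t become name=value arguments.
    args.extend(f"{name}={value}" for name, value in (params or {}).items())
    return args
```

For instance, subprocess.run(tigmint_make_args("tigmint", "myassembly.fa", "myreads.fq.gz"), check=True) would run tigmint-make tigmint draft=myassembly reads=myreads.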

tigmint-make commands

  • tigmint: Run Tigmint, and produce a file named $draft.tigmint.fa
  • arcs: Run Tigmint and ARCS, and produce a file named $draft.tigmint.arcs.fa
  • metrics: Run Tigmint and ARCS, calculate assembly metrics using abyss-fac and abyss-samtobreak, and produce TSV files.

Parameters of Tigmint

  • draft: Name of the draft assembly, draft.fa
  • reads: Name of the reads, reads.fq.gz
  • span=20: Minimum number of spanning molecules per window
  • window=1000: Window size (bp) for checking spanning molecules
  • minsize=2000: Minimum molecule size
  • as=0.65: Minimum ratio of alignment score (AS) to read length
  • nm=5: Maximum number of mismatches
  • dist=50000: Maximum distance (bp) between reads to be considered the same molecule
  • mapq=0: Mapping quality threshold
  • trim=0: Number of bases to trim off contigs following cuts
  • t=8: Number of threads
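The read-level thresholds as, nm, and mapq can be read as a per-alignment filter applied before molecules are inferred. The sketch below is one interpretation, not tigmint-molecule's code: the Alignment record is a hypothetical stand-in for a BAM record (the AS and NM values would come from the alignment's tags), and whether each bound is inclusive or exclusive in Tigmint is an assumption here.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical stand-in for one BAM record."""
    mapq: int             # MAPQ column
    alignment_score: int  # value of the AS tag
    nm_tag: int           # value of the NM tag (the "mismatches" above)
    read_length: int


def passes_filters(aln: Alignment, as_ratio: float = 0.65,
                   nm: int = 5, mapq: int = 0) -> bool:
    """Keep an alignment only if it meets all three thresholds (bounds assumed inclusive)."""
    return (aln.mapq >= mapq
            and aln.alignment_score >= as_ratio * aln.read_length
            and aln.nm_tag <= nm)
```

With the default mapq=0, the mapping-quality test never rejects a read; raising it discards alignments to repetitive sequence.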

Parameters of ARCS

  • c=5
  • e=30000
  • r=0.05

Parameters of LINKS

  • a=0.1
  • l=10

Parameters for calculating assembly metrics

  • ref: Reference genome, ref.fa, for calculating assembly contiguity metrics
  • G: Size of the reference genome, for calculating NG50 and NGA50
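NG50 is the length L such that contigs of length L or longer together cover at least half of the genome size G; NGA50 is the same statistic computed on blocks aligned to the reference, which needs those alignments and is not shown. A minimal sketch of NG50 from a list of contig lengths follows; the ng50 helper is hypothetical, and the metrics target computes these values with abyss-fac and abyss-samtobreak.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return NG50, or 0 if the contigs cover less than half of genome_size."""
    total = 0
    for length in sorted(contig_lengths, reverse=True):  # longest contigs first
        total += length
        if 2 * total >= genome_size:  # running total reached half of G
            return length
    return 0
```

For G=400, contigs of 100, 80, 50, 30, and 20 bp give an NG50 of 50, since 100 + 80 + 50 = 230 is the first running total to reach half of 400. Using the total assembly length in place of G gives the ordinary N50.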

Tips

  • If your barcoded reads are in multiple FASTQ files, the initial alignments of the barcoded reads to the draft assembly can be done in parallel and merged prior to running Tigmint.
  • When aligning with BWA-MEM, use the -C option to include the barcode in the BX tag of the alignments.
  • Sort by BX tag using samtools sort -tBX.
  • Merge multiple BAM files using samtools merge -tBX.
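The first tip can be sketched as a command builder that produces one alignment pipeline per FASTQ file, using the same bwa mem and samtools sort options as the Usage section, plus a samtools merge -tBX command that combines the barcode-sorted BAM files. The function names and the per-file BAM naming scheme are hypothetical.

```python
from typing import List, Tuple


def strip_suffix(name: str, suffix: str) -> str:
    """Remove suffix from name if present."""
    return name[:-len(suffix)] if name.endswith(suffix) else name


def alignment_commands(draft: str, fastqs: List[str], merged_bam: str,
                       threads: int = 8) -> Tuple[List[str], str]:
    """Return (per-FASTQ alignment pipelines, final merge command)."""
    prefix = strip_suffix(draft, ".fa")
    per_file, bams = [], []
    for fq in fastqs:
        bam = f"{prefix}.{strip_suffix(fq, '.fq.gz')}.sortbx.bam"
        # -C copies the BX tag into the alignments; -tBX sorts by barcode, then position.
        per_file.append(f"bwa mem -t{threads} -p -C {draft} {fq}"
                        f" | samtools sort -@{threads} -tBX -o {bam}")
        bams.append(bam)
    # Merging with -tBX keeps the combined file sorted by barcode.
    merge = f"samtools merge -@{threads} -tBX {merged_bam} " + " ".join(bams)
    return per_file, merge
```

Each per-file command could be run concurrently, for example with subprocess.run(cmd, shell=True, check=True) from a concurrent.futures.ThreadPoolExecutor, and the merge command run once they have all finished. Lower threads accordingly so the parallel jobs do not oversubscribe the machine.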

Support

Before reporting a new issue at https://github.com/bcgsc/tigmint/issues/new, please check the existing issues at https://github.com/bcgsc/tigmint/issues. Include the names of your input files, the exact command line that you used, and the entire output of Tigmint.

Pipeline

Tigmint pipeline illustration

Download files

Source Distribution

tigmint-1.1.2.tar.gz (14.7 kB)

Built Distribution

tigmint-1.1.2-py3-none-any.whl (14.3 kB)

File details

Details for the file tigmint-1.1.2.tar.gz.

File metadata

  • Download URL: tigmint-1.1.2.tar.gz
  • Upload date:
  • Size: 14.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.7.0

File hashes

Hashes for tigmint-1.1.2.tar.gz
Algorithm Hash digest
SHA256 17ea87a303e3f4eb0e466c21730a3e3c78bc328b64a4f314f9e53a887b3460f4
MD5 e8127911baef8fe547c390ba5b75bb44
BLAKE2b-256 f3708f92c45efef77a6b82f357be0ec7238a4bb0cf636950d6c49d06fa6475dc

File details

Details for the file tigmint-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: tigmint-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 14.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.25.0 CPython/3.7.0

File hashes

Hashes for tigmint-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2dadb09a1ab651472c5e17ff1db01592cc1c953cf06cbbca89ca5e0cc474806c
MD5 b68ecd45c90719152579b2d97f5455ec
BLAKE2b-256 333bd9cbe294e7c895d7b8a4befa10408f58aa630ae7f9114db3d06474d51a83
