
A structural variant caller for long reads.



SVIM (pronounced SWIM) is a structural variant caller for long sequencing reads. It can detect, classify and genotype five different classes of structural variants. Unlike existing methods, SVIM integrates information from across the genome to precisely distinguish similar events, such as tandem and interspersed duplications and simple insertions. In our experiments on simulated data and on real PacBio and Nanopore datasets, SVIM consistently outperformed competing methods.

Note! To analyze haploid or diploid genome assemblies or contigs, please use our other method SVIM-asm.

Background on Structural Variants and Long Reads

[Figure: classes of structural variants (SVclasses.png)]

Structural variants (SVs) are typically defined as genomic variants larger than 50 bp (e.g. deletions, duplications, inversions). Studies have shown that they affect more bases in an average genome than SNPs and small indels combined. Consequently, they have a large impact on genes and regulatory regions. This is reflected in the large number of genetic disorders and other diseases that are associated with SVs.
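As a rough illustration of the size cutoff, the Python sketch below labels an insertion or deletion by the length difference between its REF and ALT alleles, as they would appear in a VCF record. The function name `classify_by_size` and the strict "> 50 bp" comparison are assumptions for illustration (some definitions use >= 50 bp). Balanced events such as inversions, which do not change allele length, are out of scope here.

```python
# Hypothetical helper, not part of SVIM: classifies a variant by size change only.
SV_MIN_LENGTH = 50  # "larger than 50 bp" convention; some studies use >= 50 bp


def classify_by_size(ref: str, alt: str, min_length: int = SV_MIN_LENGTH) -> str:
    """Label a variant from its REF/ALT alleles by the length change it causes."""
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length alleles (e.g. SNPs) do not change sequence length.
        return "substitution"
    return "SV" if size > min_length else "small indel"


print(classify_by_size("A", "A" + "T" * 60))  # SV (60 bp insertion)
print(classify_by_size("ACGT", "A"))          # small indel (3 bp deletion)
```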

Common sequencing technologies from providers such as Illumina generate short reads with high accuracy. However, short reads perform poorly in repeats and low-complexity regions, where SVs are particularly common. Single-molecule long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore produce reads with error rates of up to 15% but with lengths of several kbp. Reads this long can cover entire repeats and SVs, which facilitates SV detection.
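To make the spanning argument concrete, the Python sketch below checks whether a read, placed on the reference, covers an event together with some unique flanking sequence on both sides, which is roughly what a read needs in order to be aligned across the event. The 500 bp flank, the function name and the coordinates are illustrative assumptions, not SVIM parameters. For scale, a 15% error rate still amounts to about 1,500 erroneous bases in a 10 kbp read, yet such a read can span a 5 kbp repeat that no 150 bp short read can.

```python
def spans_event(read_start: int, read_end: int,
                event_start: int, event_end: int, flank: int = 500) -> bool:
    """True if the read covers [event_start, event_end] plus `flank` bp on each side.

    `flank` is a hypothetical anchoring margin chosen for illustration.
    """
    if read_end < read_start or event_end < event_start:
        raise ValueError("interval end must not precede its start")
    return read_start <= event_start - flank and read_end >= event_end + flank


# A 5 kbp repeat at reference positions 10,000 to 15,000.
repeat = (10_000, 15_000)
short_read = (9_950, 10_100)   # 150 bp short read
long_read = (9_000, 16_000)    # 7 kbp long read

print(spans_event(*short_read, *repeat))  # False
print(spans_event(*long_read, *repeat))   # True
```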

Installation

# Install via conda into a new environment (recommended); installs all dependencies, including those for read alignment
conda create -n svim_env --channel bioconda svim

# Install via conda into an existing (active) environment; installs all dependencies, including those for read alignment
conda install --channel bioconda svim

# Install via pip (requires Python 3.6 or newer); installs all dependencies except those needed for read alignment (ngmlr, minimap2, samtools)
pip install svim

# Install from GitHub (requires Python 3.6 or newer); installs all dependencies except those needed for read alignment (ngmlr, minimap2, samtools)
git clone https://github.com/eldariont/svim.git
cd svim
pip install .
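Because the pip and GitHub routes leave out the read-alignment tools, a quick environment check can save a failed run. The Python sketch below tests the Python version requirement and whether `ngmlr`, `minimap2` and `samtools` are on the `PATH`. The function name is hypothetical, and which tools a given run actually needs (presumably only when starting from raw reads rather than an existing BAM file) should be confirmed in the wiki.

```python
import shutil
import sys

# Alignment tools that the pip/GitHub install routes leave out (per the notes above).
ALIGNMENT_TOOLS = ("ngmlr", "minimap2", "samtools")


def check_environment(tools=ALIGNMENT_TOOLS):
    """Return (python_ok, {tool: found_on_PATH}) for a quick pre-flight check."""
    python_ok = sys.version_info >= (3, 6)
    found = {tool: shutil.which(tool) is not None for tool in tools}
    return python_ok, found


if __name__ == "__main__":
    python_ok, found = check_environment()
    print(f"Python 3.6 or newer: {python_ok}")
    for tool, present in found.items():
        print(f"{tool}: {'found' if present else 'missing'}")
```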

Changelog

  • v1.4.2: fix invalid start coordinates in VCF output, issue warning for invalid characters in contig names

  • v1.4.1: improve clustering of translocation breakpoints (BNDs), improve --all_bnds mode, bugfixes

  • v1.4.0: fix and improve clustering of insertions, add option --all_bnds to output all SV classes in breakend notation, update default value of --partition_max_distance to avoid very large partitions, bugfixes

  • v1.3.1: small changes to partitioning and clustering algorithm, add two new command-line options to output duplications as INS records in VCF, remove limit on number of supplementary alignments, remove q5 filter, bugfixes

  • v1.3.0: improve BND detection, add INFO:ZMWS tag with number of supporting PacBio wells, add sequence alleles for INS, add FORMAT:CN tag for tandem duplications, bugfixes

  • v1.2.0: add 3 more VCF output options: output sequence instead of symbolic alleles in VCF, output names of supporting reads, output insertion sequences of supporting reads

  • v1.1.0: outputs BNDs in VCF, detects large tandem duplications, allows skipping genotyping, makes VCF output more flexible, adds genotype scatter plot

  • v1.0.0: adds genotyping of deletions, inversions, insertions and interspersed duplications, produces plots of SV length distribution, improves help descriptions

  • v0.5.0: replaces graph-based clustering with hierarchical clustering, modifies scoring function, improves partitioning prior to clustering, improves calling from coordinate-sorted SAM/BAM files, improves VCF output

  • v0.4.4: includes exception messages in log files, bug fixes, adds tests and sets up Travis

  • v0.4.3: adds support for coordinate-sorted SAM/BAM files, improves VCF output and increases compatibility with IGV and truvari, bug fixes
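The --all_bnds option listed above outputs SV classes in breakend (BND) notation. As an illustration of that notation, and not of SVIM's actual output, the Python sketch below writes a simple deletion as the pair of mate breakends defined in the VCF 4.2 specification: `t[p[` means the sequence to the right of position p is joined after base t, and `]p]t` means the sequence to the left of p is joined before base t. The helper name and the placeholder base `N` are assumptions; real records would carry the actual reference bases plus IDs and MATEID tags.

```python
def deletion_as_breakends(chrom: str, last_kept: int, first_after: int,
                          left_base: str = "N", right_base: str = "N"):
    """Describe a deletion of bases last_kept+1 .. first_after-1 as two mate BNDs.

    Returns (CHROM, POS, REF, ALT) tuples using VCF 4.2 breakend notation.
    The placeholder bases "N" stand in for the real reference bases.
    """
    # t[p[ : the sequence to the right of p is joined after base t.
    left = (chrom, last_kept, left_base, f"{left_base}[{chrom}:{first_after}[")
    # ]p]t : the sequence to the left of p is joined before base t.
    right = (chrom, first_after, right_base, f"]{chrom}:{last_kept}]{right_base}")
    return left, right


# Deletion of chr1 bases 1001 to 2000: base 1000 is joined directly to base 2001.
for record in deletion_as_breakends("chr1", 1000, 2001):
    print(*record, sep="\t")
```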

Input

SVIM analyzes long reads given as a FASTA/FASTQ file (uncompressed or gzipped) or a file list. Alternatively, it can analyze an alignment file in BAM format. SVIM has been successfully tested on PacBio CLR, PacBio CCS and Oxford Nanopore data. It works best for alignment files produced by NGMLR but also supports the faster read mapper minimap2.
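The Python sketch below shows one way to tell the accepted read formats apart: gzip files start with the magic bytes `1f 8b`, FASTA records start with `>` and FASTQ records with `@`. It is an illustrative helper (the name `sniff_reads_format` is hypothetical), not SVIM's own input handling, and it does not cover the file-list input.

```python
import gzip
import os
import tempfile


def sniff_reads_format(path):
    """Guess 'fasta' or 'fastq' from the first byte, transparently handling gzip."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as handle:
        first = handle.read(1)
    if first == b">":
        return "fasta"
    if first == b"@":
        return "fastq"
    raise ValueError(f"{path}: not recognisable as FASTA or FASTQ")


if __name__ == "__main__":
    # Demo on two tiny throwaway files: a plain FASTA and a gzipped FASTQ.
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "reads.fa")
        with open(fasta, "w") as out:
            out.write(">read1\nACGT\n")
        fastq_gz = os.path.join(tmp, "reads.fq.gz")
        with gzip.open(fastq_gz, "wt") as out:
            out.write("@read1\nACGT\n+\nIIII\n")
        print(sniff_reads_format(fasta), sniff_reads_format(fastq_gz))  # fasta fastq
```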

Output

SVIM's main output file, variants.vcf (formerly final_results.vcf), is written to the given working directory. For each of the five detected SV classes, SVIM also writes a BED file with the SV coordinates to the candidates subdirectory.
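To get a quick overview of a run, the Python sketch below tallies the records in variants.vcf by SV type. It assumes each record carries the standard `SVTYPE` key in its INFO column (the eighth column), which is the usual convention for SV calls in VCF; check the header of your own file before relying on it. The function name is hypothetical.

```python
from collections import Counter


def count_sv_types(vcf_lines):
    """Count VCF records by the SVTYPE value in their INFO column."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record without an INFO column
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                counts[value] += 1
                break
    return counts


# Usage (path assumed: the working directory passed to SVIM):
# with open("working_dir/variants.vcf") as vcf:
#     print(count_sv_types(vcf))
```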

Usage

Please see our wiki.

Contact

If you experience problems or have suggestions, please create an issue or a pull request, or contact heller_d@molgen.mpg.de.

Citation

Feel free to read and cite our paper in Bioinformatics: https://doi.org/10.1093/bioinformatics/btz041

License

The project is licensed under the GNU General Public License.



Download files


Source Distribution

svim-1.4.2.tar.gz (250.9 kB)

Uploaded Source

Built Distribution

svim-1.4.2-py3-none-any.whl (73.5 kB)

Uploaded Python 3

File details

Details for the file svim-1.4.2.tar.gz.

File metadata

  • Download URL: svim-1.4.2.tar.gz
  • Upload date:
  • Size: 250.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for svim-1.4.2.tar.gz
Algorithm Hash digest
SHA256 d70532ace5e278b4094c6f541692c0ae2c8303c7ea78795e0c1db97bd4c4bca0
MD5 83b4f83125f00e6f8d384aeeb6e283ae
BLAKE2b-256 35e8b614684592fb01c17f936fa9948710bc262df1796920a64e774c0a974cc8


File details

Details for the file svim-1.4.2-py3-none-any.whl.

File metadata

  • Download URL: svim-1.4.2-py3-none-any.whl
  • Upload date:
  • Size: 73.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4

File hashes

Hashes for svim-1.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dfa3bea1d097439419741bd0e21ba1540338a0c7e0a0922b05eb3d3686f524cd
MD5 7c016bcc646671098f04e0422a4aa26c
BLAKE2b-256 2a4f04905a01c52f7979e2b2f005b48d97ce587a6619f6761af9da17b860e58b
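The SHA256 digests listed above can be used to check a downloaded file before installing it. The Python sketch below computes a file's SHA256 in chunks with the standard `hashlib` module and compares it with the published value for that file name. The helper names are hypothetical; the digests are copied from the file details on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "svim-1.4.2.tar.gz": "d70532ace5e278b4094c6f541692c0ae2c8303c7ea78795e0c1db97bd4c4bca0",
    "svim-1.4.2-py3-none-any.whl": "dfa3bea1d097439419741bd0e21ba1540338a0c7e0a0922b05eb3d3686f524cd",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """True if the file's digest matches the published one for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected


# Usage, after downloading the source distribution into the current directory:
# print(verify_download("svim-1.4.2.tar.gz"))
```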

