nanovar·PyPI

Structural variant caller using low-depth long reads

Project description

Please note: The current v1.8.3 is not compatible with TensorFlow >= 2.16.0. Please downgrade to 2.15.1:

pip install tensorflow-cpu==2.15.1

Please see issue here

We are actively working on this, thank you for your understanding.
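
One way to catch this before a run is to compare the installed TensorFlow version against the 2.16.0 cutoff stated above. The sketch below is illustrative only: the helper names `parse_version` and `tf_version_ok` are hypothetical, and the parsing assumes a plain `major.minor.patch` version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First TensorFlow release reported above as incompatible with NanoVar v1.8.3.
FIRST_INCOMPATIBLE = (2, 16, 0)


def parse_version(version_string: str) -> tuple:
    """Return the first three numeric components of a version string as ints."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "2.15" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def tf_version_ok(version_string: str) -> bool:
    """True if the given TensorFlow version is below the 2.16.0 cutoff."""
    return parse_version(version_string) < FIRST_INCOMPATIBLE


if __name__ == "__main__":
    # Check whichever TensorFlow distribution is installed, if any.
    for dist in ("tensorflow", "tensorflow-cpu"):
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue
        status = "OK" if tf_version_ok(installed) else "too new, downgrade to 2.15.1"
        print(dist, installed, status)
```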

NanoVar - Structural variant caller using low-depth long-read sequencing

NanoVar is a genomic structural variant (SV) caller that uses low-depth long-read sequencing such as Oxford Nanopore Technologies (ONT). It characterizes SVs using only 4x sequencing depth for homozygous SVs and 8x depth for heterozygous SVs.

Basic capabilities

Performs long-read mapping (Minimap2) and SV discovery in a single pipeline.
Accurately characterizes SVs using long sequencing reads (High SV recall and precision in simulation datasets, overall F1 score >0.9)
Characterizes six classes of SVs including novel-sequence insertion, deletion, inversion, tandem duplication, sequence transposition (TPO) and translocation (TRA).
Requires 4x and 8x sequencing depth for detecting homozygous and heterozygous SVs respectively.
Rapid computational speed (takes <3 hours to map and analyze a 12-gigabase dataset (4x) using 24 CPU threads)
Approximates SV genotype
Identifies full-length LINE and SINE insertions (Marked by "TE=" in the INFO column of VCF file)
Repeat element INS annotation using NanoINSight
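
The "TE=" marker in the INFO column can be picked out of a VCF file with a few lines of Python. This is a minimal sketch using only the standard library. The function names and the demo records are made up for illustration, and because the value format of the TE field is not described here, the value is returned as a raw string.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string ("K1=V1;K2=V2;FLAG") into a dict; flags map to ""."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def te_insertions(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, id, TE value) for records whose INFO column has a TE key."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        info = parse_info(cols[7])
        if "TE" in info:
            yield cols[0], int(cols[1]), cols[2], info["TE"]


# Made-up records for illustration only; real NanoVar output will differ.
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t1000\tsv1\tN\t<INS>\t.\tPASS\tSVTYPE=INS;TE=LINE\n",
    "chr2\t2000\tsv2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\n",
]
print(list(te_insertions(demo)))  # [('chr1', 1000, 'sv1', 'LINE')]
```

For a real run, the same function can be given an open file handle for the PASS VCF instead of the demo list.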

Getting Started

Quick run

nanovar [Options] -t 24 -f hg38 sample.fq/sample.bam ref.fa working_dir

Parameter	Argument	Comment
`-t`	num_threads	Indicate number of CPU threads to use
`-f` (Optional)	gap_file (Optional)	Choose built-in gap BED file or specify own file to exclude gap regions in the reference genome. Built-in gap files include: hg19, hg38 and mm10
-	sample.fq/sample.bam/sample.cram	Input long-read FASTA/FASTQ file or mapped BAM/CRAM file
-	ref.fa	Input reference genome in FASTA format
-	working_dir	Specify working directory

See wiki for entire list of options.
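
When NanoVar is launched from a Python script, the quick-run command can be assembled as an argument list for `subprocess.run`. The sketch below uses only the `-t` and `-f` options from the table above. The helper names `build_nanovar_command` and `run_nanovar` are hypothetical, and the file paths are placeholders.

```python
import subprocess
from typing import List, Optional


def build_nanovar_command(
    reads: str,
    reference: str,
    work_dir: str,
    threads: int = 1,
    gap_file: Optional[str] = None,
) -> List[str]:
    """Assemble the quick-run command as an argument list (no shell quoting needed)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    cmd = ["nanovar", "-t", str(threads)]
    if gap_file is not None:
        # Either a built-in name (hg19, hg38, mm10) or a path to a BED file.
        cmd += ["-f", gap_file]
    cmd += [reads, reference, work_dir]
    return cmd


def run_nanovar(cmd: List[str]) -> None:
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_nanovar_command(
        "sample.bam", "ref.fa", "working_dir", threads=24, gap_file="hg38"
    )
    # Print instead of running, so the sketch works without NanoVar installed.
    print(" ".join(command))
```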

Output

Output file	Comment
${sample}.nanovar.pass.vcf	Final VCF filtered output file (1-based)
${sample}.nanovar.pass.report.html	HTML report showing run summary and statistics

For more information, see wiki.

Full usage

usage: nanovar [options] [FASTQ/FASTA/BAM/CRAM] [REFERENCE_GENOME] [WORK_DIRECTORY]

NanoVar is a long-read structural variant (SV) caller.

positional arguments:
  [FASTQ/FASTA/BAM/CRAM]
                        Path to long reads or mapped BAM/CRAM file.
                        Formats: fasta/fa/fa.gzip/fa.gz/fastq/fq/fq.gzip/fq.gz/bam/cram
  [reference_genome]    Path to reference genome in FASTA. Genome indexes created
                        will overwrite indexes created by other aligners such as bwa.
  [work_directory]      Path to work directory. Directory will be created
                        if it does not exist.

options:
  -h, --help            show this help message and exit
  -x str, --data_type str
                        Type of long-read data [ont]
                        ont - Oxford Nanopore Technologies
                        pacbio-clr - Pacific Biosciences CLR
                        pacbio-ccs - Pacific Biosciences CCS
  -f file, --filter_bed file
                        BED file with genomic regions to be excluded [None]
                        (e.g. telomeres and centromeres) Either specify name of in-built
                        reference genome filter (i.e. hg38, hg19, mm10) or provide full
                        path to own BED file.
  --annotate_ins str    Enable annotation of INS with NanoINSight,
                        please specify species of sample [None]
                        Currently supported species are:
                        'human', 'mouse', and 'rattus'.
  -c int, --mincov int  Minimum number of reads required to call a breakend [4]
  -l int, --minlen int  Minimum length of SV to be detected [25]
  -p float, --splitpct float
                        Minimum percentage of unmapped bases within a long read
                        to be considered as a split-read. 0.05<=p<=0.50 [0.05]
  -a int, --minalign int
                        Minimum alignment length for single alignment reads [200]
  -b int, --buffer int  Nucleotide length buffer for SV breakend clustering [50]
  -s float, --score float
                        Score threshold for defining PASS/FAIL SVs in VCF [1.0]
                        Default score 1.0 was estimated from simulated analysis.
  --homo float          Lower limit of a breakend read ratio to classify a homozygous state [0.75]
                        (i.e. Any breakend with homo<=ratio<=1.00 is classified as homozygous)
  --hetero float        Lower limit of a breakend read ratio to classify a heterozygous state [0.35]
                        (i.e. Any breakend with hetero<=ratio<homo is classified as heterozygous)
  --sv_bam_out          Outputs a BAM file containing only SV-supporting reads with
                        their corresponding SV-ID(s) stored in the "nv" tag separated by comma.
  --debug               Run in debug mode
  -v, --version         Show version and exit
  -q, --quiet           Hide verbose
  -t int, --threads int
                        Number of available threads for use [1]
  --model path          Specify path to custom-built model
  --mm path             Specify path to 'minimap2' executable
  --st path             Specify path to 'samtools' executable
  --ma path             Specify path to 'mafft' executable for NanoINSight
  --rm path             Specify path to 'RepeatMasker' executable for NanoINSight
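
The `--homo` and `--hetero` thresholds in the usage text partition the breakend read ratio into genotype classes. A minimal sketch of that rule follows. The function name and the returned labels are illustrative, and the label for ratios below `--hetero` is an assumption, since the help text does not say how such breakends are reported.

```python
def classify_genotype(ratio: float, homo: float = 0.75, hetero: float = 0.35) -> str:
    """Classify a breakend read ratio using the --homo and --hetero lower limits."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if not 0.0 <= hetero < homo <= 1.0:
        raise ValueError("expected 0 <= hetero < homo <= 1")
    if ratio >= homo:
        return "homozygous"  # homo <= ratio <= 1.00
    if ratio >= hetero:
        return "heterozygous"  # hetero <= ratio < homo
    return "unclassified"  # assumption: behaviour below --hetero is not documented here


for value in (0.90, 0.75, 0.50, 0.20):
    print(value, classify_genotype(value))
```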

Operating system

Linux (x86_64 architecture, tested on Ubuntu 14.04, 16.04, 18.04)

Installation

There are three ways to install NanoVar:

Option 1: Conda environment (Recommended)

conda create -n myenv -c bioconda python=3.11 samtools bedtools minimap2
conda activate myenv
pip install nanovar

or

conda create -n myenv -c bioconda python=3.11 nanovar
conda activate myenv

Option 2: PyPI (See dependencies below)

# Installing from PyPI requires installing the dependencies yourself (see below)
pip install nanovar

Option 3: GitHub (See dependencies below)

# Installing from GitHub requires installing the dependencies yourself (see below)
git clone https://github.com/cytham/nanovar.git
cd nanovar
pip install .

Installation of dependencies

bedtools >=2.26.0
samtools >=1.3.0
minimap2 >=2.17

Please make sure each executable binary is in PATH.
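
A quick PATH check can be done from Python with `shutil.which`. This sketch only tests whether each executable can be found; it does not verify the minimum versions listed above. The names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import shutil
from typing import Iterable, List

# Executables listed as NanoVar dependencies above.
REQUIRED_TOOLS = ("bedtools", "samtools", "minimap2")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of the tools that cannot be found in PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required executables were found in PATH.")
```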

1. bedtools

Please visit here for instructions to install.

2. samtools

Please visit here for instructions to install.

3. minimap2

Please visit here for instructions to install.

Annotating INS variants with NanoINSight

NanoVar allows the concurrent repeat element annotation of INS variants using NanoINSight.

To run NanoINSight, simply add "--annotate_ins [species]" when running NanoVar.

nanovar -t 24 -f hg38 --annotate_ins human sample.bam ref.fa working_dir

To understand NanoINSight output files, please visit its repository here.

Installation of NanoINSight dependencies

NanoINSight requires MAFFT and RepeatMasker to be installed. Please refer here for installation instructions, or install them through Conda as shown below:

pip install nanoinsight
conda install -c bioconda mafft repeatmasker -y

Note: If you encounter a "numpy.dtype size changed" TensorFlow error while running NanoVar, ensure that the numpy version is <2.0.0 (e.g. pip install numpy==1.26.4).

Documentation

See wiki for more information.

Versioning

See CHANGELOG

Citation

If you use NanoVar, please cite:

Tham, CY., Tirado-Magallanes, R., Goh, Y. et al. NanoVar: accurate characterization of patients’ genomic structural variants using low-depth nanopore sequencing. Genome Biol. 21, 56 (2020). https://doi.org/10.1186/s13059-020-01968-7

Authors

Tham Cheng Yong - cytham
Roberto Tirado Magallanes - rtmag
Asmaa Samy - asmmahmoud
Touati Benoukraf - benoukraflab

License

This project is licensed under the GNU General Public License - see LICENSE.txt for details.

Simulation datasets and scripts used in the manuscript

SV simulation datasets used in the manuscript can be downloaded here. Scripts used for simulation dataset generation and tool performance comparison are available here.

Although NanoVar is provided with a universal model and threshold score, instructions for building a custom neural-network model are available here.

Limitations

The inaccurate basecalling of large homopolymer or low complexity DNA regions may result in the false determination of deletion SVs. We advise the use of up-to-date ONT basecallers such as Dorado to minimize this possibility.
For BND SVs, NanoVar is unable to calculate the actual number of SV-opposing reads (normal reads) at the novel adjacency because the two breakends come from distant locations. It is not clear whether the novel adjacency is derived from one or both breakends in cases of balanced and unbalanced variants, so it is not possible to know which breakend location(s) to consider when counting normal reads. Currently, NanoVar approximates the normal read count by the minimum count from either breakend location. Although this helps in capturing unbalanced BNDs, it might lead to some false positives.
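
The BND approximation described above can be written in a few lines. In this sketch the function names are hypothetical, and the ratio formula (SV-supporting reads divided by SV-supporting plus normal reads) is an assumption made for illustration; NanoVar's exact computation is not given here.

```python
def approx_normal_reads(normal_at_breakend_1: int, normal_at_breakend_2: int) -> int:
    """Approximate the normal read count as the minimum over the two breakend locations."""
    return min(normal_at_breakend_1, normal_at_breakend_2)


def breakend_read_ratio(sv_reads: int, normal_reads: int) -> float:
    """Assumed ratio: SV-supporting reads over all reads spanning the breakend."""
    total = sv_reads + normal_reads
    if total == 0:
        raise ValueError("no reads at this breakend")
    return sv_reads / total


# Made-up counts: 6 SV-supporting reads, 2 and 10 normal reads at the two breakends.
sv_reads, normal_1, normal_2 = 6, 2, 10
print(breakend_read_ratio(sv_reads, approx_normal_reads(normal_1, normal_2)))  # 0.75
print(breakend_read_ratio(sv_reads, max(normal_1, normal_2)))  # 0.375
```

Taking the minimum yields the higher ratio of the two, which is consistent with the note that the approach favours capturing unbalanced BNDs at the cost of some false positives.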

Release history

Version	Release date
1.8.3 (this version)	Jan 6, 2025
1.8.2	Jan 6, 2025
1.8.1	Sep 29, 2024
1.8.0	Sep 26, 2024
1.7.0	Jun 17, 2024
1.6.2	Apr 29, 2024
1.6.1	Mar 31, 2024
1.6.0	Jan 16, 2024
1.5.1	Jan 5, 2024
1.5.0	Sep 8, 2023
1.4.1	Oct 7, 2021
1.4.0	Sep 8, 2021
1.3.9	Mar 24, 2021
1.3.8	May 24, 2020
1.3.7	May 23, 2020
1.3.6	Apr 17, 2020
1.3.5	Apr 1, 2020
1.3.4	Mar 19, 2020
1.3.2	Mar 4, 2020
1.3.1	Feb 29, 2020
1.2.7	Dec 15, 2019
1.2.6	Nov 28, 2019
1.2.5	Nov 25, 2019
1.2.4	Nov 25, 2019
1.2.3	Nov 25, 2019
1.2.2	Nov 25, 2019
1.2.1	Nov 24, 2019
1.2.0	Nov 21, 2019

Download files

Source Distribution

nanovar-1.8.3.tar.gz (439.9 kB), uploaded Jan 6, 2025

File metadata

Upload date: Jan 6, 2025
Size: 439.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for nanovar-1.8.3.tar.gz
Algorithm	Hash digest
SHA256	`3675320ddd27952db16c395f01e9cbd8405bd663be34a1de6951907f47a6111e`
MD5	`d2c7af9b6b386e73329446c4f7d928ac`
BLAKE2b-256	`ea0f29fd3d7f9108b630783d6c64d1c854a67aea0dfb184eb0ff9c03031557e6`
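
A downloaded archive can be checked against the SHA256 digest in the table above with Python's `hashlib`. This is a minimal sketch: the function names are hypothetical, and the file is assumed to be in the current directory under its published name.

```python
import hashlib

# SHA256 digest listed above for nanovar-1.8.3.tar.gz.
EXPECTED_SHA256 = "3675320ddd27952db16c395f01e9cbd8405bd663be34a1de6951907f47a6111e"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("nanovar-1.8.3.tar.gz")` should return True for an intact download.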
