VariantBreak - Structural variant analyzer for data visualization on VariantMap
VariantBreak is a Python package that integrates all structural variants (SVs) from a cohort of NanoVar VCF files or variant BED files, either for visualization on [VariantMap](https://github.com/cytham/variantmap) or for summary in a CSV file. It also annotates and filters SVs across all samples using user-supplied GTF/GFF/BED files.
Basic capabilities
- Intersects and merges all SV breakends from a sample cohort using NanoVar VCF files (NanoVar-v1.3.6 or above) or variant BED files.
- Annotates each SV according to input GTF/GFF files or BED annotation files.
- Filters SVs by adding a "HIT" or "MISS" label according to input BED filter files.
- Creates a master pandas dataframe to store all data.
- Creates an HDF5 file containing the master dataframe and some metadata, which can be visualized graphically on VariantMap within Dash Bio.
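The HIT/MISS labelling in the list above can be pictured as an interval overlap test. The following is only a minimal sketch, not VariantBreak's implementation (the dependency list suggests the real work is delegated to bedtools through pybedtools). The names `overlaps` and `label_variants`, the tuple layout, and the assumption that any overlap with a filter region counts as a "HIT" are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (chromosome, start, end), BED-style half-open
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def label_variants(
    variants: Sequence[Interval], filter_regions: Sequence[Interval]
) -> List[str]:
    """Label each variant HIT if it overlaps any filter region, else MISS."""
    return [
        "HIT" if any(overlaps(v, r) for r in filter_regions) else "MISS"
        for v in variants
    ]


# Made-up coordinates, for illustration only
variants = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
filter_regions = [("chr1", 150, 160)]
print(label_variants(variants, filter_regions))
```

This scans every filter region for every variant, which is fine for a handful of records; a real pipeline would rely on sorted or indexed intervals instead.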
Getting Started
Quick run
Command-line usage:
variantbreak [Options] -a annotation.gff3 -f filter.bed variant_path working_dir
Parameter | Argument | Comment |
---|---|---|
-a | annotation.gff3 | path to single annotation file or directory containing annotation files of GTF/GFF or BED formats |
-f | filter.bed | path to single filter file or directory containing filter files of BED format |
- | variant_path | path to single variant file or directory containing variant files of VCF or BED formats |
- | working_dir | path to working directory |
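When the command line is driven from a script, the arguments in the table above can be assembled programmatically. This is a minimal sketch: the helper name `build_variantbreak_command` is hypothetical, and treating `-a` and `-f` as optional is an assumption, since the usage line only shows them both supplied.

```python
import shlex
from typing import List, Optional


def build_variantbreak_command(
    variant_path: str,
    working_dir: str,
    annotation: Optional[str] = None,
    filter_path: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for the variantbreak command line."""
    cmd = ["variantbreak"]
    if annotation is not None:
        cmd += ["-a", annotation]
    if filter_path is not None:
        cmd += ["-f", filter_path]
    # Positional arguments come last: variant path, then working directory
    cmd += [variant_path, working_dir]
    return cmd


cmd = build_variantbreak_command(
    "/path/to/sample_dir/",
    "/path/to/working_dir/",
    annotation="/path/to/annotation.gff3",
    filter_path="/path/to/filter.bed",
)
# Print a shell-quoted command; pass `cmd` to subprocess.run to execute it
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems with paths that contain spaces.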
Python console usage:
# Import variantbreak function from variantbreak package
from variantbreak import variantbreak
# Run variantbreak on your samples with annotation and filter files
df = variantbreak("/path/to/sample_dir/",
"/path/to/annotation_dir/",
"/path/to/filter_dir/")
# To save data to files
# Import write_to_files function from variantbreak package
from variantbreak import write_to_files
# Specify dataframe variable, output file path and prefix, and delimiter of choice
write_to_files(df,
"/path/to/output_prefix",
sep="\t")
Output
Output file | Comment |
---|---|
output.h5 | HDF5 file required for data visualization by VariantMap |
output.csv | CSV file for data viewing, separated by the delimiter set by user |
legend.txt | File containing the legend of the sample labels used in analysis |
For more information, see wiki.
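Before working with the delimited output, it can help to look at its column names rather than guess them. The sketch below uses only the standard library; the helper name `read_column_names` is hypothetical, and it assumes the first line of the file is a header row, which this page does not confirm.

```python
import csv
from typing import List


def read_column_names(path: str, sep: str = "\t") -> List[str]:
    """Return the fields of the first line of a delimited text file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=sep)
        # An empty file yields an empty list instead of raising StopIteration
        return next(reader, [])
```

Pass the same delimiter that was given to `write_to_files`, for example `read_column_names("/path/to/output.csv", sep="\t")`.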
Operating system:
- Linux (x86_64 architecture, tested on Ubuntu 16.04)
Installation:
There are three ways to install VariantBreak:
Option 1: Conda (Recommended)
# Installing from bioconda automatically installs all dependencies
conda install -c bioconda variantbreak
Option 2: Pip (See dependencies below)
# Installing from PyPI requires installing the dependencies yourself, see below
pip install variantbreak
Option 3: GitHub (See dependencies below)
# Installing from GitHub requires installing the dependencies yourself, see below
git clone https://github.com/cytham/variantbreak.git
cd variantbreak
pip install .
Installation of dependencies
- bedtools >=2.26.0 (required to be in PATH by pybedtools)
- pybedtools >=0.8.1
- pandas >=1.0.3
- tables >=3.6.1
- fastcluster >=1.1.26
1. bedtools
Please see the bedtools documentation for installation instructions.
2. pybedtools
Please see the pybedtools documentation for installation instructions.
3. pandas
Please see the pandas documentation for installation instructions.
4. tables
pip install tables
or
conda install -c conda-forge pytables
5. fastcluster
pip install fastcluster
or
conda install -c conda-forge fastcluster
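After a pip or GitHub install, the dependencies listed above can be checked from Python. This is a rough sketch using only the standard library (Python 3.8 or later for `importlib.metadata`). The function names are hypothetical, the version parsing is deliberately simple and ignores pre-release suffixes, and the bedtools check only confirms that an executable is in PATH, not that it is version 2.26.0 or newer.

```python
import shutil
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions copied from the dependency list above
MINIMUM_VERSIONS = {
    "pybedtools": "0.8.1",
    "pandas": "1.0.3",
    "tables": "3.6.1",
    "fastcluster": "1.1.26",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '1.0.3' to (1, 0, 3); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_dependencies() -> Dict[str, str]:
    """Build a short status message for bedtools and each Python dependency."""
    report = {
        "bedtools": "found" if shutil.which("bedtools") else "missing from PATH"
    }
    for package, minimum in MINIMUM_VERSIONS.items():
        found = installed_version(package)
        if found is None:
            report[package] = "not installed"
        elif version_tuple(found) < version_tuple(minimum):
            report[package] = f"{found} is older than required {minimum}"
        else:
            report[package] = f"{found} ok"
    return report


for name, status in check_dependencies().items():
    print(f"{name}: {status}")
```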
Documentation
See wiki for more information.
Versioning
See CHANGELOG
Citation
Not available
Author
- Tham Cheng Yong - cytham
License
VariantBreak is licensed under the GNU General Public License. See LICENSE.txt for details.
Limitations
- The current version only accepts VCF files generated by NanoVar. We will create a format adaptor in future versions to support VCF files generated by other SV callers.
- Processing speed on large sample cohorts has not been tested. Currently, it takes about 30 minutes to process about 100,000 merged SVs.