Skip to main content

Structural variant analyzer for data visualization on VariantMap

Project description

VariantBreak - Structural variant analyzer for data visualization on VariantMap

Build Status PyPI pyversions PyPI versions Conda Github release PyPI license

VariantBreak is a python package that integrates all structural variants (SVs) from a cohort of NanoVar VCF files or variant BED files for visualization on [VariantMap](https://github .com/cytham/variantmap) or summarized into a CSV file. It also annotates and filters all SVs across all samples according to user input GTF/GFF/BED files.

Basic capabilities

  • Intersects and merges all SV breakends from a sample cohort using NanoVar VCF files (NanoVar-v1.3.6 or above) or variant BED files.
  • Annotates each SV according to input GTF/GFF files or BED annotation files.
  • Filters SVs by adding a "HIT" or "MISS" label according to input BED filter files.
  • Creates a master pandas dataframe to store all data.
  • Creates a HDF5 file containing the master dataframe and some metadata which can be graphically visualized on VariantMap within Dash Bio.

Getting Started

Quick run

Command-line usage:
variantbreak [Options] -a annotation.gff3 -f filter.bed variant_path working_dir 
Parameter Argument Comment
-a annotation.gff3 path to single annotation file or directory containing annotation files of GTF/GFF or BED formats
-f filter.bed path to single filter file or directory containing filter files of BED format
- variant_path path to single variant file or directory containing variant files of VCF or BED formats
- working_dir path to working directory
Python console usage:
# Import variantbreak function from variantbreak package
from variantbreak import variantbreak

# Run variantbreak on your samples with annotation and filter files
df = variantbreak("/path/to/sample_dir/",
                  "/path/to/annotation_dir/",
                  "/path/to/filter_dir/")


# To save data to files
# Import write_to_file from variantbreak package
from variantbreak import write_to_files

# Specify dataframe variable, output file path and prefix, and delimiter of choice
write_to_files(df,
               "/path/to/output_prefix",
               sep="\t")

Output

Output file Comment
output.h5 HDF5 file required for data visualization by VariantMap
output.csv CSV file for data viewing, separated by the delimiter set by user
legend.txt File containing the legend of the sample labels used in analysis

For more information, see wiki.

Operating system:

  • Linux (x86_64 architecture, tested in Ubuntu 16.04)

Installation:

There are three ways to install VariantBreak:

Option 1: Conda (Recommended)

# Installing from bioconda automatically installs all dependencies 
conda install -c bioconda variantbreak

Option 2: Pip (See dependencies below)

# Installing from PyPI requires own installation of dependencies, see below
pip install variantbreak

Option 3: GitHub (See dependencies below)

# Installing from GitHub requires own installation of dependencies, see below
git clone https://github.com/cytham/variantbreak.git 
cd variantbreak
pip install .

Installation of dependencies

  • bedtools >=2.26.0 (required to be in PATH by pybedtools)
  • pybedtools >=0.8.1
  • pandas >=1.0.3
  • tables >=3.6.1
  • fastcluster >=1.1.26
1. bedtools

Please visit here for instructions to install.

2. pybedtools

Please visit here for instructions to install.

3. pandas

Please visit here for instructions to install.

4. tables
pip install tables

or

conda install -c conda-forge pytables
5. fastcluster
pip install fastcluster

or

conda install -c conda-forge fastcluster

Documentation

See wiki for more information.

Versioning

See CHANGELOG

Citation

Not available

Author

License

VariantBreak is licensed under GNU General Public License - see LICENSE.txt for details.

Limitations

  • Current version only allows input of VCF files generated by NanoVar. We will create a format adaptor in future versions to encompass VCF files generated by other SV callers.

  • Processing speed of large sample cohorts has not been tested. Currently, it takes about 30 minutes to process about 100,000 merged SVs.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

variantbreak-1.0.4.tar.gz (29.4 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page