
OctopuSV: Advanced Structural Variant Analysis Toolkit



Available on PyPI and Bioconda. License: MIT.

[!IMPORTANT] Always use the latest version for best results.

conda install bioconda::octopusv

[!NOTE] Native GRIDSS support (v0.3.1+): OctopuSV directly processes GRIDSS VCF output through octopusv correct. Paired BND records are resolved to standard SV types (DEL/DUP/INV/INS/TRA) using the same logic as GRIDSS's official simple-event-annotation.R — including automatic INS detection from BND pairs with inserted sequences. Single breakends are safely skipped. No pre-processing with StructuralVariantAnnotation or other external tools required.
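The resolution order described in this note can be sketched as follows. This is an illustrative Python sketch, not OctopuSV's source. It follows our understanding of the decision order in GRIDSS's simple-event-annotation.R: different chromosomes give TRA; an inserted sequence at least 70% as long as the breakend distance gives INS; matching orientations give INV; otherwise DEL or DUP depending on orientation. The `Breakend` class, the `classify_pair` function and the 0.7 threshold should all be treated as assumptions.

```python
from dataclasses import dataclass


@dataclass
class Breakend:
    """One side of a paired BND record (hypothetical minimal representation)."""
    chrom: str
    pos: int
    alt: str

    @property
    def strand(self) -> str:
        # VCF BND notation: t[p[ and t]p] keep the sequence before the break ("+");
        # ]p]t and [p[t keep the sequence after the break ("-").
        return "-" if self.alt[0] in "[]" else "+"


def classify_pair(a: Breakend, b: Breakend, ins_len: int = 0,
                  ins_fraction: float = 0.7) -> str:
    """Resolve a BND pair to TRA/INS/INV/DEL/DUP (sketch; threshold is an assumption)."""
    if a.chrom != b.chrom:
        return "TRA"
    distance = abs(b.pos - a.pos)
    # Inserted sequence long relative to the gap between breakends: an insertion.
    if ins_len > 0 and ins_len >= distance * ins_fraction:
        return "INS"
    # Both sides joined in the same orientation: an inversion.
    if a.strand == b.strand:
        return "INV"
    # Lower breakend on "+" (or higher breakend on "-"): the gap is lost, a deletion.
    if (a.pos < b.pos) != (a.strand == "-"):
        return "DEL"
    return "DUP"


# A deletion of chr1:101-499 written as a BND pair
left = Breakend("chr1", 100, "N[chr1:500[")
right = Breakend("chr1", 500, "]chr1:100]N")
print(classify_pair(left, right))  # DEL
```

Swapping the two arguments gives the same answer, because the DEL/DUP test depends only on which breakend is lower and how it is oriented.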


OctopuSV addresses four key challenges in structural variant (SV) analysis:

  1. Smart BND standardization — Converts paired BND records into standard SV types (DEL/INV/DUP/INS/TRA), while preserving potential complex rearrangements as BNDs. Works out of the box with BND-heavy callers like GRIDSS and SvABA.
  2. Multi-caller integration — Merge SVs from different tools (Manta, Sniffles, GRIDSS, PBSV, etc.) with flexible Boolean logic.
  3. Multi-sample integration — Compare and analyze SVs across cohorts with customizable sample-level merging.
  4. Somatic variant calling — Extract tumor-specific SVs by comparing tumor vs normal samples. Works with any SV caller, even those not designed for cancer analysis.

Whether you're analyzing single samples, cohorts, or tumor/normal pairs, OctopuSV standardizes your workflow from raw calls to publication-ready results.


How OctopuSV Works

OctopuSV uses a standardized workflow to handle VCF inconsistencies across different SV callers:

  1. Standardize: Convert any SV caller output to SVCF format using octopusv correct
  2. Analyze: Perform merging, comparison, or somatic calling on standardized SVCF files
  3. Export: Convert results back to standard VCF using octopusv svcf2vcf

Why SVCF? Different SV callers implement VCF inconsistently — varying field names, BND notations, coordinate systems. SVCF eliminates these compatibility issues by providing a unified intermediate format.

# Step 1: Standardize caller outputs
octopusv correct manta_output.vcf manta.svcf
octopusv correct gridss_output.vcf gridss.svcf
octopusv correct sniffles_output.vcf sniffles.svcf

# Step 2: Analyze with consistent format
octopusv merge -i manta.svcf gridss.svcf sniffles.svcf -o merged.svcf --intersect
octopusv somatic -t tumor.svcf -n normal.svcf -o somatic.svcf

# Step 3: Convert back to standard VCF
octopusv svcf2vcf -i merged.svcf -o final_results.vcf
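The three steps can also be scripted. The sketch below only assembles argument lists from the commands and flags shown above; the helper name `build_pipeline` and the `<caller>_output.vcf` naming pattern are assumptions for illustration. Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.

```python
import shlex
from typing import List


def build_pipeline(callers: List[str], merged: str = "merged.svcf",
                   final_vcf: str = "final_results.vcf") -> List[List[str]]:
    """Return octopusv command lines for correct -> merge -> svcf2vcf (hypothetical helper)."""
    commands: List[List[str]] = []
    svcfs: List[str] = []
    for caller in callers:
        svcf = f"{caller}.svcf"
        # Step 1: standardize each caller's VCF into SVCF
        commands.append(["octopusv", "correct", f"{caller}_output.vcf", svcf])
        svcfs.append(svcf)
    # Step 2: intersect the standardized call sets
    commands.append(["octopusv", "merge", "-i", *svcfs, "-o", merged, "--intersect"])
    # Step 3: convert the result back to standard VCF
    commands.append(["octopusv", "svcf2vcf", "-i", merged, "-o", final_vcf])
    return commands


for cmd in build_pipeline(["manta", "gridss", "sniffles"]):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when file names contain spaces.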

📋 SVCF Format Details: See our SVCF specification document for technical details.
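As an illustration of the kind of caller-to-caller inconsistency described above (this example is not taken from the SVCF specification), callers may differ in the sign of SVLEN for deletions, in whether SVLEN is present at all, and in whether it is parsed as a scalar or a list. A hypothetical normalizer that derives one non-negative length could look like this; the function name and fallback order are assumptions.

```python
from typing import Optional


def normalized_svlen(svtype: str, pos: int, info: dict) -> Optional[int]:
    """Return a non-negative SV length from heterogeneous INFO fields (sketch)."""
    svlen = info.get("SVLEN")
    if svlen is not None:
        # Some parsers return Number=. fields as lists, e.g. ["-120"].
        if isinstance(svlen, (list, tuple)):
            svlen = svlen[0]
        # Deletions may be reported as negative or positive; keep the magnitude.
        return abs(int(svlen))
    end = info.get("END")
    if end is not None and svtype in {"DEL", "DUP", "INV"}:
        # For interval-type SVs without SVLEN, END - POS approximates the span.
        return int(end) - pos
    # Insertions or translocations without SVLEN: length unknown.
    return None


print(normalized_svlen("DEL", 100, {"SVLEN": -400}))  # 400
```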


Supported SV Callers

Long-read: Sniffles, Severus, SVDSS, DeBreak, SVIM, CuteSV, PBSV, nanomonsv

Short-read: Manta, Delly, GRIDSS, Lumpy, SvABA, Octopus, CLEVER

CNV callers: DRAGEN CNV (CNV calls are automatically converted to DEL/DUP)

Continuously expanding support for additional callers.
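One plausible way to map a CNV call to DEL or DUP is to use an explicit symbolic allele when present and otherwise compare the estimated copy number with the expected ploidy. This rule is an assumption for illustration (OctopuSV's handling of DRAGEN CNV output may rely on other fields), and `cnv_to_svtype` is a hypothetical helper.

```python
from typing import Optional


def cnv_to_svtype(alt: str, copy_number: Optional[int], ploidy: int = 2) -> Optional[str]:
    """Map a CNV record to DEL/DUP (sketch; the rule is an assumption)."""
    # Prefer an explicit symbolic allele if the caller provides one.
    if alt in ("<DEL>", "<DUP>"):
        return alt.strip("<>")
    if copy_number is None:
        return None
    if copy_number < ploidy:
        return "DEL"  # copy loss
    if copy_number > ploidy:
        return "DUP"  # copy gain
    return None  # copy-neutral: not reported as an SV


print(cnv_to_svtype("<CNV>", 4))  # DUP
```

Making `ploidy` a parameter keeps the same rule usable for sex chromosomes or non-diploid samples.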


Installation

Bioconda (recommended)

conda install bioconda::octopusv

Or with mamba for faster dependency resolution:

mamba install bioconda::octopusv

PyPI

pip install octopusv
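Because the latest release is recommended, it can help to confirm which version is installed. This sketch uses only the standard library's `importlib.metadata` (Python 3.8+) and typically works for both pip and conda installs, since both install package metadata; the function name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_octopusv_version() -> Optional[str]:
    """Return the installed octopusv version string, or None if it is not installed."""
    try:
        return version("octopusv")
    except PackageNotFoundError:
        return None


print(installed_octopusv_version() or "octopusv is not installed")
```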

Docker

docker pull quay.io/biocontainers/octopusv:<tag>

See the octopusv image tags on quay.io for valid <tag> values.


Quick Start

1. Correct and Standardize BND Annotations

octopusv correct converts raw SV caller output into standardized SVCF format. This includes resolving paired BND records into concrete SV types and detecting insertions from BND pairs with long inserted sequences (e.g., from GRIDSS).

# Basic correction
octopusv correct input.vcf output.svcf

# With position tolerance control (for BND pairing)
octopusv correct -i input.vcf -o output.svcf --pos-tolerance 5

# Apply quality filters
octopusv correct -i input.vcf -o output.svcf --min-svlen 50 --max-svlen 100000 --filter-pass
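BND pairing relies on the mate coordinates encoded in each ALT allele. The sketch below parses the four VCF bracket forms and treats two records as mates when each record's declared mate position lies within a tolerance of the other record's actual position. Reading `--pos-tolerance` this way is an assumption, and `parse_bnd_alt` and `are_mates` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches t[p[, t]p], ]p]t and [p[t, capturing the mate chromosome and position.
_BND_RE = re.compile(r"^[A-Za-z.]*([\[\]])([^:\[\]]+):(\d+)\1[A-Za-z.]*$")


class MateRef(NamedTuple):
    chrom: str
    pos: int


def parse_bnd_alt(alt: str) -> Optional[MateRef]:
    """Extract the mate locus from a VCF BND ALT string, or None if it is not a BND."""
    m = _BND_RE.match(alt)
    if m is None:
        return None
    return MateRef(m.group(2), int(m.group(3)))


def are_mates(chrom_a: str, pos_a: int, alt_a: str,
              chrom_b: str, pos_b: int, alt_b: str, tolerance: int = 5) -> bool:
    """True if each record points at the other within `tolerance` bp (assumed semantics)."""
    mate_a, mate_b = parse_bnd_alt(alt_a), parse_bnd_alt(alt_b)
    if mate_a is None or mate_b is None:
        return False
    return (mate_a.chrom == chrom_b and abs(mate_a.pos - pos_b) <= tolerance
            and mate_b.chrom == chrom_a and abs(mate_b.pos - pos_a) <= tolerance)


print(parse_bnd_alt("]chr2:100]N"))  # MateRef(chrom='chr2', pos=100)
```

Requiring both directions to agree avoids pairing a record with an unrelated breakend that happens to sit near its mate coordinate.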

2. Merge SV Calls (Multi-caller or Multi-sample)

# Intersection: SVs found by ALL callers
octopusv merge -i manta.svcf sniffles.svcf pbsv.svcf -o intersection.svcf --intersect

# Union: SVs found by ANY caller
octopusv merge -i caller1.svcf caller2.svcf caller3.svcf -o union.svcf --union

# Specific caller: SVs unique to one caller
octopusv merge -i manta.svcf sniffles.svcf -o manta_specific.svcf --specific manta.svcf

# Minimum support: SVs supported by at least N callers
octopusv merge -i a.svcf b.svcf c.svcf d.svcf -o supported.svcf --min-support 3

# Complex Boolean logic: (A AND B) but NOT (C OR D)
octopusv merge -i A.svcf B.svcf C.svcf D.svcf \
  --expression "(A AND B) AND NOT (C OR D)" -o filtered.svcf

# Multi-sample mode with custom names
octopusv merge -i sample1.svcf sample2.svcf sample3.svcf \
  --mode sample --sample-names Patient1,Patient2,Patient3 \
  --min-support 2 -o cohort.svcf

# Generate an UpSet plot of the overlaps between inputs
octopusv merge -i a.svcf b.svcf c.svcf -o merged.svcf --intersect \
  --upsetr --upsetr-output upset_plot.png
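Once calls from different inputs have been matched into shared SV events, each selection mode reduces to a test on the set of inputs that support an event. The sketch below assumes matching has already been done and that each event carries the set of input labels (`A`, `B`, ..., here taken to be file names without the `.svcf` extension) supporting it. The `select` and `eval_expression` helpers are hypothetical, and the small recursive-descent parser handles only `AND`, `OR`, `NOT` and parentheses.

```python
import re
from typing import List, Optional, Set


def eval_expression(expr: str, support: Set[str]) -> bool:
    """Evaluate e.g. '(A AND B) AND NOT (C OR D)' against the labels supporting one SV."""
    tokens = re.findall(r"\(|\)|[\w.]+", expr)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> bool:          # lowest precedence
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()        # always parse the operand (no short-circuit)
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:         # highest precedence
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        return token in support      # a label is true if that input supports the SV

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


def select(support: Set[str], labels: List[str], mode: str, arg=None) -> bool:
    """Decide whether one matched SV is kept under a given merge mode (sketch)."""
    if mode == "intersect":
        return set(labels) <= support
    if mode == "union":
        return bool(support)
    if mode == "specific":
        return support == {arg}
    if mode == "min_support":
        return len(support) >= arg
    if mode == "expression":
        return eval_expression(arg, support)
    raise ValueError(f"unknown mode {mode!r}")
```

Separating matching from selection means all modes share one matching step and differ only in a cheap per-event predicate.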

3. Somatic SV Calling

Use any SV caller to analyze tumor and normal samples separately, then let OctopuSV find somatic variants. Works even with callers not designed for cancer analysis.

# Basic somatic calling
octopusv somatic -t tumor.svcf -n normal.svcf -o somatic.svcf

# With custom matching parameters
octopusv somatic -t tumor.svcf -n normal.svcf -o somatic.svcf \
  --max-distance 100 --min-jaccard 0.8

# Convert to standard VCF for downstream analysis
octopusv svcf2vcf -i somatic.svcf -o somatic.vcf
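A rough model of what the matching parameters might control: a tumor SV is kept as somatic only if no normal SV of the same type lies within `--max-distance` bp at both ends with an interval Jaccard index of at least `--min-jaccard`. This interpretation of the two flags, and the `SV` tuple and helper functions, are assumptions for illustration rather than OctopuSV's implementation.

```python
from typing import Iterable, List, NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive
    svtype: str


def jaccard(a: SV, b: SV) -> float:
    """Jaccard index (intersection / union) of two closed intervals."""
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start) + 1
    return inter / union


def matches(t: SV, n: SV, max_distance: int, min_jaccard: float) -> bool:
    return (t.chrom == n.chrom and t.svtype == n.svtype
            and abs(t.start - n.start) <= max_distance
            and abs(t.end - n.end) <= max_distance
            and jaccard(t, n) >= min_jaccard)


def somatic_calls(tumor: Iterable[SV], normal: Iterable[SV],
                  max_distance: int = 100, min_jaccard: float = 0.8) -> List[SV]:
    """Keep tumor SVs with no matching normal SV (sketch; flag semantics assumed)."""
    normal = list(normal)
    return [t for t in tumor
            if not any(matches(t, n, max_distance, min_jaccard) for n in normal)]
```

The all-pairs comparison is quadratic; a practical implementation would index normal calls by chromosome and position first.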

Example multi-caller somatic workflow (e.g., with 3 callers on a tumor-normal pair):

# Run each caller separately on the tumor-normal pair, then standardize
octopusv correct manta_tumor.vcf manta_tumor.svcf
octopusv correct delly_tumor.vcf delly_tumor.svcf
octopusv correct gridss_tumor.vcf gridss_tumor.svcf

# Keep SVs supported by at least 2 out of 3 callers
octopusv merge -i manta_tumor.svcf delly_tumor.svcf gridss_tumor.svcf \
  -o high_confidence_somatic.svcf --min-support 2
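The "at least 2 of 3 callers" rule can be pictured as clustering followed by a support count. In this sketch, calls of the same type on the same chromosome join a cluster when their start and end are each within a fixed distance of the cluster's first call, and a cluster is kept if it contains calls from at least `min_support` distinct callers. The greedy clustering rule, the 500 bp default and the `consensus` helper are assumptions; OctopuSV's merge matching may use different criteria.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Call(NamedTuple):
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def consensus(calls: List[Call], min_support: int = 2,
              max_distance: int = 500) -> List[List[Call]]:
    """Greedily cluster calls, then keep clusters supported by >= min_support callers."""
    groups: Dict[Tuple[str, str], List[Call]] = defaultdict(list)
    for call in calls:
        groups[(call.chrom, call.svtype)].append(call)  # only compare like with like

    kept: List[List[Call]] = []
    for group in groups.values():
        clusters: List[List[Call]] = []
        for call in sorted(group, key=lambda c: c.start):
            for cluster in clusters:
                seed = cluster[0]  # compare against the cluster's first member
                if (abs(call.start - seed.start) <= max_distance
                        and abs(call.end - seed.end) <= max_distance):
                    cluster.append(call)
                    break
            else:  # no existing cluster was close enough: start a new one
                clusters.append([call])
        kept.extend(cl for cl in clusters
                    if len({member.caller for member in cl}) >= min_support)
    return kept
```

Counting distinct callers, rather than cluster size, prevents one caller that reports the same event twice from satisfying the support threshold on its own.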

4. Benchmark Against Truth Sets

octopusv benchmark truth.vcf calls.svcf \
  -o benchmark_results \
  --reference-distance 500 \
  --size-similarity 0.7 \
  --reciprocal-overlap 0.0 \
  --size-min 50 --size-max 50000
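The options and values in this example resemble Truvari's `--refdist`, `--pctsize`, `--pctovl`, `--sizemin` and `--sizemax` settings. Under that assumption, a truth/call pair matches when both ends are within the reference distance, the smaller SV is at least the given fraction of the larger, and the reciprocal overlap reaches the threshold (0.0 effectively disables it). Precision, recall and F1 then follow from the match counts. The helper names and the greedy one-to-one matching are illustrative assumptions, not OctopuSV's code.

```python
from typing import List, NamedTuple, Tuple


class Var(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def size(self) -> int:
        return self.end - self.start + 1


def reciprocal_overlap(a: Var, b: Var) -> float:
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    return min(inter / a.size, inter / b.size)


def is_match(truth: Var, call: Var, ref_dist: int = 500,
             size_sim: float = 0.7, recip: float = 0.0) -> bool:
    if truth.chrom != call.chrom:
        return False
    if abs(truth.start - call.start) > ref_dist or abs(truth.end - call.end) > ref_dist:
        return False
    if min(truth.size, call.size) / max(truth.size, call.size) < size_sim:
        return False
    return reciprocal_overlap(truth, call) >= recip  # recip=0.0 always passes


def score(truths: List[Var], calls: List[Var], size_min: int = 50,
          size_max: int = 50_000, **match_kw) -> Tuple[float, float, float]:
    """Precision, recall and F1 using greedy one-to-one matching (sketch)."""
    truths = [t for t in truths if size_min <= t.size <= size_max]
    calls = [c for c in calls if size_min <= c.size <= size_max]
    used = set()
    tp = 0
    for truth in truths:
        for i, call in enumerate(calls):
            if i not in used and is_match(truth, call, **match_kw):
                used.add(i)  # each call may match only one truth SV
                tp += 1
                break
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truths) if truths else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```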

5. Generate Statistics and Visualizations

# Collect basic statistics
octopusv stat -i input.svcf -o stats.txt

# Add HTML report
octopusv stat -i input.svcf -o stats.txt --report

# Plot figures from stats
octopusv plot stats.txt -o figure_prefix

The --report flag outputs an interactive HTML report covering SV type and size distributions, chromosome breakdowns, quality score summaries, and genotype and depth features.
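At its core, such a summary is a tally of SV types and size classes. A minimal sketch, assuming records have already been parsed into `(svtype, svlen)` pairs; the size bins and function names are arbitrary choices for illustration and are not necessarily what `octopusv stat` reports.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical size classes: (inclusive upper bound in bp, label)
SIZE_BINS = [(100, "50-100 bp"), (1_000, "101 bp-1 kb"),
             (10_000, "1-10 kb"), (100_000, "10-100 kb")]


def size_class(svlen: int) -> str:
    length = abs(svlen)  # deletions may carry negative SVLEN
    if length < 50:
        return "<50 bp"
    for upper, label in SIZE_BINS:
        if length <= upper:
            return label
    return ">100 kb"


def summarize(records: Iterable[Tuple[str, Optional[int]]]) -> Tuple[Counter, Counter]:
    """Count SVs by type and by size class (sketch)."""
    types, sizes = Counter(), Counter()
    for svtype, svlen in records:
        types[svtype] += 1
        if svlen is not None:  # e.g. translocations have no meaningful length
            sizes[size_class(svlen)] += 1
    return types, sizes


types, sizes = summarize([("DEL", -120), ("DEL", 3000), ("INS", 80), ("TRA", None)])
print(types["DEL"], sizes["1-10 kb"])  # 2 1
```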

6. Format Conversion

# To BED
octopusv svcf2bed -i input.svcf -o output.bed

# To BEDPE
octopusv svcf2bedpe -i input.svcf -o output.bedpe

# To standard VCF
octopusv svcf2vcf -i input.svcf -o output.vcf
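The main subtlety in these conversions is the coordinate system: VCF positions are 1-based, while BED and BEDPE use 0-based, half-open intervals. The sketch below shows one possible mapping, assuming an SV spans POS..END inclusive and each BEDPE breakend is written as a 1 bp interval. The column values OctopuSV actually emits (name, score, strand fields) may differ, and the helper names are hypothetical.

```python
from typing import List


def to_bed(chrom: str, pos: int, end: int, name: str) -> str:
    """VCF 1-based inclusive [pos, end] -> BED 0-based half-open [pos - 1, end)."""
    return "\t".join([chrom, str(pos - 1), str(end), name])


def to_bedpe(chrom1: str, pos1: int, chrom2: str, pos2: int, name: str,
             score: str = ".", strand1: str = ".", strand2: str = ".") -> str:
    """Write each breakend as a 1 bp interval [pos - 1, pos) in the 10 standard BEDPE columns."""
    fields: List[str] = [chrom1, str(pos1 - 1), str(pos1),
                         chrom2, str(pos2 - 1), str(pos2),
                         name, score, strand1, strand2]
    return "\t".join(fields)


print(to_bed("chr1", 1001, 2000, "DEL_1"))
```

Getting the off-by-one right matters when BED output is later intersected with annotation tracks.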

Example Visualizations

OctopuSV generates publication-ready visualizations.


Citation

If you use OctopuSV in your research, please cite:

Guo, Qingxiang, Yangyang Li, Ting-You Wang, Abhi Ramakrishnan, and Rendong Yang. "OctopuSV and TentacleSV: a one-stop toolkit for multi-sample, cross-platform structural variant comparison and analysis." Bioinformatics (2025): btaf599. doi: https://doi.org/10.1093/bioinformatics/btaf599

@article{guo2025octopusv,
  title={OctopuSV and TentacleSV: a one-stop toolkit for multi-sample, cross-platform structural variant comparison and analysis},
  author={Guo, Qingxiang and Li, Yangyang and Wang, Ting-You and Ramakrishnan, Abhi and Yang, Rendong},
  journal={Bioinformatics},
  pages={btaf599},
  year={2025},
  publisher={Oxford University Press}
}

If you find OctopuSV useful, a ⭐ on GitHub helps others discover the project!

See the companion pipeline: TentacleSV


Contributing

We welcome issues, suggestions, and pull requests!

git clone https://github.com/ylab-hi/OctopuSV.git
cd OctopuSV
poetry install
pre-commit run -a




Download files


Source Distribution

octopusv-0.3.1.tar.gz (256.7 kB)

Built Distribution


octopusv-0.3.1-py3-none-any.whl (279.5 kB, Python 3)

File details

Details for the file octopusv-0.3.1.tar.gz.

File metadata

  • Download URL: octopusv-0.3.1.tar.gz
  • Size: 256.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.8 Darwin/23.6.0

File hashes

Hashes for octopusv-0.3.1.tar.gz
Algorithm Hash digest
SHA256 c9d8b2355b52bee0bee1c1ba3058bc3400c0df49c02e2c64b7d07f67b7edd98f
MD5 1fa9685df9f2b3fb83251bbd008aa92e
BLAKE2b-256 7ad649ad7e42876cb83bee14781e2ebf0df9f8e52ebd9072814b1016e5124312
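To check a downloaded archive against the SHA256 digest listed above, hash the file in chunks with the standard library's `hashlib` and compare. The path below assumes the tarball sits in the current directory; `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "c9d8b2355b52bee0bee1c1ba3058bc3400c0df49c02e2c64b7d07f67b7edd98f"

if __name__ == "__main__":
    tarball = Path("octopusv-0.3.1.tar.gz")
    if tarball.exists():
        print("OK" if sha256_of(tarball) == EXPECTED else "MISMATCH")
```

The same function works for the wheel; only the file name and expected digest change.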


File details

Details for the file octopusv-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: octopusv-0.3.1-py3-none-any.whl
  • Size: 279.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.10.8 Darwin/23.6.0

File hashes

Hashes for octopusv-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4789c07f002c9001effaefa12b66c6f3ee3b9b5661e735a315adc2a59b379cef
MD5 314d22db95ea1740949035b3b2ecf952
BLAKE2b-256 6c3ed2c7818425a9a0c3edcb8565376157a0b5800cfae64571b4f368f74e9f08

