
SV/CSV callers


<img src="https://github.com/xjtu-omics/SVision/tree/master/supports/svision-logo.png" alt="svision_logo" width="30%" height="30%" align=center/>

SVision is a deep learning-based structural variant caller that takes aligned reads or contigs as input. In particular, SVision implements a targeted multi-object recognition framework that detects and characterizes both simple and complex structural variants from three-channel similarity images.

<img src="https://github.com/xjtu-omics/SVision/tree/master/supports/workflow.png" alt="SVision workflow" width="60%" height="60%" align=center/>

## License

SVision is free for non-commercial use by academic, government, and non-profit/not-for-profit institutions. A commercial version of the software is available and licensed through Xi’an Jiaotong University. For more information, please contact Jiadong Lin (jiadong324@stu.xjtu.edu.cn) or Kai Ye (kaiye@xjtu.edu.cn).

## Install

Step1: Create a Python environment with conda

```
conda create -n svision-env python=3.6
```

Step2: Install required packages of specific versions

```
conda install -c anaconda pysam==0.16.0
conda install -c conda-forge opencv==4.5.1
conda install -c conda-forge tensorflow==1.14.0
```

Step3: Install SVision from PyPI

```
pip install SVision
```

(Optional) Install from source code

```
git clone https://github.com/xjtu-omics/SVision.git
cd SVision
python setup.py install
```

## Usage

```
SVision [parameters] -o <output path> -b <input bam path> -g <reference> -m <model path>
```

Please check the [wiki](https://github.com/xjtu-omics/SVision/wiki) page for more usage details.

#### Input/output parameters

```
-o OUT_PATH     Absolute path to output
-b BAM_PATH     Absolute path to bam file
-m MODEL_PATH   Absolute path to CNN predict model
-g GENOME       Absolute path to your reference genome (.fai required in the directory)
-n SAMPLE       Name of the BAM sample name
```

`-g` path to the reference genome; the index file should be in the same directory.

`-m` path to the pre-trained deep learning model ([download link](https://drive.google.com/drive/folders/1j74IN6kPKEx9hy3aENx3zHYPUnyYWGvj?usp=sharing)).
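Because `-g` expects the index next to the reference, a quick pre-flight check can catch a missing index before a long run. The sketch below assumes the index follows the `samtools faidx` naming convention (`<reference>.fai`); the function name `has_fasta_index` is hypothetical and not part of SVision.

```python
import os


def has_fasta_index(genome_path):
    """Return True if '<genome_path>.fai' exists next to the reference.

    Assumption: the index uses the samtools faidx naming convention,
    i.e. the reference file name with '.fai' appended.
    """
    return os.path.isfile(genome_path + ".fai")
```

If the check returns `False`, running `samtools faidx <reference>` creates the index in the same directory.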

#### General parameters

```
-t THREAD_NUM        Thread numbers [1]
-s MIN_SUPPORT       Min support read number for an SV [1]
-c CHROM             Specific region to detect, format: chr1:xxx-xxx or 1:xxx-xxx
--hash_table         Activate hash table to align unmapped sequences
--cluster_callset    Cluster original callset to merge uncovered event
--report_mechanism   Report mechanisms for DEL event
--report_graph       Report graph for events
--contig             Activate contig mode
```

`--hash_table` enables the image subtraction process, which is activated by default.

`--report_graph` enables the program to create the CSV graph in GFA format, which is not activated by default.

`--report_mechanism` infers the formation mechanism from breakpoint sequence features. This option is still under development and is not recommended in the current version.

`--contig` is used for calling from assemblies; it currently takes a minimap2-aligned BAM file as input.
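When SVision is driven from a pipeline script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the options listed above; the helper name `build_svision_command` and every path in the example are placeholders, not part of SVision.

```python
import shlex


def build_svision_command(out_path, bam_path, genome, model_path,
                          sample=None, threads=1, min_support=1,
                          region=None, report_graph=False, contig=False):
    """Assemble an SVision argument list from the documented options."""
    cmd = ["SVision",
           "-o", out_path,
           "-b", bam_path,
           "-g", genome,
           "-m", model_path,
           "-t", str(threads),
           "-s", str(min_support)]
    if sample is not None:
        cmd += ["-n", sample]
    if region is not None:
        cmd += ["-c", region]          # e.g. chr1:xxx-xxx, as described above
    if report_graph:
        cmd.append("--report_graph")
    if contig:
        cmd.append("--contig")
    return cmd


# Placeholder paths, for illustration only.
cmd = build_svision_command("/data/out", "/data/sample.bam",
                            "/data/ref.fa", "/data/model",
                            sample="HG00733", threads=4, report_graph=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.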

#### Other parameters

`--partition_max_distsance` maximum distance allowed within a group of feature sequences.

`--cluster_max_distance` maximum distance for feature sequence clustering. The clustering is implemented with SciPy hierarchical clustering.
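To give a feel for what a maximum-distance cutoff does, here is a dependency-free stand-in rather than SVision's actual code. For one-dimensional positions, splitting the sorted values wherever the gap exceeds the cutoff yields the same groups as single-linkage hierarchical clustering cut at that distance. The linkage method and the features SVision really clusters on are not stated here, so both are assumptions, and `cluster_positions` is a hypothetical name.

```python
def cluster_positions(positions, max_distance):
    """Group 1-D positions; a new cluster starts when the gap to the
    previous sorted position exceeds max_distance.

    Stand-in for single-linkage clustering with a distance cutoff.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_distance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


print(cluster_positions([100, 120, 500, 130, 520], max_distance=50))
# [[100, 120, 130], [500, 520]]
```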

`--k_size` size of the k-mer used in hash-table realignment; only used when `--hash_table` is activated.

`--min_accept` minimum matched segment length; the default is 50 bp.
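The role of `--k_size` is easier to picture with a toy k-mer hash table. The sketch below shows the general technique only (index every k-mer of a target sequence, then look up the k-mers of a query); it is not SVision's realignment code, and the function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequence, k):
    """Map every k-mer of `sequence` to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_kmer_hits(query, index, k):
    """Return (query_pos, target_pos) pairs for k-mers shared with the target."""
    hits = []
    for q in range(len(query) - k + 1):
        for t in index.get(query[q:q + k], []):
            hits.append((q, t))
    return hits


index = build_kmer_index("ACGTACGT", k=4)
print(find_kmer_hits("TACG", index, k=4))  # [(0, 3)]
```

A larger `k` makes matches more specific but less tolerant of sequencing errors, which is the usual trade-off when choosing a k-mer size.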

## SVision output

### VCF

The SV `ID` column is given in the format `a_b`, where `b` indicates that site `a` contains other types of SVs.

The following filters are used in the output:

`Covered`: The entire SV is spanned by long reads, producing the most confident calls.

`Uncovered`: The SV is partially spanned by long reads, i.e. the reads span one of the breakpoints.

`Clustered`: The SV is partially spanned by long reads, but can be spanned through read clusters.
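The filter values above can be used to split a callset by confidence. This is a minimal sketch that relies only on the standard VCF layout (tab-separated, `FILTER` in the seventh column); the records in the example are invented for illustration and are not SVision output.

```python
def split_by_filter(vcf_lines):
    """Group VCF data lines by the value of the FILTER column (7th column)."""
    groups = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        groups.setdefault(fields[6], []).append(fields)
    return groups


# Hypothetical records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t0_0\tN\t<DEL>\t.\tCovered\t.",
    "chr1\t5000\t1_0\tN\t<INS>\t.\tUncovered\t.",
    "chr2\t8000\t2_0\tN\t<DEL>\t.\tCovered\t.",
]
groups = split_by_filter(records)
print({name: len(rows) for name, rows in groups.items()})
# {'Covered': 2, 'Uncovered': 1}
```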

SVision adds extra attributes to the `INFO` column of the VCF for the structural variants it detects.

`BRPKS`: The breakpoint junctions recognized by the CNN through tMOR.

`GraphID`: The graph index used to indicate the graph structure, which requires `--report_graph` and is obtained by calculating isomorphic graphs. The ID for simple SVs is -1.

`VAF`: The estimated variant allele fraction, calculated as DV/DR. Note that SVision does not calculate genotypes in the current version.
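To read these attributes in a script, the `INFO` column can be split with the generic VCF convention of `key=value` pairs separated by semicolons. The sketch below also restates the DV/DR ratio from the `VAF` description. The example `INFO` string is hypothetical, and DV and DR are passed in as plain numbers because where they are stored in the output is not described here.

```python
def parse_info(info_field):
    """Turn a VCF INFO string ('K1=V1;K2=V2;FLAG') into a dict of strings.

    Keys without a value (flags) are stored as True.
    """
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def estimate_vaf(dv, dr):
    """VAF as DV/DR, following the definition above; None when DR is 0."""
    return dv / dr if dr else None


info = parse_info("GraphID=-1;VAF=0.25")   # hypothetical INFO string
print(info["GraphID"], estimate_vaf(5, 20))  # -1 0.25
```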

### CSV graph

#### Graph format

The graph output requires `--report_graph` to be activated. The example below is a CSV in rGFA format, detected by SVision at chr11:99,819,283-99,820,576 in HG00733. The graph output is saved in a separate file for each CSV event.

```
S	S1	SN:Z:chr11	SO:i:99819338	SR:i:0	LN:i:2990
S	I0	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:15813	SR:i:0	LN:i:1113
S	I1	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:16927	SR:i:0	LN:i:466
S	I2	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:17400	SR:i:0	LN:i:377	DP:S:S1:99820198
S	I3	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:17778	SR:i:0	LN:i:838
S	I4	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:18617	SR:i:0	LN:i:61	DP:S:S0:99819276
L	S0	+	I0	+	0M	SR:i:0
L	I0	+	I1	+	0M	SR:i:0
L	I1	+	I2	-	0M	SR:i:0
L	I2	-	I3	+	0M	SR:i:0
L	I3	+	I4	+	0M	SR:i:0
L	I4	+	S1	+	0M	SR:i:0
```

Besides the information included in the standard [rGFA](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md) format, we add a `DP:S` column to mark sequences whose origins were detected via local realignment; for example, node `I2` is duplicated from node `S1`.
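A graph file of this shape can be read with a few lines of Python. The sketch below assumes whitespace-separated `S` and `L` records laid out as in the example, and it treats the integer after the node name in `DP:S` as a position, which is an interpretation rather than something stated above. The function names are hypothetical.

```python
def parse_rgfa(text):
    """Parse S (segment) and L (link) records into a dict and a list."""
    segments, links = {}, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            tags = {}
            for tag in fields[2:]:
                parts = tag.split(":", 2)      # NAME:TYPE:VALUE
                if len(parts) == 3:
                    tags[parts[0]] = parts[2]
            segments[fields[1]] = tags
        elif fields[0] == "L":
            # (from_node, from_strand, to_node, to_strand)
            links.append((fields[1], fields[2], fields[3], fields[4]))
    return segments, links


def duplication_origins(segments):
    """Map each node carrying a DP tag to (source_node, position)."""
    origins = {}
    for name, tags in segments.items():
        if "DP" in tags:
            source, pos = tags["DP"].rsplit(":", 1)
            origins[name] = (source, int(pos))
    return origins


# Two records copied from the example graph.
EXAMPLE = """\
S I2 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17400 SR:i:0 LN:i:377 DP:S:S1:99820198
L I1 + I2 - 0M SR:i:0
"""
segments, links = parse_rgfa(EXAMPLE)
print(duplication_origins(segments))  # {'I2': ('S1', 99820198)}
```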

#### Graph alignment (Experimental)

Note: This function is not included in the current program; it is a post-processing step that tries to validate the detected CSVs.

To validate the detected CSV, we align raw HiFi reads to the mini graph (CSV graph) reported by SVision with GraphAligner.

Step1: Extract HiFi raw reads

```
samtools view -b HG00733.ngmlr.sorted.bam chr11:99810000-99830000 > tmp.bam
samtools fasta tmp.bam > tmp.fasta
```

Step2: Align with GraphAligner

Please check [GraphAligner](https://github.com/maickrau/GraphAligner) for the detailed usage.

```
GraphAligner -g chr11-99819283-99820576.gfa -f tmp.fasta -a aln.gaf -x vg
```

Example of CSV path supporting reads

```
m54329U_190827_173812/140708091/ccs     21668   0       21668   +       >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1<I2>I3>I4>S1
```
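To count how many reads follow the full CSV path, the path column of each alignment can be split into oriented nodes. This is a minimal sketch assuming the layout shown above, where the path is the sixth whitespace-separated column and `>` / `<` mark forward and reverse traversal; the helper names are hypothetical.

```python
import re


def parse_gaf_path(path):
    """Split a path such as '>S0>I0<I2' into (node, strand) tuples."""
    return [(node, "+" if sign == ">" else "-")
            for sign, node in re.findall(r"([<>])([^<>]+)", path)]


def reads_supporting(gaf_lines, expected_path):
    """Return names of reads whose path column equals expected_path."""
    names = []
    for line in gaf_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[5] == expected_path:
            names.append(fields[0])
    return names


steps = parse_gaf_path(">S0>I0>I1<I2>I3>I4>S1")
print(steps[3])  # ('I2', '-')
```

The reverse orientation of `I2` in the path matches the `L I1 + I2 -` link in the graph example.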

## Contact

If you have any questions, please feel free to contact: jiadong66@stu.xjtu.edu.cn, songbowang125@163.com

