Long read based human genomic structural variation detection with cuteSV

cuteSV

Getting Started

                                               __________    __       __
                                              |   ____   |  |  |     |  |
                          _                   |  |    |__|  |  |     |  |
 _______    _     _   ___| |___     ______    |  |          |  |     |  |
|  ___  |  | |   | | |___   ___|   / ____ \   |  |_______   |  |     |  |
| |   |_|  | |   | |     | |      / /____\ \  |_______   |  |  |     |  |
| |        | |   | |     | |      | _______|   __     |  |  \  \     /  /
| |    _   | |   | |     | |  _   | |     _   |  |    |  |   \  \   /  /
| |___| |  | |___| |     | |_| |  \ \____/ |  |  |____|  |    \  \_/  /
|_______|  |_______|     |_____|   \______/   |__________|     \_____/

Installation

$ pip install cuteSV
or
$ conda install -c bioconda cutesv
or 
$ git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && pip install .

Introduction

Long-read sequencing enables the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high sensitivity and performance simultaneously because of the complex SV signatures implied by noisy long reads. Therefore, we propose cuteSV, a sensitive, fast, and scalable long-read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and employs a clustering-and-refinement method to analyze those signatures for sensitive SV detection. Benchmarks on real PacBio and ONT datasets demonstrate that cuteSV has better yields and scalability than state-of-the-art tools.

The benchmark results of cuteSV on the HG002 human sample are below:

We used Truvari to calculate recall, precision, and F-measure. For a more detailed implementation of SV benchmarks, we show an example here.

Dependencies

1. python3
2. pysam
3. Biopython
4. cigar
5. numpy
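
Before running cuteSV, it can help to confirm that the listed dependencies are importable. The following is a minimal sketch, not part of cuteSV; the import names are assumptions (in particular, Biopython is assumed to be imported as `Bio`), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the dependencies listed above;
# Biopython is assumed to be imported as "Bio".
REQUIRED_MODULES = ["pysam", "Bio", "cigar", "numpy"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```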

Usage

cuteSV <sorted.bam> <output.vcf> <work_dir>
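
If cuteSV is driven from a Python pipeline, the usage line above can be wrapped with `subprocess`. This is a minimal sketch that assumes the `cuteSV` executable is on the PATH; `build_command` and `run_cutesv` are hypothetical helper names, not part of cuteSV.

```python
import subprocess


def build_command(bam, vcf, work_dir, extra_args=()):
    """Assemble the argument list: cuteSV <sorted.bam> <output.vcf> <work_dir> [options]."""
    return ["cuteSV", str(bam), str(vcf), str(work_dir), *extra_args]


def run_cutesv(bam, vcf, work_dir, extra_args=()):
    """Run cuteSV; subprocess raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(bam, vcf, work_dir, extra_args), check=True)
```

For example, `run_cutesv("sorted.bam", "output.vcf", "work_dir", ["--threads", "8"])` would pass the `--threads` option through unchanged.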

Suggestions

> For PacBio CLR data:
	--max_cluster_bias_INS		100
	--diff_ratio_merging_INS	0.2
	--diff_ratio_filtering_INS	0.6
	--diff_ratio_filtering_DEL	0.7
> For PacBio CCS (HiFi) data:
	--max_cluster_bias_INS		200
	--diff_ratio_merging_INS	0.65
	--diff_ratio_filtering_INS	0.65
	--diff_ratio_filtering_DEL	0.35
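
The suggested settings can be kept as named presets so that scripts do not repeat the flags by hand. The sketch below copies the values from the suggestions above; the preset keys (`pacbio_clr`, `pacbio_ccs`) and the `preset_args` helper are hypothetical names chosen for illustration.

```python
# Values copied from the suggestions above; the dictionary keys are illustrative names.
PRESETS = {
    "pacbio_clr": {
        "--max_cluster_bias_INS": 100,
        "--diff_ratio_merging_INS": 0.2,
        "--diff_ratio_filtering_INS": 0.6,
        "--diff_ratio_filtering_DEL": 0.7,
    },
    "pacbio_ccs": {
        "--max_cluster_bias_INS": 200,
        "--diff_ratio_merging_INS": 0.65,
        "--diff_ratio_filtering_INS": 0.65,
        "--diff_ratio_filtering_DEL": 0.35,
    },
}


def preset_args(platform):
    """Flatten a preset into a list of command-line arguments."""
    args = []
    for flag, value in PRESETS[platform].items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # Print the extra options to append to the cuteSV command line.
    print(" ".join(preset_args("pacbio_ccs")))
```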

Parameter	Description	Default
--threads	Number of threads to use.	16
--batches	Batch size of the genome segmentation interval.	10,000,000
--sample	Sample name/ID.	NULL
--max_split_parts	Maximum number of split segments a read may be aligned in before it is ignored.	7
--min_mapq	Minimum mapping quality of an alignment to be taken into account.	20
--min_read_len	Ignore reads that only report alignments no longer than this many bp.	500
--min_support	Minimum number of reads supporting an SV for it to be reported.	10
--min_length	Minimum length of an SV to be reported.	30
--max_cluster_bias_INS	Maximum distance within which reads are clustered together for insertions.	100
--diff_ratio_merging_INS	Do not merge insertion breakpoints with basepair identity greater than this ratio.	0.2
--diff_ratio_filtering_INS	Filter insertion breakpoints with basepair identity less than this ratio.	0.6
--max_cluster_bias_DEL	Maximum distance within which reads are clustered together for deletions.	200
--diff_ratio_merging_DEL	Do not merge deletion breakpoints with basepair identity greater than this ratio.	0.3
--diff_ratio_filtering_DEL	Filter deletion breakpoints with basepair identity less than this ratio.	0.7
--max_cluster_bias_INV	Maximum distance within which reads are clustered together for inversions.	500
--max_cluster_bias_DUP	Maximum distance within which reads are clustered together for duplications.	500
--max_cluster_bias_TRA	Maximum distance within which reads are clustered together for translocations.	50
--diff_ratio_filtering_TRA	Filter translocation breakpoints with basepair identity less than this ratio.	0.6
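
To give an intuition for how a distance threshold such as `--max_cluster_bias_*` and a support threshold such as `--min_support` interact, here is a simplified illustration. It is a plain gap-based clustering of breakpoint positions written for this README, not cuteSV's actual clustering-and-refinement implementation; `cluster_positions` and the example positions are hypothetical.

```python
def cluster_positions(positions, max_cluster_bias, min_support):
    """Group positions into clusters and keep those with enough supporting reads.

    A new cluster starts whenever the gap to the previous sorted position
    exceeds max_cluster_bias. Clusters with fewer than min_support members
    are discarded.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_cluster_bias:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_support]


if __name__ == "__main__":
    # Made-up breakpoint positions reported by individual reads.
    signatures = [1000, 1020, 1050, 5000, 5010]
    print(cluster_positions(signatures, max_cluster_bias=100, min_support=3))
```

In this toy input, the three positions near 1,000 form one supported cluster, while the two positions near 5,000 are dropped because they fall below the support threshold.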

Datasets generated from cuteSV

We provide the SV callsets of the HG002 human sample produced by cuteSV from three different long-read sequencing platforms (i.e., PacBio CLR, PacBio CCS, and ONT PromethION).

You can download them at:

Please cite the manuscript of cuteSV before using these callsets.

Changelog

cuteSV (v1.0.3):
1. Refine the genotyping model.
2. Adjust the threshold value for heterozygous alleles.

cuteSV (v1.0.2):
1. Improve the genotyping performance and make it the default option.
2. Improve the parameter descriptions.
3. Modify the header description of the VCF file.
4. Add two new indicators, i.e., BREAKPOINT_STD and SVLEN_STD, to further characterise deletions and insertions.
5. Remove a few redundant functions that reduced code readability.

Citation

Long Read based Human Genomic Structural Variation Detection with cuteSV. Tao Jiang, et al. bioRxiv 780700; doi: https://doi.org/10.1101/780700

Contact

For advice, bug reports, or help, please open a GitHub issue or contact tjiang@hit.edu.cn.
