Long read based human genomic structural variation detection with cuteSV
Project description
cuteSV
Getting Start
__________ __ __
| ____ | | | | |
_ | | |__| | | | |
_______ _ _ ___| |___ ______ | | | | | |
| ___ | | | | | |___ ___| / ____ \ | |_______ | | | |
| | |_| | | | | | | / /____\ \ |_______ | | | | |
| | | | | | | | | _______| __ | | \ \ / /
| | _ | | | | | | _ | | _ | | | | \ \ / /
| |___| | | |___| | | |_| | \ \____/ | | |____| | \ \_/ /
|_______| |_______| |_____| \______/ |__________| \_____/
Installation
$ pip install cuteSV
or
$ conda install -c bioconda cutesv
or
$ git clone https://github.com/tjiangHIT/cuteSV.git && cd cuteSV/ && pip install .
Introduction
Long-read sequencing enables the comprehensive discovery of structural variations (SVs). However, it is still non-trivial to achieve high sensitivity and performance simultaneously due to the complex SV signatures implied by the noisy long reads. Therefore, we propose cuteSV, a sensitive, fast and scalable long read-based SV detection approach. cuteSV uses tailored methods to collect the signatures of various types of SVs and it employs a clustering-and-refinement method to analyze the signatures to implement sensitive SV detection. Benchmark on real PacBio and ONT datasets demonstrate that cuteSV has better yields and scalability than state-of-the-art tools.
The benchmark results of cuteSV on the HG002 human sample are below:
BTW, we used Truvari to calculate the recall, precision, and f-measure.
Dependence
1. python3
2. pysam
3. Biopython
4. cigar
5. numpy
Usage
cuteSV <sorted.bam> <output.vcf> <work_dir>
Suggestions
> For PacBio CLR data:
--max_cluster_bias_INS 100
--diff_ratio_merging_INS 0.2
--diff_ratio_filtering_INS 0.6
--diff_ratio_filtering_DEL 0.7
> For PacBio CCS(HIFI) data:
--max_cluster_bias_INS 200
--diff_ratio_merging_INS 0.65
--diff_ratio_filtering_INS 0.65
--diff_ratio_filtering_DEL 0.35
Parameter | Description | Default |
---|---|---|
--threads | Number of threads to use. | 16 |
--batches | Batch of genome segmentation interval. | 10,000,000 |
--sample | Sample name/id | NULL |
--max_split_parts | Maximum number of split segments a read may be aligned before it is ignored. | 7 |
--min_mapq | Minimum mapping quality value of alignment to be taken into account. | 20 |
--min_read_len | Ignores reads that only report alignments with not longer then bp. | 500 |
--min_support | Minimum number of reads that support a SV to be reported. | 3 |
--min_length | Minimum length of SV to be reported. | 30 |
--max_cluster_bias_INS | Maximum distance to cluster read together for insertion. | 100 |
--diff_ratio_merging_INS | Do not merge breakpoints with basepair identity more than the ratio of default for insertion. | 0.2 |
--diff_ratio_filtering_INS | Filter breakpoints with basepair identity less than the ratio of default for insertion. | 0.6 |
--max_cluster_bias_DEL | Maximum distance to cluster read together for deletion. | 200 |
--diff_ratio_merging_DEL | Do not merge breakpoints with basepair identity more than the ratio of default for deletion. | 0.3 |
--diff_ratio_filtering_DEL | Filter breakpoints with basepair identity less than the ratio of default for deletion. | 0.7 |
--max_cluster_bias_INV | Maximum distance to cluster read together for inversion. | 500 |
--max_cluster_bias_DUP | Maximum distance to cluster read together for duplication. | 500 |
--max_cluster_bias_TRA | Maximum distance to cluster read together for translocation. | 50 |
--diff_ratio_filtering_TRA | Filter breakpoints with basepair identity less than the ratio of default for translocation. | 0.6 |
Datasets generated from cuteSV
We provided the SV callsets of the HG002 human sample produced by cuteSV form three different long-read sequencing platforms (i.e. PacBio CLR, PacBio CCS, and ONT PromethION).
Please cite the manuscript of cuteSV before using these callsets.
Changelog
cuteSV (v1.0.2):
1.Improve the genotyping performance and enable it to be default option.
2.Make the description of parameters better.
3.Modify the header description of vcf file.
4.Add two new indicators, i.e., BREAKPOINT_STD and SVLEN_STD, to further characterise deletion and insertion.
5.Remove a few redundant functions which will reduce code readability.
Citation
Long Read based Human Genomic Structural Variation Detection with cuteSV. Tao Jiang, et al. bioRxiv 780700; doi: https://doi.org/10.1101/780700
Contact
For advising, bug reporting and requiring help, please post on Github Issue or contact tjiang@hit.edu.cn.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file cuteSV-1.0.2-beta-.tar.gz
.
File metadata
- Download URL: cuteSV-1.0.2-beta-.tar.gz
- Upload date:
- Size: 25.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.38.0 CPython/3.6.7
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 45faa824e3a0334dbf71328c4c42954296f5685ff7d273d2d536a9574be72104 |
|
MD5 | 6f5bf37c25456a8ad9391f00bcaa65ad |
|
BLAKE2b-256 | 2b1ff35593122d7c8f6a4b7f654e8a09126c4444563ef1eadf7a45e16c56ef7c |