Regenotyping structural variants through an accurate and efficient force-calling method

cuteFC

Getting Started

                                               __________    ___________      
                                              |   ____   |  |   _____  |   
                          _                   |  |    |__|  | /      | |
 _______    _     _   ___| |___     ______    |  |          | |      |_|   
|  ___  |  | |   | | |___   ___|   / ____ \   |  |_______   | |    
| |   |_|  | |   | |     | |      / /____\ \  |   _______|  | | 
| |        | |   | |     | |      | _______|  |  |          | |       _
| |    _   | |   | |     | |  _   | |     _   |  |          | |      | |
| |___| |  | |___| |     | |_| |  \ \____/ |  |  |          | \______| |
|_______|  |_______|     |_____|   \______/   |__|          |__________|

Installation

$ git clone https://github.com/tjiangHIT/cuteFC.git && cd cuteFC/ && python setup.py install

Introduction

Accurate genotype assignment for structural variants (SVs) remains challenging, especially in large-scale joint calling. We developed cuteFC to regenotype SVs accurately and efficiently through a force-calling approach. In our benchmarks, cuteFC achieved F1 scores 2% to 5% higher than state-of-the-art methods. In cohort-level SV joint calling, cuteFC built a higher-quality genomic atlas while using minimal computational resources. These results indicate that cuteFC is a scalable and robust approach suitable for clinical applications, population studies, and related fields.

For a more detailed implementation of the SV benchmarks, we provide an example here.

Note that all of cuteFC's functions have also been integrated into our earlier SV detector, cuteSV.

Dependencies

1. python3
2. pysam
3. Biopython
4. cigar
5. numpy
6. pyvcf
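
Before installing, it can help to check which of the listed dependencies are already importable. The snippet below is a minimal sketch, not part of cuteFC itself. The import names are assumptions based on how these packages are usually imported (Biopython as `Bio`, PyVCF as `vcf`), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Import names are assumptions based on common usage:
# Biopython -> "Bio", PyVCF -> "vcf"; the others match their package names.
CUTEFC_IMPORT_NAMES = ["pysam", "Bio", "cigar", "numpy", "vcf"]


def missing_modules(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CUTEFC_IMPORT_NAMES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```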

Usage

cuteFC <sorted.bam> <reference.fa> <output.vcf> <work_dir> -Ivcf <target.vcf>
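
When running cuteFC over many samples, it may be convenient to assemble the command in a script. The following is a minimal sketch that mirrors the usage line above; `build_cutefc_command` is a hypothetical helper and the file names are placeholders, not files shipped with cuteFC.

```python
import shlex


def build_cutefc_command(bam, reference, output_vcf, work_dir, target_vcf,
                         threads=None):
    """Assemble the cuteFC argument list for a force-calling run."""
    # Positional arguments follow the order in the usage line,
    # followed by -Ivcf with the target VCF to regenotype.
    cmd = ["cuteFC", bam, reference, output_vcf, work_dir, "-Ivcf", target_vcf]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


# Placeholder file names, for illustration only.
cmd = build_cutefc_command("sample.sorted.bam", "ref.fa", "sample.fc.vcf",
                           "work/", "cohort.vcf", threads=8)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with unusual file names.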

Suggestions

> For PacBio CLR data:
	--max_cluster_bias_INS		500
	--diff_ratio_merging_INS	0.5
	--max_cluster_bias_DEL	1000
	--diff_ratio_merging_DEL	0.5

> For PacBio CCS (HiFi) data:
	--max_cluster_bias_INS		1000
	--diff_ratio_merging_INS	0.9
	--max_cluster_bias_DEL	1000
	--diff_ratio_merging_DEL	0.5

> For ONT data:
	--max_cluster_bias_INS		1000
	--diff_ratio_merging_INS	0.5
	--max_cluster_bias_DEL	1000
	--diff_ratio_merging_DEL	0.5
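
The platform suggestions above can be kept in a small lookup table so that scripts pick the right flags automatically. This is a sketch only: the values are copied from the suggestions, while the keys (`clr`, `ccs`, `ont`) and the `preset_flags` helper are hypothetical names chosen for illustration.

```python
# Values copied from the platform suggestions; the dictionary keys
# ("clr", "ccs", "ont") are hypothetical labels chosen for this sketch.
PLATFORM_PRESETS = {
    "clr": {"max_cluster_bias_INS": 500, "diff_ratio_merging_INS": 0.5,
            "max_cluster_bias_DEL": 1000, "diff_ratio_merging_DEL": 0.5},
    "ccs": {"max_cluster_bias_INS": 1000, "diff_ratio_merging_INS": 0.9,
            "max_cluster_bias_DEL": 1000, "diff_ratio_merging_DEL": 0.5},
    "ont": {"max_cluster_bias_INS": 1000, "diff_ratio_merging_INS": 0.5,
            "max_cluster_bias_DEL": 1000, "diff_ratio_merging_DEL": 0.5},
}


def preset_flags(platform):
    """Return the suggested cuteFC flags for a platform as an argument list."""
    try:
        preset = PLATFORM_PRESETS[platform.lower()]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}") from None
    flags = []
    for name, value in preset.items():
        flags += [f"--{name}", str(value)]
    return flags


print(" ".join(preset_flags("ccs")))
```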

Parameter	Description	Default
--threads	Number of threads to use.	16
--batches	Batch of genome segmentation interval.	10,000,000
--sample	Sample name/id	NULL
--retain_work_dir	Enable to retain temporary folder and files.	False
--write_old_sigs	Enable to output temporary sig files.	False
--report_readid	Enable to report supporting read ids for each SV.	False
--max_split_parts	Maximum number of split segments a read may be aligned to before it is ignored. All split segments are considered when using -1. (Recommend -1 when applying assembly-based alignment.)	7
--min_mapq	Minimum mapping quality value of alignment to be taken into account.	10
--min_read_len	Ignore reads whose alignments are all no longer than this length (bp).	500
--merge_del_threshold	Maximum distance of deletion signals to be merged.	0
--merge_ins_threshold	Maximum distance of insertion signals to be merged.	100
--min_support	Minimum number of reads that support an SV to be reported.	10
--min_size	Minimum length of SV to be reported.	30
--max_size	Maximum size of SV to be reported. Full length SVs are reported when using -1.	100000
--genotype	Enable to generate genotypes.	False
--gt_round	Maximum number of iteration rounds for alignment searching when genotyping is performed.	500
--read_range	The interval range for counting reads distribution.	1000
-Ivcf	Optional given vcf file. Enable to perform force calling.	NULL
--max_cluster_bias_INS	Maximum distance to cluster reads together for insertion.	100
--diff_ratio_merging_INS	Do not merge breakpoints with basepair identity more than the ratio of default for insertion.	0.3
--max_cluster_bias_DEL	Maximum distance to cluster reads together for deletion.	200
--diff_ratio_merging_DEL	Do not merge breakpoints with basepair identity more than the ratio of default for deletion.	0.5
--max_cluster_bias_INV	Maximum distance to cluster reads together for inversion.	500
--max_cluster_bias_DUP	Maximum distance to cluster reads together for duplication.	500
--max_cluster_bias_TRA	Maximum distance to cluster reads together for translocation.	50
--diff_ratio_filtering_TRA	Filter breakpoints with basepair identity less than the ratio of default for translocation.	0.6
--remain_reads_ratio	The ratio of reads retained in a cluster to generate the breakpoint. Set lower to get more precise breakpoints when the alignment data are of high quality; values above 0.5 are recommended.	1
-include_bed	Optional given bed file. Only detect SVs in regions in the BED file.	NULL

Citation

Cao S et al. Re-genotyping structural variants through an accurate force-calling method. bioRxiv 2022.08.29.505534; doi: https://doi.org/10.1101/2022.08.29.505534

Contact

For advice, bug reports, or help, please open a GitHub issue or contact tjiang@hit.edu.cn.

This version

1.0.0

Feb 28, 2024

Source Distribution

cuteFC-1.0.0.tar.gz (46.1 kB)

Uploaded Feb 28, 2024 Source

Built Distribution

cuteFC-1.0.0-py3-none-any.whl (57.2 kB)

Uploaded Feb 28, 2024 Python 3

Hashes for cuteFC-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c8cfc36d2b2195760d8feee2e3a8e79a885622c3e50f8ceb903d945d4e41e0ef`
MD5	`19578f60fdc12304d126a99bdb32b139`
BLAKE2b-256	`b6779cfabe14e1f088512cc73413529e097410ea9d3371540c2b8ad5dc8d6749`

Hashes for cuteFC-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`814ba2cc7d806d42a644fc2f527bf6eb4737e79b1ef7a5ee0efffce6819f7eaf`
MD5	`9fc2d690501853908ae26d81075e706f`
BLAKE2b-256	`3c5bcd88fb3c1fd418d51ec898b5fcdba577e3e67111c8d79c5198c371db1ced`