Python library tailored for for manipulating and visualizing minimap2's cs tags
Project description
cstag
is a Python library tailored for manipulating and visualizing minimap2's cs tags.
[!NOTE] To add cs tags to SAM/BAM files, check out
cstag-cli
.
🌟 Features
cstag.call()
: Generate a cs tagcstag.shorten()
: Convert a cs tag from its long to short formatcstag.lengthen()
: Convert a cs tag from its short to long formatcstag.consensus()
: Create a consensus cs tag from multiple cs tagscstag.mask()
: Mask low-quality bases within a cs tagcstag.split()
: Break down a cs tag into its constituent partscstag.revcomp()
: Convert a cs tag to its reverse complementcstag.to_sequence()
: Reconstruct a reference subsequence from the alignmentcstag.to_vcf()
: Generate a VCF representationcstag.to_html()
: Generate an HTML representation
For comprehensive documentation, please visit our docs.
🛠 Installation
Using PyPI:
pip install cstag
Using Bioconda:
conda install -c bioconda cstag
💡 Usage
Generating cs tags
import cstag
cigar = "8M2D4M2I3N1M"
md = "2A5^AG7"
seq = "ACGTACGTACGTACG"
print(cstag.call(cigar, md, seq))
# :2*ag:5-ag:4+ac~nn3nn:1
print(cstag.call(cigar, md, seq, long=True))
# =AC*ag=TACGT-ag=ACGT+ac~nn3nn=G
Shortening or Lengthening cs tags
import cstag
# Convert a cs tag from long to short
cs_tag = "=ACGT*ag=CGT"
print(cstag.shorten(cs_tag))
# :4*ag:3
# Convert a cs tag from short to long
cs_tag = ":4*ag:3"
cigar = "8M"
seq = "ACGTACGT"
print(cstag.lengthen(cs_tag, cigar, seq))
# =ACGT*ag=CGT
Creating a Consensus
import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=C*gt=T", "=ACT+ccc=T"]
positions = [1, 1, 2, 2, 1]
print(cstag.consensus(cs_tags, positions))
# =AC*gt=T
Masking Low-Quality Bases
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
cigar = "5M2I2D1M"
qual = "AA!!!!AA"
phred_threshold = 10
print(cstag.mask(cs_tag, cigar, qual, phred_threshold))
# =ACNN*an+ng-cc=T
Splitting a cs tag
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.split(cs_tag))
# ['=ACGT', '*ac', '+gg', '-cc', '=T']
Reverse Complement of a cs tag
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.revcomp(cs_tag))
# =A-gg+cc*tg=ACGT
Reconstructing the Reference Subsequence
import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
print(cstag.to_sequence(cs_tag))
# ACTTCTTA
Generating a VCF Report
import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
chrom = "chr1"
pos = 1
print(cstag.to_vcf(cs_tag, chrom, pos))
"""
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 3 . G T . . .
chr1 4 . TGG T . . .
chr1 5 . C CTT . . .
"""
The multiple cs tags enable reporting of the variant allele frequency (VAF).
import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=ACGT", "=AC*gt=T"]
chroms = ["chr1", "chr1", "chr1", "chr2", "chr2"]
positions = [2, 2, 3, 10, 100]
print(cstag.to_vcf(cs_tags, chroms, positions))
"""
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">
##INFO=<ID=VAF,Number=1,Type=Float,Description="Variant allele frequency (AD/DP)">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 4 . G T . . DP=3;RD=1;AD=2;VAF=0.667
chr2 102 . G T . . DP=1;RD=0;AD=1;VAF=1.0
"""
Generating an HTML Report
import cstag
from pathlib import Path
cs_tag = "=AC+ggg=T-acgt*at~gt10ag=GNNN"
description = "Example"
cs_tag_html = cstag.to_html(cs_tag, description)
Path("report.html").write_text(cs_tag_html)
# Output "report.html"
You can visualize mutations indicated by the cs tag using the generated report.html
file as shown below:
📣 Feedback and Support
For questions, bug reports, or other forms of feedback, we'd love to hear from you!
Please use GitHub Issues for all reporting purposes.
Please refer to CONTRIBUTING for how to contribute and how to verify your contributions.
🤝 Code of Conduct
Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.
📄 Citation
- Kuno, A., (2024). cstag and cstag-cli: tools for manipulating and visualizing cs tags. Journal of Open Source Software, 9(93), 6066, https://doi.org/10.21105/joss.06066