Python module to manipulate minimap2's CS tag
cstag
cstag is a Python library tailored for the manipulation and handling of minimap2's CS tags.
🌟 Features
- cstag.call(): Generate a CS tag
- cstag.shorten(): Convert a CS tag from its long to short format
- cstag.lengthen(): Convert a CS tag from its short to long format
- cstag.consensus(): Create a consensus CS tag from multiple CS tags
- cstag.mask(): Mask low-quality bases within a CS tag
- cstag.split(): Break down a CS tag into its constituent parts
- cstag.revcomp(): Convert a CS tag to its reverse complement
- cstag.to_sequence(): Reconstruct the query (read) subsequence from the alignment
- cstag.to_vcf(): Generate a VCF file
- cstag.to_html(): Produce an HTML representation
For comprehensive documentation, please visit our docs.
To add CS tags to SAM/BAM files, check out cstag-cli.
🛠 Installation
Using PyPI:
pip install cstag
Using Bioconda:
conda install -c bioconda cstag
💡 Usage
Generating CS Tags
import cstag
cigar = "8M2D4M2I3N1M"
md = "2A5^AG5"
seq = "ACGTACGTACGTACG"
print(cstag.call(cigar, md, seq))
# :2*ag:5-ag:4+ac~nn3nn:1
cstag.call(cigar, md, seq, long=True)
# =AC*ag=TACGT-ag=ACGT+ac~nn3nn=G
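For readers curious how the three inputs fit together, here is a simplified sketch of deriving a CS tag from CIGAR, MD, and SEQ. It is not cstag's implementation: `md_events` and `call_sketch` are hypothetical helpers, MD is parsed following the SAM optional-field definition (numbers count matching aligned bases, a letter is a mismatched reference base, `^BASES` is a deletion), and intron flanks (`N` in CIGAR) are written as `nn` because neither MD nor SEQ carries those reference bases, which is consistent with the `~nn3nn` in the output above.

```python
import re


def md_events(md):
    """Expand an MD string into one event per reference base:
    None for a match, an uppercase base for a mismatch, "^BASES" for a deletion."""
    events = []
    for num, deleted, mismatch in re.findall(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md):
        if num:
            events.extend([None] * int(num))
        elif deleted:
            events.append("^" + deleted)
        else:
            events.append(mismatch)
    return events


def call_sketch(cigar, md, seq, long=False):
    """Hypothetical CS tag builder for M/=/X, I, D, N, S, H and P operations."""
    events = iter(md_events(md))
    out, run = [], []  # run holds the query bases of the current match block
    q = 0  # current position in seq

    def flush():
        if run:
            out.append("=" + "".join(run) if long else f":{len(run)}")
            run.clear()

    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":
            for _ in range(n):
                event = next(events)
                if event is None:
                    run.append(seq[q].upper())
                elif event.startswith("^"):
                    raise ValueError("MD reports a deletion where CIGAR has an aligned base")
                else:
                    flush()
                    out.append(f"*{event.lower()}{seq[q].lower()}")
                q += 1
        elif op == "I":
            flush()
            out.append("+" + seq[q:q + n].lower())
            q += n
        elif op == "D":
            flush()
            event = next(events)
            if not (event and event.startswith("^")):
                raise ValueError("CIGAR and MD disagree at a deletion")
            out.append("-" + event[1:].lower())
        elif op == "N":
            flush()
            out.append(f"~nn{n}nn")  # intron flanks are unknown without the reference
        elif op == "S":
            q += n  # soft-clipped bases are not part of the alignment
        # H and P consume neither SEQ nor MD
    flush()
    return "".join(out)
```

Walking CIGAR and MD in lockstep is the key design point: CIGAR says which sequence each base belongs to, while MD supplies the reference bases that SEQ alone cannot.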
Shortening or Lengthening CS Tags
import cstag
# Convert a CS tag from long to short
cs_tag = "=ACGT*ag=CGT"
cstag.shorten(cs_tag)
# :4*ag:3
# Convert a CS tag from short to long
cs_tag = ":4*ag:3"
cigar = "8M"
seq = "ACGTGCGT"
cstag.lengthen(cs_tag, cigar, seq)
# =ACGT*ag=CGT
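The conversion between the two forms is mechanical: a long-form match run `=BASES` collapses to `:N`, and going back requires the read sequence to recover the matched bases. A minimal sketch, using the hypothetical helpers `shorten_sketch` and `lengthen_sketch` and taking only a leading soft clip from the CIGAR into account:

```python
import re

OPS = r"[:=*+\-~][^:=*+\-~]*"  # one CS operation per match


def shorten_sketch(cs):
    """Collapse every long-form match run (=BASES) into its length (:N)."""
    return re.sub(r"=([A-Z]+)", lambda m: f":{len(m.group(1))}", cs)


def lengthen_sketch(cs, cigar, seq):
    """Expand :N runs into =BASES by walking the query sequence."""
    clip = re.match(r"(\d+)S", cigar)  # skip a leading soft clip, if any
    q = int(clip.group(1)) if clip else 0
    out = []
    for op in re.findall(OPS, cs):
        kind, body = op[0], op[1:]
        if kind == ":":
            n = int(body)
            out.append("=" + seq[q:q + n].upper())
            q += n
        else:
            out.append(op)
            if kind in "=+":
                q += len(body)  # matched or inserted query bases
            elif kind == "*":
                q += 1  # one substituted query base
            # "-" and "~" consume no query bases
    return "".join(out)


print(shorten_sketch("=ACGT*ag=CGT"))  # :4*ag:3
```

Only the query sequence is needed for lengthening because matched bases are identical in the read and the reference.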
Creating a Consensus
import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=C*gt=T", "=ACT+ccc=T"]
positions = [1, 1, 2, 2, 1]
cstag.consensus(cs_tags, positions)
# =AC*gt=T
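One simple way to build a consensus is a majority vote at each reference position. The sketch below assumes long-form tags, reads that together cover a contiguous reference span, and no insertion before the first aligned base; ties go to the token seen first. `consensus_sketch` is a hypothetical helper, and cstag.consensus() may resolve conflicts differently.

```python
import re
from collections import Counter

OPS = r"[:=*+\-~][^:=*+\-~]*"  # one CS operation per match


def expand(cs, start):
    """Map each reference position covered by a long-form CS tag to a token."""
    tokens = {}
    pos = start
    for op in re.findall(OPS, cs):
        kind, body = op[0], op[1:]
        if kind == "=":
            for base in body:
                tokens[pos] = base
                pos += 1
        elif kind == "*":
            tokens[pos] = op
            pos += 1
        elif kind == "-":
            for base in body:
                tokens[pos] = "-" + base
                pos += 1
        elif kind == "+":
            tokens[pos - 1] += op  # attach the insertion to the preceding reference base
        else:
            raise ValueError(f"unsupported operation in this sketch: {op}")
    return tokens


def consensus_sketch(cs_tags, positions):
    """Majority vote per reference position, then re-encode as a long-form CS tag."""
    votes = {}
    for cs, start in zip(cs_tags, positions):
        for pos, token in expand(cs, start).items():
            votes.setdefault(pos, Counter())[token] += 1
    out = []
    for pos in sorted(votes):
        winner = votes[pos].most_common(1)[0][0]  # ties keep first-seen order
        main, _, inserted = winner.partition("+")
        if main[0] == "-" and out and out[-1][0] == "-":
            out[-1] += main[1:]  # extend a running deletion
        elif main[0] in "*-":
            out.append(main)
        elif out and out[-1][0] == "=":
            out[-1] += main  # extend a running match
        else:
            out.append("=" + main)
        if inserted:
            out.append("+" + inserted)
    return "".join(out)
```

In the example above, position 3 receives three votes for `*gt`, one for `G`, and one for `T+ccc`, so the substitution wins.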
Masking Low-Quality Bases
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
cigar = "5M2I2D1M"
qual = "AA!!!!AA"
phred_threshold = 10
cstag.mask(cs_tag, cigar, qual, phred_threshold)
# =ACNN*an+ng-cc=T
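Masking relies on SAM quality strings being Phred+33 encoded: `!` decodes to 0 and `A` to 32, so with a threshold of 10 the four `!` positions are masked. A minimal sketch (hypothetical `mask_sketch`), assuming a long-form tag and a quality string that covers exactly the aligned query bases, so no soft clipping and no CIGAR is needed:

```python
import re

OPS = r"[:=*+\-~][^:=*+\-~]*"  # one CS operation per match


def mask_sketch(cs, qual, phred_threshold):
    """Replace query bases whose Phred score is below the threshold with N/n."""
    low = [ord(c) - 33 < phred_threshold for c in qual]  # Phred+33 decoding
    out = []
    q = 0  # index into qual (query coordinates)
    for op in re.findall(OPS, cs):
        kind, body = op[0], op[1:]
        if kind == "=":
            out.append("=" + "".join("N" if low[q + i] else b for i, b in enumerate(body)))
            q += len(body)
        elif kind == "*":
            # keep the reference base, mask only the query base
            out.append("*" + body[0] + ("n" if low[q] else body[1]))
            q += 1
        elif kind == "+":
            out.append("+" + "".join("n" if low[q + i] else b for i, b in enumerate(body)))
            q += len(body)
        elif kind in "-~":
            out.append(op)  # deletions and introns consume no query bases
        else:
            raise ValueError(f"short-form operation not supported in this sketch: {op}")
    return "".join(out)


print(mask_sketch("=ACGT*ac+gg-cc=T", "AA!!!!AA", 10))  # =ACNN*an+ng-cc=T
```

The deletion `-cc` is left untouched because it has no query bases, and therefore no quality values, to test.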
Splitting a CS Tag
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
cstag.split(cs_tag)
# ['=ACGT', '*ac', '+gg', '-cc', '=T']
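Every CS operation begins with one of the characters `:`, `=`, `*`, `+`, `-`, or `~`, and none of those characters appears inside an operation, so a single regular expression is enough to split a tag. A sketch (hypothetical `split_sketch`), not necessarily how cstag.split() is implemented:

```python
import re


def split_sketch(cs):
    """Split a CS tag into operations; each one starts with : = * + - or ~."""
    return re.findall(r"[:=*+\-~][^:=*+\-~]*", cs)


print(split_sketch("=ACGT*ac+gg-cc=T"))
# ['=ACGT', '*ac', '+gg', '-cc', '=T']
print(split_sketch(":2*ag:5-ag:4+ac~nn3nn:1"))
# [':2', '*ag', ':5', '-ag', ':4', '+ac', '~nn3nn', ':1']
```

The same pattern handles short-form tags and splice operations, since the digits inside `~nn3nn` are not operation prefixes.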
Reverse Complement of a CS Tag
import cstag
cs_tag = "=ACGT*ac+gg-cc=T"
cstag.revcomp(cs_tag)
# =A-gg+cc*tg=ACGT
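Reverse-complementing a CS tag means reversing the order of operations and complementing the bases: runs of bases (`=`, `+`, `-`) are also reversed, a substitution keeps its reference-then-query order with both bases complemented, and a splice swaps and reverse-complements its two flanking dinucleotides. A sketch under those rules (hypothetical `revcomp_sketch`):

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_sketch(cs):
    """Reverse-complement a CS tag (long or short form)."""
    out = []
    for op in reversed(re.findall(r"[:=*+\-~][^:=*+\-~]*", cs)):
        kind, body = op[0], op[1:]
        if kind == ":":
            out.append(op)  # a match length does not depend on the strand
        elif kind == "*":
            # complement reference and query bases in place; the order stays ref -> query
            out.append("*" + body.translate(COMPLEMENT))
        elif kind == "~":
            donor, length, acceptor = re.fullmatch(r"([a-z]{2})(\d+)([a-z]{2})", body).groups()
            out.append("~" + acceptor[::-1].translate(COMPLEMENT) + length
                       + donor[::-1].translate(COMPLEMENT))
        else:  # "=", "+", "-": reverse-complement the run of bases
            out.append(kind + body[::-1].translate(COMPLEMENT))
    return "".join(out)


print(revcomp_sketch("=ACGT*ac+gg-cc=T"))  # =A-gg+cc*tg=ACGT
print(revcomp_sketch("~gt10ag"))  # ~ct10ac
```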
Reconstructing the Query Subsequence
import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
cstag.to_sequence(cs_tag)
# ACTTCTTA
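In minimap2's CS notation, `*gt` means reference g was read as t, `-gg` marks bases present only in the reference, and `+tt` marks bases present only in the query. The sketch below (hypothetical `sequences_sketch`, long-form tags only) rebuilds both sides of the alignment from those rules; its query-side string matches the ACTTCTTA shown above, while the reference side comes out as ACGTGGCA.

```python
import re


def sequences_sketch(cs):
    """Rebuild (query, reference) strings from a long-form CS tag."""
    query, ref = [], []
    for op in re.findall(r"[:=*+\-~][^:=*+\-~]*", cs):
        kind, body = op[0], op[1:]
        if kind == "=":  # identical in both sequences
            query.append(body)
            ref.append(body)
        elif kind == "*":  # *xy: reference base x, query base y
            ref.append(body[0].upper())
            query.append(body[1].upper())
        elif kind == "+":  # present only in the query
            query.append(body.upper())
        elif kind == "-":  # present only in the reference
            ref.append(body.upper())
        else:
            raise ValueError(f"':' and '~' operations are not handled in this sketch: {op}")
    return "".join(query), "".join(ref)


print(sequences_sketch("=AC*gt=T-gg=C+tt=A"))
# ('ACTTCTTA', 'ACGTGGCA')
```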
Generating a VCF Report
import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
chrom = "chr1"
pos = 1
print(cstag.to_vcf(cs_tag, chrom, pos))
"""
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 3 . G T . . .
chr1 4 . TGG T . . .
chr1 7 . C CTT . . .
"""
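When given multiple CS tags (with matching lists of chromosomes and positions), to_vcf() also reports read depths (DP, RD, AD) and the variant allele frequency (VAF) for each variant.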
import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=ACGT", "=AC*gt=T"]
chroms = ["chr1", "chr1", "chr1", "chr2", "chr2"]
positions = [2, 2, 3, 10, 100]
print(cstag.to_vcf(cs_tags, chroms, positions))
"""
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">
##INFO=<ID=VAF,Number=1,Type=Float,Description="Variant allele frequency (AD/DP)">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 4 . G T . . DP=3;RD=1;AD=2;VAF=0.667
chr2 102 . G T . . DP=1;RD=0;AD=1;VAF=1.0
"""
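The VAF column is AD/DP, as the INFO header states. A rough sketch of the pile-up behind it (hypothetical `vaf_sketch`, long-form tags with only `=` and `*` operations, each substitution counted as one alternate allele) reproduces the two data records above; cstag's handling of indels and other edge cases is not modeled here.

```python
import re
from collections import Counter

OPS = r"[:=*+\-~][^:=*+\-~]*"  # one CS operation per match


def vaf_sketch(cs_tags, chroms, positions):
    """Pile up long-form CS tags and emit VCF data lines with DP, RD, AD and VAF."""
    depth, ref_depth, alt_depth = Counter(), Counter(), Counter()
    for cs, chrom, pos in zip(cs_tags, chroms, positions):
        for op in re.findall(OPS, cs):
            kind, body = op[0], op[1:]
            if kind == "=":
                for _ in body:
                    depth[chrom, pos] += 1
                    ref_depth[chrom, pos] += 1
                    pos += 1
            elif kind == "*":
                depth[chrom, pos] += 1
                alt_depth[chrom, pos, body[0].upper(), body[1].upper()] += 1
                pos += 1
            else:
                raise ValueError(f"this sketch only handles '=' and '*' operations: {op}")
    lines = []
    for (chrom, pos, ref, alt), ad in sorted(alt_depth.items()):
        dp, rd = depth[chrom, pos], ref_depth[chrom, pos]
        info = f"DP={dp};RD={rd};AD={ad};VAF={round(ad / dp, 3)}"
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", info]))
    return lines


cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=ACGT", "=AC*gt=T"]
for line in vaf_sketch(cs_tags, ["chr1", "chr1", "chr1", "chr2", "chr2"], [2, 2, 3, 10, 100]):
    print(line)
```

At chr1:4 three reads cover the base (one G, two T), giving VAF = 2/3, rounded to 0.667; at chr2:102 only the read starting at position 100 reaches it.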
Generating an HTML Report
import cstag
from pathlib import Path
cs_tag = "=AC+ggg=T-acgt*at~gt10ag=GNNN"
description = "Example"
cs_tag_html = cstag.to_html(cs_tag, description)
Path("report.html").write_text(cs_tag_html)
# Writes the report to report.html
Open the generated report.html file in a browser to visualize the mutations encoded in the CS tag.
📣 Feedback and Support
For questions, bug reports, or other forms of feedback, we'd love to hear from you!
Please use GitHub Issues for all reporting purposes.
Download files
Source Distribution: cstag-0.6.2.tar.gz (14.6 kB)
Built Distribution: cstag-0.6.2-py3-none-any.whl (17.5 kB)