Python library tailored for manipulating and visualizing minimap2's cs tags

cstag

cstag is a Python library tailored for manipulating and visualizing minimap2's cs tags.

Note: To add cs tags to SAM/BAM files, check out cstag-cli.

🌟 Features

  • cstag.call(): Generate a cs tag
  • cstag.shorten(): Convert a cs tag from its long to short format
  • cstag.lengthen(): Convert a cs tag from its short to long format
  • cstag.consensus(): Create a consensus cs tag from multiple cs tags
  • cstag.mask(): Mask low-quality bases within a cs tag
  • cstag.split(): Break down a cs tag into its constituent parts
  • cstag.revcomp(): Convert a cs tag to its reverse complement
  • cstag.to_sequence(): Reconstruct a reference subsequence from the alignment
  • cstag.to_vcf(): Generate a VCF representation
  • cstag.to_html(): Generate an HTML representation

For comprehensive documentation, please visit our docs.

🛠 Installation

Using PyPI:

pip install cstag

Using Bioconda:

conda install -c bioconda cstag

💡 Usage

Generating cs tags

import cstag

cigar = "8M2D4M2I3N1M"
md = "2A5^AG7"
seq = "ACGTACGTACGTACG"

print(cstag.call(cigar, md, seq))
# :2*ag:5-ag:4+ac~nn3nn:1

print(cstag.call(cigar, md, seq, long=True))
# =AC*ag=TACGT-ag=ACGT+ac~nn3nn=G
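
The three inputs to cstag.call() have to describe the same alignment, so a quick sanity check before calling it can be useful. The sketch below is not part of cstag; `cigar_lengths` is a hypothetical helper that sums CIGAR operation lengths following the SAM convention (M, I, S, =, X consume read bases; M, D, N, =, X consume reference bases).

```python
import re

# Hypothetical helper for illustration; not a cstag function.
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def cigar_lengths(cigar: str) -> tuple[int, int]:
    """Return (query_length, reference_length) implied by a CIGAR string."""
    query_len = ref_len = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in QUERY_OPS:
            query_len += int(length)
        if op in REF_OPS:
            ref_len += int(length)
    return query_len, ref_len


cigar = "8M2D4M2I3N1M"
seq = "ACGTACGTACGTACG"

query_len, ref_len = cigar_lengths(cigar)
print(query_len == len(seq), ref_len)
# True 18
```

Here the read-consuming operations (8M, 4M, 2I, 1M) add up to 15, which matches the length of seq.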

Shortening or Lengthening cs tags

import cstag

# Convert a cs tag from long to short
cs_tag = "=ACGT*ag=CGT"

print(cstag.shorten(cs_tag))
# :4*ag:3


# Convert a cs tag from short to long
cs_tag = ":4*ag:3"
cigar = "8M"
seq = "ACGTACGT"

print(cstag.lengthen(cs_tag, cigar, seq))
# =ACGT*ag=CGT
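
To see why shortening needs only the cs tag while lengthening also needs the CIGAR string and the read sequence, here is a minimal sketch of the long-to-short direction. It is an illustration only, not cstag's implementation, and `shorten_sketch` is a hypothetical name: each `=SEQ` run is replaced by `:<length>`, which discards the matched bases.

```python
import re


def shorten_sketch(cs_long: str) -> str:
    """Replace each '=SEQ' match run with ':<length>'; keep other operators."""
    return re.sub(r"=([ACGTN]+)", lambda m: f":{len(m.group(1))}", cs_long)


print(shorten_sketch("=ACGT*ag=CGT"))
# :4*ag:3
```

Because the short form keeps only the run lengths, the matched bases have to be recovered from the read sequence when converting back.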

Creating a Consensus

import cstag

cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=C*gt=T", "=ACT+ccc=T"]
positions = [1, 1, 2, 2, 1]

print(cstag.consensus(cs_tags, positions))
# =AC*gt=T

Masking Low-Quality Bases

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
cigar = "5M2I2D1M"
qual = "AA!!!!AA"
phred_threshold = 10
print(cstag.mask(cs_tag, cigar, qual, phred_threshold))
# =ACNN*an+ng-cc=T
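
The quality string in this example can be decoded by hand to see which read bases fall under the threshold. The sketch below uses the standard SAM Phred+33 encoding (score = ASCII code minus 33). Whether cstag compares with "below" or "at or below" the threshold is an assumption here; the two agree for this input.

```python
qual = "AA!!!!AA"
phred_threshold = 10

# SAM quality strings use Phred+33: score = ASCII code - 33.
scores = [ord(ch) - 33 for ch in qual]

# Assumption: a base counts as low quality when its score is below the threshold.
low_quality = [i for i, q in enumerate(scores) if q < phred_threshold]

print(scores)
# [32, 32, 0, 0, 0, 0, 32, 32]
print(low_quality)
# [2, 3, 4, 5]
```

Read positions 2 to 5 (0-based) correspond to the `GT` of `=ACGT`, the read base of `*ac`, and the first inserted base of `+gg`, which is consistent with the masked tag shown above. The deletion `-cc` has no read bases and therefore no quality values.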

Splitting a cs tag

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.split(cs_tag))
# ['=ACGT', '*ac', '+gg', '-cc', '=T']
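
For readers unfamiliar with the cs grammar, the following sketch tokenizes a cs tag with one regular expression per operator. It is an illustration under the operator definitions in the minimap2 manual, not cstag's implementation; `CS_OP` and `split_sketch` are hypothetical names.

```python
import re

# One alternative per cs operator.
CS_OP = re.compile(
    r"=[ACGTN]+"                     # match, long form
    r"|:[0-9]+"                      # match, short form
    r"|\*[acgtn]{2}"                 # substitution: reference base, then read base
    r"|\+[acgtn]+"                   # insertion relative to the reference
    r"|-[acgtn]+"                    # deletion from the reference
    r"|~[acgtn]{2}[0-9]+[acgtn]{2}"  # intron: splice signals and length
)


def split_sketch(cs_tag: str) -> list[str]:
    """Return the operators of a cs tag in order."""
    return CS_OP.findall(cs_tag)


print(split_sketch("=ACGT*ac+gg-cc=T"))
# ['=ACGT', '*ac', '+gg', '-cc', '=T']
```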

Reverse Complement of a cs tag

import cstag

cs_tag = "=ACGT*ac+gg-cc=T"
print(cstag.revcomp(cs_tag))
# =A-gg+cc*tg=ACGT
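
The result above can be reproduced by hand: reverse the order of the operators, complement every base, and reverse the bases inside multi-base operators. A substitution keeps its "reference base, then read base" order. The sketch below is a simplified illustration, not cstag's implementation; `revcomp_sketch` is a hypothetical name, and it handles only long-form matches, substitutions, insertions, and deletions (no introns).

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp_sketch(cs_long: str) -> str:
    """Reverse-complement a long-form cs tag without intron (~) operators."""
    ops = re.findall(r"=[ACGTN]+|\*[acgtn]{2}|[+-][acgtn]+", cs_long)
    out = []
    for op in reversed(ops):
        symbol, bases = op[0], op[1:].translate(COMPLEMENT)
        if symbol != "*":
            # Runs of bases are reversed; a substitution stays ref-then-read.
            bases = bases[::-1]
        out.append(symbol + bases)
    return "".join(out)


print(revcomp_sketch("=ACGT*ac+gg-cc=T"))
# =A-gg+cc*tg=ACGT
```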

Reconstructing the Reference Subsequence

import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
print(cstag.to_sequence(cs_tag))
# ACTTCTTA
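
The output above can be traced operator by operator. In minimap2's notation, `*gt` records a reference `g` replaced by `t` in the read, `+tt` is an insertion, and `-gg` is a deletion. The sketch below is an illustration only, not cstag's implementation, and `bases_from_cs` is a hypothetical name. It keeps matched bases, the second base of each substitution, and inserted bases, and skips deleted bases, which reproduces the string printed by the example.

```python
import re


def bases_from_cs(cs_long: str) -> str:
    """Collect the bases that a long-form cs tag (without introns) spells out."""
    parts = []
    for op in re.findall(r"=[ACGTN]+|\*[acgtn]{2}|[+-][acgtn]+", cs_long):
        symbol, bases = op[0], op[1:]
        if symbol == "=":
            parts.append(bases)             # matched bases
        elif symbol == "*":
            parts.append(bases[1].upper())  # second base of the substitution
        elif symbol == "+":
            parts.append(bases.upper())     # inserted bases
        # "-" (deletion): these bases are skipped
    return "".join(parts)


print(bases_from_cs("=AC*gt=T-gg=C+tt=A"))
# ACTTCTTA
```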

Generating a VCF Report

import cstag
cs_tag = "=AC*gt=T-gg=C+tt=A"
chrom = "chr1"
pos = 1
print(cstag.to_vcf(cs_tag, chrom, pos))
"""
##fileformat=VCFv4.2
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	3	.	G	T	.	.	.
chr1	4	.	TGG	T	.	.	.
chr1	5	.	C	CTT	.	.	.
"""

Passing multiple cs tags enables reporting of the variant allele frequency (VAF).

import cstag
cs_tags = ["=ACGT", "=AC*gt=T", "=C*gt=T", "=ACGT", "=AC*gt=T"]
chroms = ["chr1", "chr1", "chr1", "chr2", "chr2"]
positions = [2, 2, 3, 10, 100]
print(cstag.to_vcf(cs_tags, chroms, positions))
"""
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">
##INFO=<ID=VAF,Number=1,Type=Float,Description="Variant allele frequency (AD/DP)">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	4	.	G	T	.	.	DP=3;RD=1;AD=2;VAF=0.667
chr2	102	.	G	T	.	.	DP=1;RD=0;AD=1;VAF=1.0
"""

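The INFO values in the chr1 record can be checked by hand. The three chr1 tags start at positions 2, 2, and 3, so at position 4 they contribute the bases G, T, and T. The sketch below only reproduces that arithmetic with the field definitions from the VCF header (VAF = AD/DP); it does not call cstag, and the rounding to three decimals is an assumption based on the printed output.

```python
from collections import Counter

# Bases observed at chr1:4 in the example above (the reference base is G).
ref_base = "G"
observed = ["G", "T", "T"]

counts = Counter(observed)
dp = len(observed)       # total depth
rd = counts[ref_base]    # depth of the reference allele
ad = dp - rd             # depth of the alternate allele
vaf = round(ad / dp, 3)  # variant allele frequency (AD/DP)

print(f"DP={dp};RD={rd};AD={ad};VAF={vaf}")
# DP=3;RD=1;AD=2;VAF=0.667
```
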
Generating an HTML Report

import cstag
from pathlib import Path

cs_tag = "=AC+ggg=T-acgt*at~gt10ag=GNNN"
description = "Example"

cs_tag_html = cstag.to_html(cs_tag, description)
Path("report.html").write_text(cs_tag_html)
# Output "report.html"

Open the generated report.html file in a browser to visualize the mutations indicated by the cs tag.

📣 Feedback and Support

For questions, bug reports, or other forms of feedback, we'd love to hear from you!
Please use GitHub Issues for all reporting purposes.

Please refer to CONTRIBUTING for how to contribute and how to verify your contributions.

🤝 Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.
By participating in this project you agree to abide by its terms.

📄 Citation
