Scan archaeal DNA sequences for UAG amber codons and classify them as pyrrolysine-coding or stop

amber-codon-scanner

Requires Python 3.9+. License: MIT.

A Python library for scanning archaeal DNA sequences for amber codons (UAG in mRNA, TAG on the DNA coding strand) and classifying each as a likely pyrrolysine-coding codon or a true translation stop.

Built to support research from the Nayak lab:

Shalvarjian, Chadwick et al. (2025). Methanogenic archaea encoding pyrrolysine maintain ambiguous amber codon usage. PNAS 122(45):e2517473122.


Background

Pyrrolysine (Pyl) is the 22nd genetically encoded amino acid, incorporated at UAG codons in certain methanogenic archaea including Methanosarcina acetivorans, M. mazei, and M. barkeri. These organisms use UAG ambiguously: the same codon serves as either a stop signal or a pyrrolysine codon, depending on context.

This tool uses two evidence sources to classify each UAG:

| Evidence | Description | Source |
|---|---|---|
| Mid-ORF context | UAG flanked by in-frame coding sequence with a downstream in-frame stop | Heuristic |
| PYLIS element | GC-rich stem-loop downstream that promotes Pyl-tRNA read-through | Heuristic pattern (see PYLIS section below) |

Important: Classification is heuristic only. No validated sensitivity/specificity data exist for this classifier across all archaeal lineages. For definitive results, check for the pylTSBCD gene cluster in the genome (see below) and/or use experimental data.
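To make the mid-ORF evidence concrete, here is a minimal sketch of one way such a check could work. It is not the library's actual implementation. The function name `find_mid_orf_tags` is hypothetical, and the parameter names simply mirror those shown in the Quick Start. It treats a TAG as mid-ORF when it is preceded by a full run of stop-free in-frame codons and followed by an in-frame stop within a fixed window.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_mid_orf_tags(seq, frame=0, min_orf_length=10, downstream_stop_window=50):
    """Return 0-based nucleotide positions of in-frame TAG codons that look mid-ORF.

    Hypothetical helper for illustration; the real scanner may apply other rules.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    hits = []
    for idx, codon in enumerate(codons):
        if codon != "TAG":
            continue
        # Require a full run of upstream codons with no stop among them.
        upstream = codons[max(0, idx - min_orf_length):idx]
        if len(upstream) < min_orf_length or any(c in STOP_CODONS for c in upstream):
            continue
        # Require some in-frame stop (including another TAG) within the window.
        downstream = codons[idx + 1:idx + 1 + downstream_stop_window]
        if any(c in STOP_CODONS for c in downstream):
            hits.append(frame + 3 * idx)
    return hits
```

In this sketch a second in-frame TAG counts as a downstream stop, which is a simplification: in a genome with ambiguous UAG usage that codon could itself be read through.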


Installation

# From source
git clone https://github.com/CameronPiepkorn/amber-codon-scanner
cd amber-codon-scanner
pip install -e ".[dev]"

Quick Start

from amber_codon_scanner import AmberCodonScanner, Classification, report

scanner = AmberCodonScanner(
    min_orf_length=10,          # codons upstream required to call mid-ORF
    downstream_stop_window=50,  # codons downstream to search for in-frame stop
    check_pylis=True,           # search for PYLIS-like element
)

codons = scanner.scan_sequence(my_dna_string, seq_id="MA0859_mttB")
print(report.summary(codons))

# Export
report.to_tsv(codons, "results.tsv")
report.to_json(codons, "results.json")

# Filter to candidates only
candidates = [c for c in codons if c.classification == Classification.PYL_CANDIDATE]

Scan a whole FASTA file

results = scanner.scan_fasta("my_genome.fasta")
for seq_id, codons in results.items():
    print(f"\n=== {seq_id} ===")
    print(report.summary(codons))
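
For a quick genome-wide overview, the per-sequence results can be tallied by classification. The sketch below assumes only what the Quick Start shows: a mapping of sequence IDs to lists of objects that expose a `classification` attribute. The helper name `tally_by_sequence` is hypothetical, and the `.value` lookup is a guess that `Classification` is an enum; plain strings work too.

```python
from collections import Counter


def tally_by_sequence(results):
    """Count classifications per sequence.

    `results` maps seq_id -> list of objects with a `.classification` attribute.
    Enum members are reduced to their `.value`; other labels are stringified.
    """
    tallies = {}
    for seq_id, codons in results.items():
        labels = (
            str(getattr(c.classification, "value", c.classification))
            for c in codons
        )
        tallies[seq_id] = Counter(labels)
    return tallies
```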

Classification Rules

| mid-ORF | PYLIS detected | Classification |
|---|---|---|
| Yes | Yes | pyrrolysine_candidate |
| Yes | No | ambiguous |
| No | Yes | ambiguous |
| No | No | stop_likely |
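
The three labels above suggest a simple two-signal rule, sketched below. The mapping is an assumption inferred from the labels: both signals present gives a candidate, exactly one gives ambiguous, and neither gives a likely stop. `Call` is a hypothetical stand-in for the library's `Classification`, whose real definition may differ.

```python
from enum import Enum


class Call(Enum):
    """Hypothetical stand-in for the library's Classification enum."""
    PYL_CANDIDATE = "pyrrolysine_candidate"
    AMBIGUOUS = "ambiguous"
    STOP_LIKELY = "stop_likely"


def classify(mid_orf: bool, pylis_detected: bool) -> Call:
    """Combine the two heuristic signals into a single label."""
    if mid_orf and pylis_detected:
        return Call.PYL_CANDIDATE
    if mid_orf or pylis_detected:
        return Call.AMBIGUOUS
    return Call.STOP_LIKELY
```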

Confirming Results — pylTSBCD Gene Cluster

The most reliable confirmation of pyrrolysine encoding is the presence of the pyl biosynthesis gene cluster in the same genome:

| Gene | Function |
|---|---|
| pylS | Pyrrolysyl-tRNA synthetase (charges tRNA with pyrrolysine) |
| pylT | Pyrrolysine tRNA (anticodon CUA, decodes UAG) |
| pylB/C/D | Pyrrolysine biosynthesis enzymes |

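To check for the protein-coding genes (pylS, pylB, pylC, pylD) with HMMER, search the predicted proteins of your genome. pylT encodes a tRNA, so a protein HMM search will not find it: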

# Download pyl HMM profiles from Pfam / TIGRFAM
# Then:
hmmsearch --tblout pyl_hits.txt Pyl_synthetase.hmm my_genome.faa
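
The `--tblout` file is whitespace-delimited, with `#` comment lines and one row per target sequence. A minimal way to pull out significant hits is sketched below. The function name `parse_tblout` and the `1e-5` E-value cutoff are illustrative assumptions, not recommendations from this project. Column positions follow the standard HMMER per-target table: target name, target accession, query name, query accession, then full-sequence E-value and score.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Return (target, query, evalue, score) tuples sorted by E-value.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/footer comments and blank lines
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete per-target row
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return sorted(hits, key=lambda h: h[2])
```

Usage would look like `with open("pyl_hits.txt") as fh: hits = parse_tblout(fh)`. An empty result suggests the profile's target is absent or too divergent to detect at that cutoff.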

Known pyrrolysine-encoding genes in M. acetivorans C2A (GenBank AE010299.1):

| Locus | Gene | Product | Contains UAG |
|---|---|---|---|
| MA0859 | mttB | Trimethylamine methyltransferase | Yes |
| MA4384 | mtmB | Monomethylamine methyltransferase | Yes |

For the full list of pyrrolysine-encoding methyltransferases and their locus tags, query GenBank AE010299.1 directly or see the NCBI gene pages linked in the example FASTA file header.


PYLIS Element Detection

The PYLIS (Pyrrolysine Insertion Sequence) element is a GC-rich stem-loop found downstream of UAG codons in pyrrolysine-encoding genes. It promotes read-through of UAG by the pyrrolysyl-tRNA.

This tool uses a relaxed heuristic pattern (GC-rich stem ≥5 bp + loop 4-8 nt + GC-rich continuation, within an 80 nt window downstream of UAG) to flag PYLIS-like structures. The detection parameters are heuristic choices, not calibrated against a published benchmark. The pattern is intentionally permissive and will produce false positives in GC-rich genomes. Results are therefore labelled pylis_detected: True, never "PYLIS confirmed".
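
One way to express a pattern of this shape is a regular expression over the downstream window. The sketch below is an approximation under stated assumptions, not the library's actual pattern. "GC-rich" is simplified to runs made only of G or C, the continuation is assumed to be at least 5 nt like the stem, and no base-pairing complementarity is checked. The names `PYLIS_LIKE` and `pylis_like_downstream` are hypothetical.

```python
import re

# Stem (>=5 G/C) + loop (4-8 nt of any base) + continuation (>=5 G/C).
# Simplified: real stems need complementary pairing, which is not tested here.
PYLIS_LIKE = re.compile(r"[GC]{5,}[ACGT]{4,8}[GC]{5,}")


def pylis_like_downstream(seq, tag_pos, window=80):
    """Return True if a PYLIS-like pattern lies within `window` nt after the TAG.

    `tag_pos` is the 0-based index of the T in the TAG codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    start = tag_pos + 3  # first nucleotide after the codon
    region = seq[start:start + window]
    return PYLIS_LIKE.search(region) is not None
```

Because the expression accepts any G/C run, it shows why such a heuristic fires often in GC-rich genomes. A stricter check would fold the region and test that the two arms actually pair.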


Repository Structure

amber-codon-scanner/
├── amber_codon_scanner/
│   ├── __init__.py
│   ├── scanner.py      ← AmberCodonScanner, AmberCodon, Classification
│   ├── report.py       ← TSV / JSON / text export
│   └── utils.py        ← FASTA parser, codon table, reverse complement
├── tests/
│   └── test_scanner.py
├── examples/
│   ├── example_sequences.fasta   ← synthetic placeholders + NCBI links
│   └── basic_usage.py
├── .github/workflows/ci.yml
├── pyproject.toml
└── README.md

Running Tests

pytest
pytest --cov=amber_codon_scanner

Citation

If you use this tool in published research, please cite:

@article{shalvarjian2025,
  title   = {Methanogenic archaea encoding pyrrolysine maintain
             ambiguous amber codon usage},
  author  = {Shalvarjian, Katherine E and Chadwick, Garrett L and
             P{\'e}rez, Paloma I and Woods, Patrick H and Orphan, Victoria J
             and Nayak, Dipti D},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {122},
  number  = {45},
  pages   = {e2517473122},
  year    = {2025}
}

License

MIT © 2024. See LICENSE.
