Scan archaeal DNA sequences for UAG amber codons and classify them as pyrrolysine-coding or stop
amber-codon-scanner
A Python library for scanning archaeal DNA sequences for UAG (amber) codons and classifying each as a likely pyrrolysine-coding codon or a true translation stop.
Built to support research from the Nayak lab:
Shalvarjian, Chadwick et al. (2025). Methanogenic archaea encoding pyrrolysine maintain ambiguous amber codon usage. PNAS 122(45):e2517473122.
Background
Pyrrolysine (Pyl) is the 22nd genetically encoded amino acid, incorporated at UAG codons in certain methanogenic archaea including Methanosarcina acetivorans, M. mazei, and M. barkeri. These organisms use UAG ambiguously: the same codon can serve either as a stop signal or as a pyrrolysine codon, depending on context.
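To make the ambiguity concrete, here is a minimal sketch (not part of this library's API) that translates an in-frame DNA coding sequence with the standard genetic code (NCBI translation table 1) and reads TAG (UAG in mRNA) either as a stop (`*`) or as pyrrolysine, whose one-letter code is `O`. The function name `translate` and its `pyl` flag are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); amino acids listed in TCAG x TCAG x TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, pyl: bool = False) -> str:
    """Translate an in-frame DNA sequence codon by codon.

    If pyl is True, TAG is read as pyrrolysine ('O') instead of a stop ('*').
    Trailing bases that do not form a full codon are ignored; codons containing
    non-ACGT characters become 'X'. Translation does not halt at stop codons.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if pyl and codon == "TAG":
            protein.append("O")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


print(translate("ATGTAGAAATAA"))            # M*K*
print(translate("ATGTAGAAATAA", pyl=True))  # MOK*
```

The same nucleotide sequence yields a truncated product or a full-length Pyl-containing product; the scanner's job is to guess which reading is more plausible for each TAG.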
This tool uses two evidence sources to classify each UAG:
| Evidence | Description | Source |
|---|---|---|
| Mid-ORF context | UAG flanked by in-frame coding sequence with a downstream in-frame stop | Heuristic |
| PYLIS element | GC-rich stem-loop downstream that promotes Pyl-tRNA read-through | Heuristic pattern (see PYLIS section below) |
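The mid-ORF evidence can be sketched as follows. This is a minimal illustration, not the library's implementation: the function `is_mid_orf` is hypothetical, the TAG position is assumed to be a 0-based index on the coding strand, and "in-frame coding sequence upstream" is interpreted as a run of stop-free codons in the same frame (no start codon is required). The parameters mirror the `min_orf_length` and `downstream_stop_window` options shown in the Quick Start.

```python
STOPS = {"TAA", "TAG", "TGA"}


def is_mid_orf(seq: str, pos: int, min_orf_length: int = 10,
               downstream_stop_window: int = 50) -> bool:
    """Return True if the TAG at `pos` looks like it sits inside an ORF.

    Hypothetical heuristic:
      * at least `min_orf_length` consecutive in-frame codons immediately
        upstream of the TAG contain no stop codon, and
      * another in-frame stop occurs within `downstream_stop_window`
        codons after the TAG.
    """
    seq = seq.upper()
    if seq[pos:pos + 3] != "TAG":
        raise ValueError(f"no TAG codon at position {pos}")

    # Count consecutive stop-free codons upstream, in the same frame.
    upstream = 0
    i = pos - 3
    while i >= 0 and seq[i:i + 3] not in STOPS:
        upstream += 1
        i -= 3
    if upstream < min_orf_length:
        return False

    # Look for an in-frame stop downstream within the window.
    for k in range(1, downstream_stop_window + 1):
        codon = seq[pos + 3 * k:pos + 3 * k + 3]
        if len(codon) < 3:
            break  # ran off the end of the sequence
        if codon in STOPS:
            return True
    return False


example = "ATG" + "GCT" * 12 + "TAG" + "AAA" * 5 + "TAA"
print(is_mid_orf(example, pos=39))  # True
```

A real scanner would also have to decide which frame and strand each TAG belongs to; this sketch assumes the caller already knows.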
⚠ Important: Classification is heuristic only. No validated sensitivity/specificity data exist for this classifier across all archaeal lineages. For definitive results, check for the pylTSBCD gene cluster in the genome (see below) and/or use experimental data.
Installation
# From source
git clone https://github.com/CameronPiepkorn/amber-codon-scanner
cd amber-codon-scanner
pip install -e ".[dev]"
Quick Start
from amber_codon_scanner import AmberCodonScanner, Classification, report
scanner = AmberCodonScanner(
min_orf_length=10, # codons upstream required to call mid-ORF
downstream_stop_window=50, # codons downstream to search for in-frame stop
check_pylis=True, # search for PYLIS-like element
)
# my_dna_string: the DNA sequence (A/C/G/T) you want to scan
codons = scanner.scan_sequence(my_dna_string, seq_id="MA0859_mttB")
print(report.summary(codons))
# Export
report.to_tsv(codons, "results.tsv")
report.to_json(codons, "results.json")
# Filter to candidates only
candidates = [c for c in codons if c.classification == Classification.PYL_CANDIDATE]
Scan a whole FASTA file
results = scanner.scan_fasta("my_genome.fasta")
for seq_id, codons in results.items():
    print(f"\n=== {seq_id} ===")
    print(report.summary(codons))
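`utils.py` ships its own FASTA parser. If you only want to inspect records with the standard library, a minimal parser that returns a dictionary keyed by sequence ID (the same kind of key used in `results` above) might look like this. `read_fasta` is hypothetical, and taking the first whitespace-delimited word of each header as the ID is an assumption about how IDs are derived.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {seq_id: sequence}.

    The ID is the first whitespace-delimited word after '>'.
    Sequence lines are concatenated and upper-cased; blank lines are skipped.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


text = ">seq1 demo record\nATGTAG\nAAATAA\n>seq2\natgtga\n"
print(read_fasta(io.StringIO(text)))
# {'seq1': 'ATGTAGAAATAA', 'seq2': 'ATGTGA'}
```

For a file on disk, pass an open handle: `with open("my_genome.fasta") as fh: records = read_fasta(fh)`.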
Classification Rules
| mid-ORF | PYLIS detected | Classification |
|---|---|---|
| ✅ | ✅ | pyrrolysine_candidate |
| ✅ | ❌ | ambiguous |
| ❌ | ✅ | ambiguous |
| ❌ | ❌ | stop_likely |
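The table reduces to a two-input rule: both lines of evidence give a candidate, exactly one gives an ambiguous call, and neither gives a likely stop. The sketch below restates it in Python. The enum `Call` and the function `classify` are hypothetical stand-ins (the library's own `Classification` enum may be structured differently); only the three string values are taken from the table.

```python
from enum import Enum


class Call(str, Enum):
    PYL_CANDIDATE = "pyrrolysine_candidate"
    AMBIGUOUS = "ambiguous"
    STOP_LIKELY = "stop_likely"


def classify(mid_orf: bool, pylis_detected: bool) -> Call:
    """Combine the two heuristic evidence flags as in the table above."""
    if mid_orf and pylis_detected:
        return Call.PYL_CANDIDATE
    if mid_orf or pylis_detected:
        return Call.AMBIGUOUS  # exactly one line of evidence
    return Call.STOP_LIKELY


for mid_orf in (True, False):
    for pylis in (True, False):
        print(mid_orf, pylis, classify(mid_orf, pylis).value)
```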
Confirming Results — pylTSBCD Gene Cluster
The most reliable confirmation of pyrrolysine encoding is the presence of the pyl biosynthesis gene cluster in the same genome:
| Gene | Function |
|---|---|
| pylS | Pyrrolysyl-tRNA synthetase (charges tRNA with pyrrolysine) |
| pylT | Pyrrolysine tRNA (anticodon CUA, decodes UAG) |
| pylB/C/D | Pyrrolysine biosynthesis enzymes |
To check for these genes, search your genome's predicted protein sequences (.faa) with HMMER:
# Download pyl HMM profiles from Pfam / TIGRFAM
# Then:
hmmsearch --tblout pyl_hits.txt Pyl_synthetase.hmm my_genome.faa
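`--tblout` writes one whitespace-delimited line per target sequence, with comment lines starting with `#`. In HMMER 3's per-sequence table the first 18 fields are fixed (target name, target accession, query name, query accession, full-sequence E-value, score, bias, and so on) and the remainder of the line is a free-text description, so the sketch below splits at most 18 times. `parse_tblout`, the `TblHit` record, and the default E-value cutoff of 1e-10 are illustrative assumptions, not recommendations from the Pyl literature.

```python
from typing import Iterable, List, NamedTuple


class TblHit(NamedTuple):
    target: str    # field 1: target (protein) name
    query: str     # field 3: query (HMM) name
    evalue: float  # field 5: full-sequence E-value
    score: float   # field 6: full-sequence bit score


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-10) -> List[TblHit]:
    """Return full-sequence hits from hmmsearch --tblout output at or below max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header/footer comments and blank lines
        fields = line.split(maxsplit=18)  # 18 fixed columns + description
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append(TblHit(target, query, evalue, score))
    return hits
```

Usage: `with open("pyl_hits.txt") as fh: hits = parse_tblout(fh)`. An empty result for every pyl profile is weak evidence against Pyl encoding; a hit list still needs manual inspection.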
Known pyrrolysine-encoding genes in M. acetivorans C2A (GenBank AE010299.1):
| Locus | Gene | Product | Contains UAG |
|---|---|---|---|
| MA0859 | mttB | Trimethylamine methyltransferase | Yes |
| MA4384 | mtmB | Monomethylamine methyltransferase | Yes |
For the full list of pyrrolysine-encoding methyltransferases and their locus tags, query GenBank AE010299.1 directly or see the NCBI gene pages linked in the example FASTA file header.
PYLIS Element Detection
The PYLIS (Pyrrolysine Insertion Sequence) element is a GC-rich stem-loop found downstream of UAG codons in pyrrolysine-encoding genes. It promotes read-through of UAG by the pyrrolysyl-tRNA.
This tool uses a relaxed heuristic pattern to flag PYLIS-like structures:
a GC-rich stem of at least 5 bp, a loop of 4-8 nt, and a GC-rich
continuation, all within an 80 nt window downstream of the UAG. These
detection parameters are heuristic choices, not calibrated against a
published benchmark. The pattern is intentionally permissive and will
produce false positives in GC-rich genomes.
Positive hits are therefore reported only as pylis_detected: True, never as "PYLIS confirmed".
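The core of such a pattern is a search for a short inverted repeat. The sketch below is an illustration under stated assumptions, not the library's code: `find_pylis_like` is hypothetical, pairing is strict Watson-Crick (no G·U wobble), "GC-rich" is interpreted as a stem GC fraction of at least 0.6, and the "GC-rich continuation" criterion is omitted for brevity.

```python
from typing import Optional, Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(s: str) -> float:
    return sum(base in "GC" for base in s) / len(s) if s else 0.0


def find_pylis_like(seq: str, tag_pos: int, window: int = 80,
                    min_stem: int = 5, loop_range: Tuple[int, int] = (4, 8),
                    min_gc: float = 0.6) -> Optional[Tuple[int, int]]:
    """Search downstream of the TAG at `tag_pos` for a GC-rich hairpin.

    Returns (stem_start, loop_length) for the first hit, with stem_start as an
    index into `seq`, or None. Any longer stem contains a `min_stem`-bp inner
    stem with the same loop, so checking exactly `min_stem` bp is sufficient.
    """
    seq = seq.upper()
    start = tag_pos + 3
    region = seq[start:start + window]
    lo, hi = loop_range
    for i in range(len(region)):
        for loop in range(lo, hi + 1):
            left = region[i:i + min_stem]
            right_start = i + min_stem + loop
            right = region[right_start:right_start + min_stem]
            if len(right) < min_stem:
                continue  # hairpin would extend past the window
            # Outermost left base pairs with outermost right base, and so on inward.
            paired = all(COMPLEMENT.get(a) == b for a, b in zip(left, reversed(right)))
            if paired and gc_fraction(left + right) >= min_gc:
                return start + i, loop
    return None


demo = "TAG" + "AAA" + "GCGGC" + "TTTT" + "GCCGC" + "AAAAAA"
print(find_pylis_like(demo, tag_pos=0))  # (6, 4)
```

Because any perfectly paired 5 bp GC-rich inverted repeat qualifies, matches of this kind arise by chance more often as genomic GC content rises, which is the false-positive behaviour described above.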
Repository Structure
amber-codon-scanner/
├── amber_codon_scanner/
│ ├── __init__.py
│ ├── scanner.py ← AmberCodonScanner, AmberCodon, Classification
│ ├── report.py ← TSV / JSON / text export
│ └── utils.py ← FASTA parser, codon table, reverse complement
├── tests/
│ └── test_scanner.py
├── examples/
│ ├── example_sequences.fasta ← synthetic placeholders + NCBI links
│ └── basic_usage.py
├── .github/workflows/ci.yml
├── pyproject.toml
└── README.md
Running Tests
pytest
pytest --cov=amber_codon_scanner
Citation
If you use this tool in published research, please cite:
@article{shalvarjian2025,
title = {Methanogenic archaea encoding pyrrolysine maintain
ambiguous amber codon usage},
author = {Shalvarjian, Katherine E and Chadwick, Garrett L and
P{\'e}rez, Paloma I and Woods, Patrick H and Orphan, Victoria J
and Nayak, Dipti D},
journal = {Proceedings of the National Academy of Sciences},
volume = {122},
number = {45},
pages = {e2517473122},
year = {2025}
}
License
MIT © 2024. See LICENSE.
Download files
File details
Details for the file amber_codon_scanner-0.1.0.tar.gz.
File metadata
- Download URL: amber_codon_scanner-0.1.0.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4361dd057375f5bcfbcab9cceb9fb8c095122885ea76c8edc55887aafff48c83 |
| MD5 | 1a85e11b965edc2c9b0369b4f3cf56b8 |
| BLAKE2b-256 | bd2cd9a684bf9db5e6075d1514ef3973dda69e5b77a28ac7584c0e1930cbec8f |
Provenance
The following attestation bundles were made for amber_codon_scanner-0.1.0.tar.gz:
- Publisher: ci.yml on CameronPiepkorn/amber-codon-scanner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amber_codon_scanner-0.1.0.tar.gz
- Subject digest: 4361dd057375f5bcfbcab9cceb9fb8c095122885ea76c8edc55887aafff48c83
- Sigstore transparency entry: 1337010263
- Sigstore integration time:
- Permalink: CameronPiepkorn/amber-codon-scanner@acac9ea6b4d832362e9b4d038196da5020036d14
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/CameronPiepkorn
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@acac9ea6b4d832362e9b4d038196da5020036d14
- Trigger Event: push
File details
Details for the file amber_codon_scanner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: amber_codon_scanner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c4c39d67a69b9038a579a8d1352b549e332c45579c0284ba7f665ccbd76f01f |
| MD5 | 086ffb01a07b6b0aee12bd6741affa09 |
| BLAKE2b-256 | b49c406423d1f3b24c32818b9d892799b27cc2a247ea3fe48386fdbcd9f277ae |
Provenance
The following attestation bundles were made for amber_codon_scanner-0.1.0-py3-none-any.whl:
- Publisher: ci.yml on CameronPiepkorn/amber-codon-scanner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amber_codon_scanner-0.1.0-py3-none-any.whl
- Subject digest: 9c4c39d67a69b9038a579a8d1352b549e332c45579c0284ba7f665ccbd76f01f
- Sigstore transparency entry: 1337010384
- Sigstore integration time:
- Permalink: CameronPiepkorn/amber-codon-scanner@acac9ea6b4d832362e9b4d038196da5020036d14
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/CameronPiepkorn
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@acac9ea6b4d832362e9b4d038196da5020036d14
- Trigger Event: push