PalmSite: RdRP catalytic center predictor
PalmSite is a simple, fast command-line tool that predicts the RNA-dependent RNA polymerase (RdRP) catalytic center from protein FASTA and outputs GFF3.
Highlights
- One command from FASTA → GFF3:
  palmsite <fasta ...>
- High precision and recall (internal benchmarks):
| Backbone (ESM-C) | Positives vs. Negatives | Positives vs. Rest |
|---|---|---|
| 6b | 0.9998 | 0.9848 |
| 600m | 0.9992 | 0.9687 |
| 300m | 0.9991 | 0.9755 |
- Detects distant homologs (e.g., HSRV RdRP).
- Clear progress logging and fast batched embedding (local 300m/600m or Forge 6B).
Installation
pip install palmsite
Quickstart
# Basic (local 600m is default backbone)
palmsite examples/zikavirus_proteins.fasta
# Quiet mode (only errors)
palmsite -q examples/zikavirus_proteins.fasta
# Raise the reporting threshold
palmsite -p 0.9 examples/zikavirus_proteins.fasta
# Use 6B (Forge); requires a token
palmsite -b 6b -k <FORGE_TOKEN> examples/zikavirus_proteins.fasta
# or export ESM_FORGE_TOKEN and omit -k
Notes:
-b/--backbone chooses the embedding model: 300m, 600m (local), or 6b (Forge/cloud).
Command-line usage
Usage: palmsite [OPTIONS] [FASTAS]...
PalmSite — RdRP catalytic center predictor. Usage: palmsite -p 0.5 [-o
result.gff] [options] <fasta ...>
Options:
--version Show the version and exit.
-o, --gff-out TEXT Write GFF3 to this path; default: stdout if
omitted
-p, --min-p FLOAT Minimum probability to include a feature in
GFF [default: 0.5]
-b, --backbone [300m|600m|6b] Embedding backbone & size: '300m' (fast,
local), '600m' (balanced, local), '6b'
(highest quality via ESM Forge; requires
--token or ESM_FORGE_TOKEN). [default: 600m]
-m, --model-id TEXT Hugging Face model repo (default via
PALMSITE_MODEL_ID env or palmsite/<backbone>)
-d, --device [auto|cpu|cuda] Device for local ESM-C (ignored for 6B Forge)
[default: auto]
-k, --token TEXT Forge token (required for 6B if not set in
ESM_FORGE_TOKEN)
-t, --tmp-dir TEXT Optional working directory for temp files
-q, --quiet Reduce non-error logs
-v, --verbose Verbose logs (DEBUG level; overrides -q)
--keep-tmp Keep temporary files (sanitized FASTA &
embeddings.h5) for debugging
About -b/--backbone
- 300m – fast local ESM-C. Good for CPU/GPU prototyping.
- 600m – balanced local ESM-C. Better quality; still lightweight.
- 6b – highest quality via ESM Forge (cloud); requires -k <token> or ESM_FORGE_TOKEN.
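As a minimal sketch of how a script might assemble a PalmSite command line around the backbone choice, the helper below uses only the flags documented above (-b, -p, -o, -k). The function name build_palmsite_cmd is hypothetical and not part of PalmSite. The choice to fail early when 6b is requested without a token is an assumption about convenient behavior, not a description of what the CLI itself does.

```python
import os


def build_palmsite_cmd(fastas, backbone="600m", min_p=0.5, gff_out=None, token=None):
    """Build an argv list for the palmsite CLI (hypothetical helper, not part of PalmSite)."""
    if backbone not in ("300m", "600m", "6b"):
        raise ValueError(f"unknown backbone: {backbone!r}")

    cmd = ["palmsite", "-b", backbone, "-p", str(min_p)]
    if gff_out is not None:
        cmd += ["-o", gff_out]

    if backbone == "6b":
        # Assumption: fail early if neither -k nor ESM_FORGE_TOKEN is available.
        if token is None and not os.environ.get("ESM_FORGE_TOKEN"):
            raise ValueError("backbone '6b' needs a token or ESM_FORGE_TOKEN")
        if token is not None:
            cmd += ["-k", token]

    cmd += list(fastas)
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) once palmsite is installed.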
What PalmSite does
- Sanitize & merge FASTA
  Replaces unusual residues with X, drops sequences if too many fixes were needed, and writes one merged FASTA.
- Embed sequences
  - Launches the embedding engine (batched, token-aware micro-batching; visible progress/ETA).
  - Backends:
    - Local ESM-C (300m/600m) via Hugging Face.
    - Forge (6B) via the ESM SDK (ESM3ForgeInferenceClient).
- Predict → GFF3
  Loads the checkpoint from Hugging Face, computes RdRP probabilities and spans, aggregates per protein, and writes GFF3.
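The first two steps above can be illustrated with a rough sketch. This is not PalmSite's implementation: the set of "usual" residues, the 10% fix cutoff, the 4096-token budget, and the names sanitize_sequence and micro_batches are all assumptions chosen for illustration. Tokens are approximated here as one per residue, ignoring any special tokens the real tokenizer adds.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: the 20 standard amino acids count as "usual" residues.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
# Hypothetical cutoff; PalmSite's real threshold is not documented here.
MAX_FIX_FRACTION = 0.1


def sanitize_sequence(seq: str) -> Optional[str]:
    """Replace non-standard residues with X; return None if too many fixes were needed."""
    seq = seq.strip().upper()
    if not seq:
        return None
    # Every character outside the standard set (including an existing X) counts as a fix.
    fixes = sum(1 for c in seq if c not in STANDARD_AA)
    if fixes / len(seq) > MAX_FIX_FRACTION:
        return None
    return "".join(c if c in STANDARD_AA else "X" for c in seq)


def micro_batches(
    records: Iterable[Tuple[str, str]], max_tokens: int = 4096
) -> Iterator[List[Tuple[str, str]]]:
    """Group (name, seq) records so that longest_length * batch_size <= max_tokens.

    A single sequence longer than max_tokens still gets a batch of its own.
    """
    batch: List[Tuple[str, str]] = []
    longest = 0
    # Sorting by length keeps padding waste low within each batch.
    for name, seq in sorted(records, key=lambda r: len(r[1])):
        new_longest = max(longest, len(seq))
        if batch and new_longest * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_longest = [], len(seq)
        batch.append((name, seq))
        longest = new_longest
    if batch:
        yield batch
```

Budgeting on the padded size (longest sequence times batch size) rather than on sequence count is what makes a batching scheme "token-aware": many short sequences can share a batch, while long ones are processed nearly alone.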
Output
- GFF3 (stdout or -o): one feature per protein (catalytic center span). Attributes include P, sigma, original length, and the chunk used.
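To show how the output might be consumed downstream, here is a small sketch that reads GFF3 lines and keeps features whose P attribute clears a cutoff. It relies only on the standard GFF3 layout (nine tab-separated columns, with key=value pairs separated by semicolons in the ninth). The names Feature, parse_gff3, and confident are hypothetical, and the sketch assumes the probability is stored under the attribute key P as described above.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int
    end: int
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per GFF3 data line, skipping comments and malformed lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        # Column 9 holds attributes as key=value pairs separated by ';'.
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        yield Feature(cols[0], int(cols[3]), int(cols[4]), attrs)


def confident(features: Iterable[Feature], min_p: float = 0.9) -> List[Feature]:
    """Keep features whose P attribute is at least min_p; features without P are dropped."""
    return [f for f in features if float(f.attributes.get("P", "nan")) >= min_p]
```

For example, confident(parse_gff3(open("result.gff"))) would list the high-confidence catalytic center spans from a file written with -o result.gff.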
Environment variables
- ESM_FORGE_TOKEN: Forge API token for -b 6b (alternative to -k).
Project structure (user-side)
- cli.py: top-level command (sanitize → embed → predict).
- embed_shim.py: launches the embedding engine in a subprocess.
- _embed_impl.py: embedding engine (batching, progress, HDF5 writer, Forge/local backends).
- infer_simple.py: simple driver to produce GFF from embeddings.
- _predict_impl.py: full predictor (model, dataset, collate).
- hf.py: Hugging Face weight resolution.
- sanitize.py: FASTA cleaner/merger.
- __init__.py: version (current: 0.1.0).
Version
0.1.0