PalmSite: RdRP catalytic center predictor
PalmSite is a simple, fast command-line tool that predicts the RNA-dependent RNA polymerase (RdRP) catalytic center from protein FASTA and outputs GFF3.
Highlights
- One command from FASTA → GFF3:
  palmsite <fasta ...>
- High precision and recall (internal benchmarks):
| Backbone (ESM-C) | Positives vs. Negatives | Positives vs. Rest |
|---|---|---|
| 6b | 0.9998 | 0.9848 |
| 600m | 0.9992 | 0.9687 |
| 300m | 0.9991 | 0.9755 |
- Detects distant homologs (e.g., HSRV RdRP (Urayama et al., 2024)).
Installation
pip install palmsite
Quickstart
# Basic (local 600m is default backbone)
palmsite -o hsrv_rdrp-domain.gff examples/hsrv_proteins.fasta
# or
palmsite examples/hsrv_proteins.fasta > hsrv_rdrp-domain.gff
# Quiet mode (only errors)
palmsite -q examples/sars-cov-2_proteins.fasta
# Raise the reporting threshold (default 0.5)
palmsite -p 0.9 examples/zikavirus_proteins.fasta
# Use 6B (Forge); requires a token
palmsite -b 6b -k <FORGE_TOKEN> examples/turnip-mosaic-virus_proteins.fasta
# or export ESM_FORGE_TOKEN and omit -k
Notes:
- -b/--backbone chooses the embedding model: 300m or 600m (both local), or 6b (Forge/cloud).
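For scripted runs, the commands above can be assembled from Python. The following is a minimal sketch, not part of PalmSite itself: `build_palmsite_cmd` and `run_palmsite` are hypothetical helper names, and the sketch uses only flags listed in the help text (`-b`, `-p`, `-o`, `-q`). It assumes `palmsite` is on `PATH` and, for `-b 6b`, that `ESM_FORGE_TOKEN` is already exported so that the child process inherits it.

```python
import subprocess
from typing import List, Optional, Sequence


def build_palmsite_cmd(
    fastas: Sequence[str],
    gff_out: Optional[str] = None,
    min_p: float = 0.5,
    backbone: str = "600m",
    quiet: bool = False,
) -> List[str]:
    """Assemble an argv list using only flags from the PalmSite help text."""
    if backbone not in {"300m", "600m", "6b"}:
        raise ValueError(f"unknown backbone: {backbone!r}")
    if not fastas:
        raise ValueError("at least one FASTA path is required")
    cmd = ["palmsite", "-b", backbone, "-p", str(min_p)]
    if gff_out is not None:
        cmd += ["-o", gff_out]  # omit -o to get GFF3 on stdout
    if quiet:
        cmd.append("-q")
    return cmd + list(fastas)


def run_palmsite(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names. When `gff_out` is left as `None`, the GFF3 text should be available in the `stdout` attribute of the returned object.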
Command-line usage
Usage: palmsite [OPTIONS] [FASTAS]...
PalmSite — RdRP catalytic center predictor. Usage: palmsite -p 0.5 [-o
result.gff] [options] <fasta ...>
Options:
--version Show the version and exit.
-o, --gff-out TEXT Write GFF3 to this path; default: stdout if
omitted
-p, --min-p FLOAT Minimum probability to include a feature in
GFF [default: 0.5]
-b, --backbone [300m|600m|6b] Embedding backbone & size: '300m' (fast,
local), '600m' (balanced, local), '6b'
(highest quality via ESM Forge; requires
--token or ESM_FORGE_TOKEN). [default: 600m]
-m, --model-id TEXT Hugging Face model repo (default via
PALMSITE_MODEL_ID env or palmsite/<backbone>)
-d, --device [auto|cpu|cuda] Device for local ESM-C (ignored for 6B Forge)
[default: auto]
-k, --token TEXT Forge token (required for 6B if not set in
ESM_FORGE_TOKEN)
-t, --tmp-dir TEXT Optional working directory for temp files
-q, --quiet Reduce non-error logs
-v, --verbose Verbose logs (DEBUG level; overrides -q)
--keep-tmp Keep temporary files (sanitized FASTA &
embeddings.h5) for debugging
About -b/--backbone
- 300m – fast local ESM-C. Good for CPU/GPU prototyping.
- 600m – balanced local ESM-C. Better quality; still lightweight.
- 6b – highest quality via ESM Forge (cloud); requires -k <token> or ESM_FORGE_TOKEN.
What PalmSite does
- Sanitize & merge FASTA
  Replaces unusual residues with X, drops sequences if too many fixes were needed, and writes one merged FASTA.
- Embed sequences
  - Launches the embedding engine.
  - Backends:
    - Local ESM-C (300m/600m) via Hugging Face.
    - Forge (6B) via the ESM SDK (ESM3ForgeInferenceClient).
- Predict → GFF3
  Loads the checkpoint from Hugging Face, computes RdRP probabilities and spans, aggregates per protein, and writes GFF3.
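The sanitize step described above can be illustrated with a short sketch. This is not PalmSite's `sanitize.py`: the set of residues treated as standard, the function names, and the `MAX_FIX_FRACTION` cutoff are all assumptions chosen for illustration, since the README does not state the actual rules or threshold.

```python
from typing import Dict, Iterable, Tuple

# Assumption: the 20 canonical amino acids count as "usual"; everything else
# (B, Z, U, O, J, '*', '-', ...) is replaced with X.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
MAX_FIX_FRACTION = 0.1  # hypothetical cutoff, not PalmSite's real value


def sanitize_sequence(seq: str) -> Tuple[str, int]:
    """Return the cleaned sequence and the number of residues replaced."""
    cleaned = []
    fixes = 0
    for ch in seq.upper():
        if ch in STANDARD or ch == "X":
            cleaned.append(ch)
        else:
            cleaned.append("X")
            fixes += 1
    return "".join(cleaned), fixes


def sanitize_records(
    records: Iterable[Tuple[str, str]],
    max_fix_fraction: float = MAX_FIX_FRACTION,
) -> Dict[str, str]:
    """Clean (name, sequence) pairs and drop those needing too many fixes."""
    kept: Dict[str, str] = {}
    for name, seq in records:
        seq = "".join(seq.split())  # strip whitespace and line breaks
        if not seq:
            continue
        cleaned, fixes = sanitize_sequence(seq)
        if fixes / len(cleaned) > max_fix_fraction:
            continue  # too many replacements: drop the sequence
        kept[name] = cleaned
    return kept
```

With `--keep-tmp`, the sanitized FASTA that PalmSite actually wrote is preserved and can be inspected directly, which is more reliable than this approximation.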
Output
- GFF3 (stdout or -o): one feature per protein (catalytic center span). Attributes include P, sigma, original length, and the chunk used.
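For downstream filtering, the GFF3 output can be read with a few lines of Python. The sketch below relies only on the generic nine-column GFF3 layout and on the `P` attribute named above; `parse_gff3` and `filter_by_p` are hypothetical helper names, and the source/type values and attribute spellings in any sample line are illustrative assumptions rather than PalmSite's exact output.

```python
from typing import Dict, Iterable, Iterator, List
from urllib.parse import unquote


def parse_gff3(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per feature line, skipping comments and directives."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        attrs: Dict[str, str] = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = unquote(value.strip())
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),  # GFF3 coordinates are 1-based, inclusive
            "end": int(cols[4]),
            "attributes": attrs,
        }


def filter_by_p(features: Iterable[Dict], min_p: float) -> List[Dict]:
    """Keep features whose P attribute is at least min_p."""
    # A missing P becomes NaN, which fails the comparison and is dropped.
    return [
        f for f in features
        if float(f["attributes"].get("P", "nan")) >= min_p
    ]
```

Filtering after the fact is equivalent in spirit to rerunning with a higher `-p`, but avoids recomputing embeddings.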
Environment variables
- ESM_FORGE_TOKEN: Forge API token for -b 6b (alternative to -k).
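A wrapper script may want to check for a token before launching a 6b run. The following sketch shows one plausible resolution order; `resolve_forge_token` is a hypothetical helper, and the precedence of an explicit `-k` value over the environment variable is an assumption, since the README only states that one of the two is required.

```python
import os
from typing import Mapping, Optional


def resolve_forge_token(
    backbone: str,
    cli_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the Forge token for 6b runs, or None for local backbones."""
    env = os.environ if env is None else env
    if backbone != "6b":
        return None  # local 300m/600m backbones need no token
    # Assumption: an explicit -k value wins over ESM_FORGE_TOKEN.
    token = cli_token or env.get("ESM_FORGE_TOKEN")
    if not token:
        raise ValueError("backbone '6b' requires -k/--token or ESM_FORGE_TOKEN")
    return token
```

Failing early in the wrapper keeps the error close to the configuration mistake instead of surfacing it after sanitization has already run.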
Project structure (user-side)
- cli.py: top-level command (sanitize → embed → predict).
- embed_shim.py: launches the embedding engine in a subprocess.
- _embed_impl.py: embedding engine (batching, progress, HDF5 writer, Forge/local backends).
- infer_simple.py: simple driver to produce GFF from embeddings.
- _predict_impl.py: full predictor (model, dataset, collate).
- hf.py: Hugging Face weight resolution.
- sanitize.py: FASTA cleaner/merger.
- __init__.py: version. Current: 0.1.0.
Version
0.1.0