PalmSite: RdRP catalytic center predictor

PalmSite is a simple, fast command-line tool that predicts the RNA-dependent RNA polymerase (RdRP) catalytic center from protein FASTA and outputs GFF3.

Highlights

  • One command from FASTA → GFF3: palmsite <fasta ...>
  • High precision and recall (internal benchmarks):
    Backbone (ESM-C)   Positives vs. Negatives   Positives vs. Rest
    6b                 0.9998                    0.9848
    600m               0.9992                    0.9687
    300m               0.9991                    0.9755
  • Detects distant homologs (e.g., the HSRV RdRP; Urayama et al., 2024).

Installation

pip install palmsite

Quickstart

# Basic usage (the local 600m backbone is the default)
palmsite -o hsrv_rdrp-domain.gff examples/hsrv_proteins.fasta
# or, equivalently, write GFF3 to stdout and redirect it
palmsite examples/hsrv_proteins.fasta > hsrv_rdrp-domain.gff

# Quiet mode (only errors)
palmsite -q examples/sars-cov-2_proteins.fasta

# Raise the reporting threshold (default 0.5)
palmsite -p 0.9 examples/zikavirus_proteins.fasta

# Use 6B (Forge); requires a token
palmsite -b 6b -k <FORGE_TOKEN> examples/turnip-mosaic-virus_proteins.fasta
# or export ESM_FORGE_TOKEN and omit -k
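To drive PalmSite from a Python pipeline, a thin wrapper around the CLI is often enough. The sketch below is a hypothetical helper, not part of the palmsite package: `build_palmsite_cmd` and `run_palmsite` are illustrative names, and they use only the flags shown in `palmsite --help` (-o, -p, -b, -q). The Forge token is passed through ESM_FORGE_TOKEN rather than -k.

```python
import os
import subprocess
from typing import List, Optional, Sequence

# Backbone names as listed in `palmsite --help`.
BACKBONES = {"300m", "600m", "6b"}


def build_palmsite_cmd(
    fastas: Sequence[str],
    gff_out: Optional[str] = None,
    min_p: float = 0.5,
    backbone: str = "600m",
    quiet: bool = False,
) -> List[str]:
    """Build a palmsite argv list (hypothetical helper; documented flags only)."""
    if backbone not in BACKBONES:
        raise ValueError(f"unsupported backbone: {backbone!r}")
    if not fastas:
        raise ValueError("at least one FASTA path is required")
    cmd = ["palmsite", "-p", str(min_p), "-b", backbone]
    if gff_out is not None:
        cmd += ["-o", gff_out]
    if quiet:
        cmd.append("-q")
    return cmd + list(fastas)


def run_palmsite(
    fastas: Sequence[str],
    forge_token: Optional[str] = None,
    **options,
) -> str:
    """Run palmsite and return its stdout (the GFF3 text when -o is not given)."""
    env = dict(os.environ)
    if forge_token is not None:
        # Pass the token via ESM_FORGE_TOKEN instead of -k so it does not
        # appear in the process argument list.
        env["ESM_FORGE_TOKEN"] = forge_token
    result = subprocess.run(
        build_palmsite_cmd(fastas, **options),
        env=env,
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # capture both stdout and stderr
        text=True,
    )
    return result.stdout
```

For example, `run_palmsite(["examples/zikavirus_proteins.fasta"], min_p=0.9)` would run the same command as the third Quickstart line and return the GFF3 text. Keeping command construction separate from execution makes the argument list easy to inspect or test without installing the model weights.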

Notes:

  • -b/--backbone chooses the embedding model: 300m, 600m (local), or 6b (Forge/cloud).

Command-line usage

Usage: palmsite [OPTIONS] [FASTAS]...

  PalmSite — RdRP catalytic center predictor. Usage: palmsite -p 0.5 [-o
  result.gff] [options] <fasta ...>

Options:
  --version                      Show the version and exit.
  -o, --gff-out TEXT             Write GFF3 to this path; default: stdout if
                                 omitted
  -p, --min-p FLOAT              Minimum probability to include a feature in
                                 GFF  [default: 0.5]
  -b, --backbone [300m|600m|6b]  Embedding backbone & size: '300m' (fast,
                                 local), '600m' (balanced, local), '6b'
                                 (highest quality via ESM Forge; requires
                                 --token or ESM_FORGE_TOKEN).  [default: 600m]
  -m, --model-id TEXT            Hugging Face model repo (default via
                                 PALMSITE_MODEL_ID env or palmsite/<backbone>)
  -d, --device [auto|cpu|cuda]   Device for local ESM-C (ignored for 6B Forge)
                                 [default: auto]
  -k, --token TEXT               Forge token (required for 6B if not set in
                                 ESM_FORGE_TOKEN)
  -t, --tmp-dir TEXT             Optional working directory for temp files
  -q, --quiet                    Reduce non-error logs
  -v, --verbose                  Verbose logs (DEBUG level; overrides -q)
  --keep-tmp                     Keep temporary files (sanitized FASTA &
                                 embeddings.h5) for debugging

About -b/--backbone

  • 300m – fast local ESM-C. Good for CPU/GPU prototyping.
  • 600m – balanced local ESM-C. Better quality; still lightweight.
  • 6b – highest quality via ESM Forge (cloud); requires -k <token> or ESM_FORGE_TOKEN.

What PalmSite does

  1. Sanitize & merge FASTA
    Replaces unusual residues with X, drops sequences that required too many replacements, and writes a single merged FASTA.
  2. Embed sequences
    • Launches the embedding engine in a subprocess.
    • Backends:
      • Local ESM-C (300m/600m) via Hugging Face.
      • Forge (6B) via the ESM SDK (ESM3ForgeInferenceClient).
  3. Predict → GFF3
    Loads the checkpoint from Hugging Face, computes RdRP probabilities and spans, aggregates per protein, and writes GFF3.
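Step 1 (sanitize and merge) can be sketched in plain Python. This is an illustration under stated assumptions, not PalmSite's actual sanitize.py. We assume that anything outside the 20 standard amino acids plus X counts as "unusual" and that a trailing '*' stop symbol is stripped. The drop rule is a hypothetical `max_fix_fraction` of 0.1, and `read_fasta`, `sanitize` and `sanitize_and_merge` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: the 20 standard amino acids plus X are kept as-is;
# every other character is treated as "unusual" and replaced with X.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sanitize(seq: str) -> Tuple[str, int]:
    """Return the cleaned sequence and the number of residues replaced."""
    seq = seq.upper().rstrip("*")  # assumption: strip a trailing '*' stop symbol
    n_fixes = sum(c not in ALLOWED for c in seq)
    return "".join(c if c in ALLOWED else "X" for c in seq), n_fixes


def sanitize_and_merge(
    fasta_texts: Iterable[str], max_fix_fraction: float = 0.1
) -> Tuple[str, List[str]]:
    """Clean every record, drop over-corrected ones, merge the rest into one FASTA."""
    kept: List[Tuple[str, str]] = []
    dropped: List[str] = []
    for text in fasta_texts:
        for header, seq in read_fasta(text.splitlines()):
            clean, n_fixes = sanitize(seq)
            # Drop empty sequences and those needing too many replacements.
            if not clean or n_fixes / len(clean) > max_fix_fraction:
                dropped.append(header)
            else:
                kept.append((header, clean))
    merged = "".join(f">{h}\n{s}\n" for h, s in kept)
    return merged, dropped
```

To merge several input files, pass their contents, e.g. `sanitize_and_merge(Path(p).read_text() for p in paths)` with `from pathlib import Path`. Returning the dropped headers separately makes it easy to log which proteins were excluded before embedding.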

Output

  • GFF3 (stdout, or the path given with -o): one feature per protein whose probability meets the -p threshold, spanning the predicted catalytic center. Attributes include P, sigma, the original sequence length, and the chunk used.
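Because the output is standard GFF3, downstream filtering needs only a small parser. The sketch below reads 9-column feature lines and decodes the column-9 attributes (percent-encoded per the GFF3 specification). It then keeps features at or above a stricter probability cutoff, for example to re-threshold a run made with a low -p without recomputing embeddings. It assumes the probability appears literally as a `P=` attribute; `parse_gff3` and `at_least` are hypothetical names, not part of palmsite.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_gff3(text: str) -> List[Dict]:
    """Parse GFF3 feature lines into dicts; directives and comments are skipped."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed 9-column feature line
        seqid, source, ftype, start, end, score, strand, phase, attr_field = cols
        attributes = {}
        for pair in attr_field.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[unquote(key)] = unquote(value)
        features.append({
            "seqid": unquote(seqid), "source": source, "type": ftype,
            "start": int(start), "end": int(end),  # 1-based, inclusive
            "score": score, "strand": strand, "phase": phase,
            "attributes": attributes,
        })
    return features


def at_least(features: List[Dict], min_p: float) -> List[Dict]:
    """Keep features whose P attribute is >= min_p (a missing P is excluded)."""
    # float("nan") >= min_p is always False, so features without P drop out.
    return [f for f in features if float(f["attributes"].get("P", "nan")) >= min_p]
```

Usage: `at_least(parse_gff3(Path("hsrv_rdrp-domain.gff").read_text()), 0.9)` returns only the high-confidence catalytic-center spans.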

Environment variables

  • ESM_FORGE_TOKEN — Forge API token for -b 6b (alternative to -k).

Project structure (user-side)

  • cli.py — top-level command: sanitize → embed → predict.
  • embed_shim.py — launches the embedding engine in a subprocess.
  • _embed_impl.py — embedding engine (batching, progress, HDF5 writer, Forge/local backends).
  • infer_simple.py — simple driver to produce GFF from embeddings.
  • _predict_impl.py — full predictor (model, dataset, collate).
  • hf.py — Hugging Face weight resolution.
  • sanitize.py — FASTA cleaner/merger.
  • __init__.py — version. Current: 0.1.0.

Version

0.1.0

Download files

Source Distribution

palmsite-0.1.1.tar.gz (30.9 kB)

Uploaded Source

Built Distribution

palmsite-0.1.1-py3-none-any.whl (31.6 kB)

Uploaded Python 3

File details

Details for the file palmsite-0.1.1.tar.gz.

File metadata

  • Download URL: palmsite-0.1.1.tar.gz
  • Size: 30.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for palmsite-0.1.1.tar.gz
Algorithm Hash digest
SHA256 345165a1ac0e7d53fec6304265b6ef8dfaa1e585fe955ea8f8bbbac4882fa19a
MD5 4f7427fe0c91deb716851dcb3062b279
BLAKE2b-256 a95d258cff777e00b4a02832aefbd1c2c2eb3cc5cd00df032a5f71b8aeb8b124

File details

Details for the file palmsite-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: palmsite-0.1.1-py3-none-any.whl
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for palmsite-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6a0cefded153a2831877ea6a2997a29b79c01d3d14888a0cd8db58253d0ec485
MD5 26f7a26ac20ff4e433bcb3105cfcbb74
BLAKE2b-256 48d8a4f0e468b4f1edd9539b066d4a55ed5b9e4ab0eeb9e2d1cf8b9ab76d71b6
