
ProtEmbedder

Extract protein embeddings from FASTA files using ESM-2 protein language models.

Installation

pip install protembedder

# Or, for a development install from a source checkout:
pip install -e .

Requirements: Python ≥ 3.8, PyTorch ≥ 1.12, fair-esm ≥ 2.0

Quick Start

CLI Usage

# Per-protein embeddings (default) — one vector per sequence
protembedder -m esm2_t33_650M -i proteins.fasta -o embeddings.pt

# Per-residue embeddings — one vector per amino acid
protembedder -m esm2_t33_650M -i proteins.fasta -o embeddings.pt --per-residue

# GPU with custom batch size
protembedder -m esm2_t33_650M -i proteins.fasta -o embeddings.pt --device cuda --batch-size 16

# Small model for quick testing
protembedder -m esm2_t6_8M -i proteins.fasta -o embeddings.pt -v

CLI Flags

Flag           Short  Required  Default  Description
--model        -m     Yes       —        ESM-2 model name (see table below)
--input        -i     Yes       —        Input FASTA file path
--output       -o     Yes       —        Output .pt file path
--per-residue         No        False    Emit per-residue (one vector per amino acid) embeddings
--device              No        auto     cpu, cuda, cuda:0, etc.
--batch-size          No        8        Sequences per batch
--verbose      -v     No        False    Verbose logging

Available Models

Model           Parameters  Embedding Dim  Layers
esm2_t6_8M      8M          320            6
esm2_t12_35M    35M         480            12
esm2_t30_150M   150M        640            30
esm2_t33_650M   650M        1280           33
esm2_t36_3B     3B          2560           36
esm2_t48_15B    15B         5120           48
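When validating saved embeddings, it can help to check tensor shapes against the model that produced them. A minimal sketch: the dictionary below is transcribed from the table above, and the helper name is illustrative, not part of the package API.

```python
# Embedding dimension per ESM-2 model, transcribed from the table above.
ESM2_EMBED_DIMS = {
    "esm2_t6_8M": 320,
    "esm2_t12_35M": 480,
    "esm2_t30_150M": 640,
    "esm2_t33_650M": 1280,
    "esm2_t36_3B": 2560,
    "esm2_t48_15B": 5120,
}


def expected_dim(model_name: str) -> int:
    """Return the embedding dimension for a given ESM-2 model name."""
    try:
        return ESM2_EMBED_DIMS[model_name]
    except KeyError:
        raise ValueError(f"Unknown ESM-2 model: {model_name}") from None
```

For example, `expected_dim("esm2_t33_650M")` returns 1280, which should match the last dimension of every tensor in the output file.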

Python API

import torch
from protembedder import ProteinEmbedder

# Initialize
embedder = ProteinEmbedder("esm2_t33_650M", device="cuda")

# From FASTA file
embeddings = embedder.embed_fasta("proteins.fasta", per_residue=False)

# From sequence list
sequences = [
    ("protein_1", "MKTAYIAKQRQISFVKSH"),
    ("protein_2", "MDEVLQAELPAEG"),
]
embeddings = embedder.embed_sequences(sequences, per_residue=True, batch_size=4)

# Save / Load
torch.save(embeddings, "embeddings.pt")
loaded = torch.load("embeddings.pt")
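Because per-protein embeddings are plain tensors, downstream analysis needs only PyTorch. A hedged sketch of comparing two proteins by cosine similarity; the random tensors below are dummies standing in for real model output.

```python
import torch
import torch.nn.functional as F

# Dummy per-protein embeddings standing in for real model output;
# shape (1280,) matches the embedding dim of esm2_t33_650M.
emb_a = torch.randn(1280)
emb_b = torch.randn(1280)

# Cosine similarity over the single embedding dimension (dim=0 for 1-D tensors).
sim = F.cosine_similarity(emb_a, emb_b, dim=0).item()
print(f"cosine similarity: {sim:.3f}")
```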

Output Format

The output .pt file contains a Python dict: {header: tensor}.

  • Per-protein (default): tensor shape is (embed_dim,)
  • Per-residue (--per-residue): tensor shape is (seq_len, embed_dim)

emb = torch.load("embeddings.pt")
for name, tensor in emb.items():
    print(f"{name}: {tensor.shape}")
# protein_1: torch.Size([1280])        # per-protein
# protein_1: torch.Size([18, 1280])    # per-residue
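If only per-residue output was saved, a per-protein vector can still be derived afterwards by mean pooling over the residue axis. This is the convention ESM's reference scripts use for their "mean" representation; the sketch below is not guaranteed to match this package's own per-protein output exactly.

```python
import torch

# Dummy per-residue embedding: 18 residues x 1280 dims,
# as in the per-residue example above.
per_residue = torch.randn(18, 1280)

# Mean over the residue axis (dim=0) yields a single (embed_dim,) vector.
per_protein = per_residue.mean(dim=0)
print(per_protein.shape)  # torch.Size([1280])
```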

OOM Handling

If a batch causes an out-of-memory error on GPU, the package automatically falls back to processing sequences one at a time for that batch. You can also reduce --batch-size manually.
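The same fallback pattern can be applied in user code around any batched embedding call. A hedged sketch, not the package's internal implementation; `embed_fn` stands in for any callable that embeds a list of sequences.

```python
def embed_with_fallback(embed_fn, batch):
    """Try to embed a whole batch; on CUDA OOM, retry one item at a time.

    embed_fn is any callable taking a list of sequences and returning a
    list of results; it stands in for a batched embedding call.
    """
    try:
        return embed_fn(batch)
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise  # not an OOM error; re-raise unchanged
        results = []
        for item in batch:
            results.extend(embed_fn([item]))
        return results
```

Shrinking the batch to size 1 trades throughput for the ability to finish the run instead of crashing partway through a large FASTA file.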

Reference

Lin, Z., et al. "Evolutionary-scale prediction of atomic-level protein structure with a language model." Science 379.6637 (2023): 1123-1130. https://doi.org/10.1126/science.ade2574

License

MIT
