Protein sequence representation: encoders, embeddings, and reductions.
Sylphy 🧬
Sylphy is a Python toolkit for turning protein sequences into machine-learning-ready representations.
It covers three main workflows:
- Classical sequence encoders: one-hot, ordinal, frequency, k-mers, physicochemical, FFT
- Embedding extraction from pretrained protein models: ESM2, ProtT5, ProtBERT, Ankh2, Mistral-Prot, ESM-C
- Dimensionality reduction for downstream analysis and visualization
Installation
Sylphy supports Python 3.11 and 3.12.
pip install sylphy
Install optional extras as needed:
- embeddings: PyTorch and Transformers-based embedding extraction
- parquet: Parquet export support
- reductions: UMAP and related optional reducers
- all: all optional runtime dependencies
The reductions extra may require a C++ compiler and Python development headers because of optional native dependencies such as ClustPy.
pip install 'sylphy[embeddings,parquet]'
pip install 'sylphy[all]'
On Debian or Ubuntu systems, install the build prerequisites with:
sudo apt-get install build-essential python3-dev
On Fedora or RHEL systems:
sudo dnf install gcc gcc-c++ python3-devel
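Before running an optional workflow, it can help to check which optional dependencies are importable. The sketch below uses only the standard library. The mapping from extras to import names (torch, transformers, pyarrow, umap) is an assumption based on the extras descriptions, not read from Sylphy's packaging metadata, and missing_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from Sylphy extras to top-level import names.
# These names are illustrative guesses, not read from Sylphy's metadata.
EXTRA_MODULES = {
    "embeddings": ["torch", "transformers"],
    "parquet": ["pyarrow"],
    "reductions": ["umap"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the assumed modules for `extra` that cannot be imported."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra}: {status}")
```

If an extra reports missing modules, installing the matching extra with pip is the likely fix.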
Quick Start
Classical sequence encoding:
import pandas as pd
from sylphy.sequence_encoder import create_encoder
df = pd.DataFrame({"sequence": ["MKTAYIAKQR", "GAVLIMPFWK", "PEPTIDE"]})
encoder = create_encoder(
    "one_hot",  # or: ordinal, kmers, frequency, physicochemical, fft
    dataset=df,
    sequence_column="sequence",
)
encoder.run_process()
encoded = encoder.coded_dataset
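To make the idea behind one-hot encoding concrete, here is a minimal pure-Python sketch over the 20 standard amino acids. The alphabet order, zero-padding to a fixed max_length, and the function name one_hot_encode are assumptions for illustration; Sylphy's own encoder may order residues, pad, or handle non-standard residues differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, ordered by one-letter code
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str, max_length: int) -> list[int]:
    """Flatten a sequence into a max_length * 20 vector of 0/1 values.

    Positions past the end of the sequence stay all-zero (padding), and
    residues outside the 20-letter alphabet are also left as all-zero.
    """
    vector = [0] * (max_length * len(AMINO_ACIDS))
    for pos, residue in enumerate(sequence[:max_length]):
        idx = INDEX.get(residue)
        if idx is not None:
            vector[pos * len(AMINO_ACIDS) + idx] = 1
    return vector


sequences = ["MKTAYIAKQR", "GAVLIMPFWK", "PEPTIDE"]
max_length = max(len(s) for s in sequences)
encoded_rows = [one_hot_encode(s, max_length) for s in sequences]
print(len(encoded_rows[0]), sum(encoded_rows[2]))  # 200 7
```

Each row has 10 * 20 = 200 entries, and the 7-residue sequence sets exactly 7 of them.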
Embedding extraction:
import pandas as pd
from sylphy.embedding_extractor import create_embedding
df = pd.DataFrame({"sequence": ["MKTAYIAKQR", "GAVLIMPFWK", "PEPTIDE"]})
embedder = create_embedding(
    model_name="facebook/esm2_t6_8M_UR50D",
    dataset=df,
    column_seq="sequence",
    name_device="cuda",
    precision="fp16",  # fp32, fp16, or bf16
)
embedder.run_process(batch_size=8, pool="mean") # mean, cls, or eos
embeddings = embedder.coded_dataset
embedder.export_encoder("embeddings.parquet")
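The pool argument controls how per-token embeddings collapse into one vector per sequence. The pure-Python sketch below illustrates the three strategies on a toy matrix. It assumes a token layout of CLS, residues, EOS, and that mean pooling averages only the residue positions. These are common conventions, not confirmed details of Sylphy's implementation, and pool_tokens is a hypothetical helper.

```python
def pool_tokens(token_vectors: list[list[float]], pool: str = "mean") -> list[float]:
    """Reduce a (tokens x dim) matrix to a single dim-length vector.

    Assumes the first row is a CLS token and the last row is an EOS token.
    """
    if len(token_vectors) < 3:
        raise ValueError("expected CLS, at least one residue, and EOS")
    if pool == "cls":
        return list(token_vectors[0])
    if pool == "eos":
        return list(token_vectors[-1])
    if pool == "mean":
        residues = token_vectors[1:-1]  # drop the special tokens
        dim = len(residues[0])
        return [sum(row[d] for row in residues) / len(residues) for d in range(dim)]
    raise ValueError(f"unknown pooling strategy: {pool!r}")


# Toy example: CLS, two residues, EOS, with 2-dimensional embeddings.
tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [7.0, 7.0]]
print(pool_tokens(tokens, "mean"))  # [2.0, 3.0]
print(pool_tokens(tokens, "cls"))   # [9.0, 9.0]
print(pool_tokens(tokens, "eos"))   # [7.0, 7.0]
```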
Dimensionality reduction:
from sylphy.reductions import reduce_dimensionality
model, reduced = reduce_dimensionality(
    method="pca",  # pca, truncated_svd, umap, tsne, isomap, etc.
    dataset=embeddings,
    n_components=2,
    random_state=42,
)
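For intuition about what a PCA reduction computes, the following NumPy sketch centers the data and projects it onto the top principal directions via SVD. It is a generic illustration, not Sylphy's implementation; pca_project is a hypothetical helper, and the random matrix stands in for real embeddings.

```python
import numpy as np


def pca_project(data: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows of `data` onto the top `n_components` principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(42)
fake_embeddings = rng.normal(size=(10, 32))  # stand-in for 10 sequence embeddings
reduced_sketch = pca_project(fake_embeddings, n_components=2)
print(reduced_sketch.shape)  # (10, 2)
```

The first output column carries at least as much variance as the second, which is why two components are often enough for a quick scatter plot.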
CLI
sylphy --help
sylphy get-embedding \
  --model facebook/esm2_t6_8M_UR50D \
  --input-data sequences.csv \
  --sequence-identifier sequence \
  --output embeddings.parquet \
  --device cuda --precision fp16 --batch-size 16
sylphy encode-sequences \
  --encoder one_hot \
  --input-data sequences.csv \
  --sequence-identifier sequence \
  --output encoded.csv
sylphy cache stats
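The CLI examples above read sequences.csv and select a column named sequence. A small standard-library sketch for producing such a file is shown here. The id column and the helper name write_sequences_csv are assumptions for illustration, since the expected input schema beyond the sequence column is not specified in this README.

```python
import csv
from pathlib import Path


def write_sequences_csv(path: Path, sequences: list[str]) -> None:
    """Write sequences to a CSV with a header row and a `sequence` column."""
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "sequence"])
        writer.writeheader()
        for i, seq in enumerate(sequences):
            # The `id` column is an illustrative extra, not a documented requirement.
            writer.writerow({"id": f"seq_{i}", "sequence": seq})


write_sequences_csv(Path("sequences.csv"), ["MKTAYIAKQR", "GAVLIMPFWK", "PEPTIDE"])
```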
Configuration
By default Sylphy stores cache data in the platform cache directory:
- Linux: ~/.cache/sylphy
- macOS: ~/Library/Caches/sylphy
- Windows: %LOCALAPPDATA%\sylphy\Cache
Useful environment variables:
- SYLPHY_CACHE_ROOT to override the cache location
- SYLPHY_DEVICE to force cpu or cuda
- SYLPHY_MODEL_<NAME> to override a registered model path
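The lookup order described in this section (environment override first, then the platform default) can be sketched as follows. This mirrors only the documented paths; resolve_cache_root is a hypothetical helper, not Sylphy's API, and the fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def resolve_cache_root(env=None, platform=None) -> Path:
    """Return the cache directory implied by the documented defaults."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    override = env.get("SYLPHY_CACHE_ROOT")
    if override:
        return Path(override).expanduser()
    if platform == "darwin":
        return Path.home() / "Library" / "Caches" / "sylphy"
    if platform.startswith("win"):
        # Assumed fallback when LOCALAPPDATA is not set.
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "sylphy" / "Cache"
    return Path.home() / ".cache" / "sylphy"


print(resolve_cache_root())
```

Passing env and platform explicitly keeps the function easy to test without touching the real environment.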
Learn More
- DEVELOPMENT.md for local setup, tests, architecture, and contribution notes
- examples/README.md for the examples index and runnable scripts/notebooks
License
GPL-3.0-only. See LICENSE.
Acknowledgements
Built with the Hugging Face Transformers ecosystem, the Meta ESM-C SDK, and the broader scientific Python stack including scikit-learn, PyTorch, UMAP, and ClustPy.
Developed by KREN AI Lab at Universidad de Magallanes, Chile.