A visualisation tool for protein embeddings from pLMs
ProtSpace
ProtSpace is a visualization tool for exploring protein embeddings or similarity matrices alongside their 3D protein structures. It lets users interactively visualize high-dimensional protein language model (pLM) data in 2D or 3D space, color-code proteins by feature, and view protein structures when available.
Web Interface
Try ProtSpace directly in your browser without installation: https://protspace.rostlab.org/
Quick Start with Google Colab
Try ProtSpace instantly using our Google Colab notebooks:
Note: Some Google Colab functionalities may not work properly in Safari browsers. For the best experience, we recommend using Chrome or Firefox.
Example Outputs
2D Scatter Plot (SVG)
3D Interactive Plot
Installation
There are two installation options:
- Basic Installation (dimensionality reduction only):
pip install protspace
- Full Installation (including visualization interface):
pip install "protspace[frontend]"
Usage
Data Preparation
protspace-json -i embeddings.h5 -m features.csv -o output.json --methods pca3 umap2 tsne2
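Each entry passed to --methods combines a reduction name with a target dimension (pca3, umap2, tsne2). As a minimal sketch, assuming the five methods named in this README and the dimensions 2 and 3 are the only valid choices, the hypothetical helpers below (`parse_method` and `build_command` are illustrative and not part of ProtSpace) check the specs and assemble the argument list for the command above:

```python
import re

# Methods named in this README; assumed here to be the complete set.
KNOWN_METHODS = {"pca", "mds", "umap", "tsne", "pacmap"}


def parse_method(spec):
    """Split a spec such as 'pca3' into ('pca', 3); raise ValueError if malformed."""
    match = re.fullmatch(r"([a-z]+)([23])", spec)
    if match is None or match.group(1) not in KNOWN_METHODS:
        raise ValueError(f"unrecognised method spec: {spec!r}")
    return match.group(1), int(match.group(2))


def build_command(embeddings, metadata, output, methods):
    """Return the protspace-json argument list using only the flags shown above."""
    for spec in methods:
        parse_method(spec)  # validate every spec before anything is run
    return ["protspace-json", "-i", embeddings, "-m", metadata,
            "-o", output, "--methods", *methods]


if __name__ == "__main__":
    cmd = build_command("embeddings.h5", "features.csv", "output.json",
                        ["pca3", "umap2", "tsne2"])
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once ProtSpace is installed.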
Running protspace
protspace --json output.json [--pdb_zip pdb_files.zip] [--port 8050]
Access the interface at http://localhost:8050
Features
- Interactive 2D/3D visualization with multiple dimensionality reduction methods:
- Principal Component Analysis (PCA)
- Multidimensional Scaling (MDS)
- Uniform Manifold Approximation and Projection (UMAP)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Pairwise Controlled Manifold Approximation (PaCMAP)
- Feature-based coloring and marker styling
- Protein structure visualization (with PDB files)
- Search and highlight functionality
- High-quality plot exports (SVG for 2D, interactive HTML for 3D)
- Responsive web interface
Data Preparation
The protspace-json command supports:
Required Arguments
- -i, --input: HDF5 file (.h5) or similarity matrix (.csv)
- -m, --metadata: CSV file with features (first column must be named "identifier" and match IDs in HDF5/similarity matrix)
- -o, --output: Output JSON path
- --methods: Reduction methods (e.g., pca2, tsne3, umap2, pacmap2, mds2)
Optional Arguments
- --delimiter: Specify delimiter for metadata file (default: comma)
- --custom_names: Custom projection names (e.g., pca2=PCA_2D)
- --verbose: Increase output verbosity
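The metadata requirements above (first column named "identifier", configurable delimiter) can be checked before running protspace-json. The following is a minimal sketch using only the Python standard library; `read_identifiers` and `missing_ids` are hypothetical helper names, and the sample identifiers are made up for illustration:

```python
import csv
import io


def read_identifiers(handle, delimiter=","):
    """Return the identifiers from a metadata table after checking its header."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if not header or header[0] != "identifier":
        raise ValueError('first column must be named "identifier"')
    # csv.reader yields blank lines as empty lists; skip them.
    return [row[0] for row in reader if row]


def missing_ids(metadata_ids, embedding_ids):
    """Identifiers present in the metadata but absent from the embeddings."""
    return sorted(set(metadata_ids) - set(embedding_ids))


if __name__ == "__main__":
    # Made-up metadata with one feature column.
    sample = io.StringIO("identifier,family\nP12345,kinase\nQ67890,toxin\n")
    ids = read_identifiers(sample)
    print(ids)
    print(missing_ids(ids, ["P12345"]))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle in place of the `io.StringIO` object.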
Method-Specific Parameters
- UMAP:
- --n_neighbors: Number of neighbors (default: 15)
- --min_dist: Minimum distance (default: 0.1)
- t-SNE:
- --perplexity: Perplexity value (default: 30)
- --learning_rate: Learning rate (default: 200)
- PaCMAP:
- --mn_ratio: MN ratio (default: 0.5)
- --fp_ratio: FP ratio (default: 2.0)
- MDS:
- --n_init: Number of initializations (default: 4)
- --max_iter: Maximum iterations (default: 300)
- --eps: Convergence tolerance (default: 1e-3)
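When scripting several runs, it can help to keep the documented defaults in one place and pass only the values that differ. The sketch below is one way to do that; `override_flags` is a hypothetical helper, and the defaults are copied from the list above rather than read from ProtSpace itself:

```python
# Defaults as documented in this README (not queried from the installed package).
DOCUMENTED_DEFAULTS = {
    "n_neighbors": 15, "min_dist": 0.1,         # UMAP
    "perplexity": 30, "learning_rate": 200,     # t-SNE
    "mn_ratio": 0.5, "fp_ratio": 2.0,           # PaCMAP
    "n_init": 4, "max_iter": 300, "eps": 1e-3,  # MDS
}


def override_flags(**overrides):
    """Turn keyword overrides into CLI flags, skipping values equal to the default."""
    flags = []
    for name, value in overrides.items():
        if name not in DOCUMENTED_DEFAULTS:
            raise ValueError(f"unknown parameter: {name}")
        if value != DOCUMENTED_DEFAULTS[name]:
            flags += [f"--{name}", str(value)]
    return flags


if __name__ == "__main__":
    # Only n_neighbors differs from its default, so only that flag is emitted.
    print(override_flags(n_neighbors=30, min_dist=0.1))
```

The resulting flags can be appended to a protspace-json argument list.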
Custom Feature Styling
Use protspace-feature-colors to customize feature appearance:
protspace-feature-colors input.json output.json --feature_styles '{
"feature_name": {
"colors": {
"value1": "#FF0000",
"value2": "#00FF00"
},
"shapes": {
"value1": "circle",
"value2": "square"
}
}
}'
Available shapes: circle, circle-open, cross, diamond, diamond-open, square, square-open, x
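Writing the --feature_styles JSON by hand inside shell quotes is error-prone. As a minimal sketch, the JSON can be generated in Python and checked against the shapes listed above; `feature_styles_json` is a hypothetical helper name, and the feature and value names are the placeholders from the example:

```python
import json

# Shapes listed as available in this README.
AVAILABLE_SHAPES = {"circle", "circle-open", "cross", "diamond",
                    "diamond-open", "square", "square-open", "x"}


def feature_styles_json(feature, colors, shapes):
    """Serialise one feature's colors and shapes in the layout shown above."""
    bad = sorted(set(shapes.values()) - AVAILABLE_SHAPES)
    if bad:
        raise ValueError(f"unsupported shapes: {bad}")
    return json.dumps({feature: {"colors": colors, "shapes": shapes}})


if __name__ == "__main__":
    styles = feature_styles_json(
        "feature_name",
        {"value1": "#FF0000", "value2": "#00FF00"},
        {"value1": "circle", "value2": "square"},
    )
    # Argument list for the command shown above; pass it to subprocess.run to execute.
    print(["protspace-feature-colors", "input.json", "output.json",
           "--feature_styles", styles])
```

Passing the arguments as a list avoids shell quoting issues with the embedded JSON.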
File Formats
Input
- Embeddings/Similarity
- HDF5 (.h5) for embeddings
- CSV for similarity matrix
- Metadata
- CSV with mandatory 'identifier' column matching IDs in embeddings/similarity data
- Additional columns for features
- Structures
- ZIP containing PDB/CIF files
- Filenames match identifiers (dots replaced with underscores)
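The structure archive can be assembled with the standard library. The sketch below applies the naming rule above (dots in identifiers replaced with underscores) and assumes the files sit at the top level of the ZIP, which this README does not state explicitly; `archive_name` and `zip_structures` are hypothetical helper names:

```python
import zipfile
from pathlib import Path


def archive_name(identifier, suffix=".pdb"):
    """File name for a structure: the identifier with dots replaced by underscores."""
    return identifier.replace(".", "_") + suffix


def zip_structures(files_by_id, zip_path):
    """Write {identifier: path-to-structure-file} into a ZIP with normalised names."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for identifier, path in files_by_id.items():
            path = Path(path)
            # Keep each file's own extension (.pdb or .cif); flat layout is an assumption.
            archive.write(path, arcname=archive_name(identifier, path.suffix))
    return zip_path
```

The resulting file can then be passed to protspace via --pdb_zip.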
Output
- JSON containing:
- Protein features
- Projection coordinates
- Visualization state (colors, shapes)
Citation
If you use ProtSpace in your research, please cite:
@article{SENONER2025168940,
title = {ProtSpace: A Tool for Visualizing Protein Space},
journal = {Journal of Molecular Biology},
pages = {168940},
year = {2025},
issn = {0022-2836},
doi = {https://doi.org/10.1016/j.jmb.2025.168940},
url = {https://www.sciencedirect.com/science/article/pii/S0022283625000063},
author = {Tobias Senoner and Tobias Olenyi and Michael Heinzinger and Anton Spannagl and George Bouras and Burkhard Rost and Ivan Koludarov}
}