A visualisation tool for protein embeddings from pLMs
ProtSpace
ProtSpace is a visualization tool for exploring protein embeddings or similarity matrices alongside their 3D protein structures. It lets users interactively visualize high-dimensional protein language model (pLM) data in 2D or 3D, color-code proteins by various features, and view protein structures when available.
Web Interface
Try ProtSpace directly in your browser without installation: https://protspace.rostlab.org/
Quick Start with Google Colab
Try ProtSpace instantly using our Google Colab notebooks:
Note: Some Google Colab functionalities may not work properly in Safari browsers. For the best experience, we recommend using Chrome or Firefox.
Example Outputs
2D Scatter Plot (SVG)
3D Interactive Plot
Installation
There are two installation options:
- Basic Installation (dimensionality reduction only):
pip install protspace
- Full Installation (including visualization interface):
pip install "protspace[frontend]"
Usage
UniProt Query
Search and analyze proteins directly from UniProt using exact UniProt query syntax:
# Human insulin
protspace-query -q "insulin AND organism_id:9606 AND reviewed:true" -o output_dir --methods pca3,umap2,tsne2
# All human kinases, written in the legacy format (non-binary files)
protspace-query -q "kinase AND organism_id:9606" -o kinases_dir --methods umap2,tsne3 --non-binary
# Toxins from any organism (keeping temporary files)
protspace-query -q "toxin AND reviewed:true" -o toxins_dir --methods pca2,umap3 --keep-tmp
Data Preparation
Process local embeddings or similarity matrices:
protspace-local -i embeddings.h5 -m features.csv -o output.json --methods pca3,umap2,tsne2
Running protspace
protspace --json output.json [--pdb_zip pdb_files.zip] [--port 8050]
Access the interface at http://localhost:8050
Features
- Interactive 2D/3D visualization with multiple dimensionality reduction methods:
- Principal Component Analysis (PCA)
- Multidimensional Scaling (MDS)
- Uniform Manifold Approximation and Projection (UMAP)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Pairwise Controlled Manifold Approximation (PaCMAP)
- Feature-based coloring and marker styling
- Protein structure visualization (with PDB files)
- Search and highlight functionality
- High-quality plot exports (SVG for 2D, interactive HTML for 3D)
- Responsive web interface
Data Preparation
ProtSpace supports multiple data preparation methods:
UniProt Query Processing
The protspace-query command searches UniProt and processes results automatically:
Required Arguments
- -q, --query: UniProt search query with exact UniProt syntax (e.g., 'insulin AND organism_id:9606 AND reviewed:true')
- -o, --output: Output directory
- --methods: Comma-separated reduction methods (e.g., pca2,tsne3,umap2,pacmap2,mds2)
Optional Arguments
- --non-binary: Do not use binary formats (legacy mode)
- -m, --metadata: Features to extract, as a comma-separated list (e.g., 'annotation_score,genus,protein_existence'); defaults to all available features
- --keep-tmp: Keep the temporary files
- --verbose: Increase output verbosity
Local Data Processing
The protspace-local command supports:
Required Arguments
- -i, --input: HDF5 file (.h5) or similarity matrix (.csv)
- -m, --metadata: Either a CSV file with features (the first column must be named "identifier" and match the IDs in the HDF5 file or similarity matrix), or a comma-separated list of features, which will be fetched automatically
- -o, --output: Output directory
- --methods: Comma-separated reduction methods (e.g., pca2,tsne3,umap2,pacmap2,mds2)
Optional Arguments
- --non-binary: Do not use binary formats (legacy mode)
- --delimiter: Delimiter of the metadata file (default: comma)
- --custom_names: Custom projection names (e.g., pca2=PCA_2D)
- --verbose: Increase output verbosity
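The requirement that the metadata CSV starts with an "identifier" column whose values match the embedding IDs can be checked before running protspace-local. The following is a minimal sketch using only the Python standard library; the function name check_metadata and the example rows are hypothetical and not part of ProtSpace.

```python
import csv
import io


def check_metadata(csv_text, embedding_ids, delimiter=","):
    """Return the embedding IDs that have no row in the metadata table.

    Raises ValueError if the first column is not named "identifier".
    Hypothetical helper for illustration; not part of ProtSpace.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, None)
    if not header or header[0].strip() != "identifier":
        raise ValueError('first metadata column must be named "identifier"')
    # Collect the identifier of every non-empty data row.
    seen = {row[0].strip() for row in reader if row}
    # Report missing IDs in the order they were given.
    return [pid for pid in embedding_ids if pid not in seen]


# Made-up example: two annotated proteins, three embedded ones.
example_csv = "identifier,genus\nP01308,Homo\nP01315,Sus\n"
missing = check_metadata(example_csv, ["P01308", "P01315", "P01317"])
print(missing)
```

An empty result means every embedded protein has a metadata row. For a file on disk, read its text first and pass the same delimiter that will be given to --delimiter.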
Method-Specific Parameters
Both protspace-query and protspace-local support the following reduction method parameters:
- UMAP:
  - --n_neighbors: Number of neighbors (default: 15)
  - --min_dist: Minimum distance (default: 0.1)
- t-SNE:
  - --perplexity: Perplexity value (default: 30)
  - --learning_rate: Learning rate (default: 200)
- PaCMAP:
  - --mn_ratio: MN ratio (default: 0.5)
  - --fp_ratio: FP ratio (default: 2.0)
- MDS:
  - --n_init: Number of initializations (default: 4)
  - --max_iter: Maximum iterations (default: 300)
  - --eps: Convergence tolerance (default: 1e-3)
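The examples in this README write each entry of --methods as a method name followed by the target dimension (pca3, umap2, tsne2). The sketch below illustrates that naming convention by splitting such a string into (method, dimensions) pairs. It is an assumption-based illustration, not ProtSpace's own parser; parse_methods and KNOWN_METHODS are hypothetical names.

```python
import re

# Method names taken from the examples in this README.
KNOWN_METHODS = ("pca", "umap", "tsne", "pacmap", "mds")
# A method name followed by a single digit, 2 or 3.
_SPEC = re.compile(r"^([a-z]+)([23])$")


def parse_methods(spec):
    """Split e.g. "pca3,umap2" into [("pca", 3), ("umap", 2)]."""
    parsed = []
    for token in spec.split(","):
        token = token.strip().lower()
        match = _SPEC.match(token)
        if match is None or match.group(1) not in KNOWN_METHODS:
            raise ValueError(f"unrecognised method spec: {token!r}")
        parsed.append((match.group(1), int(match.group(2))))
    return parsed


print(parse_methods("pca3,umap2,tsne2"))
```

A check like this can catch a typo such as "umap4" or "tsen2" before a long-running reduction starts.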
Custom Feature Styling
Use protspace-feature-colors to customize feature appearance:
protspace-feature-colors input.json output.json --feature_styles '{
"feature_name": {
"colors": {
"value1": "#FF0000",
"value2": "#00FF00"
},
"shapes": {
"value1": "circle",
"value2": "square"
}
}
}'
Available shapes: circle, circle-open, cross, diamond, diamond-open, square, square-open, x
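Writing the --feature_styles JSON by hand inside shell quotes is error-prone, so it can help to generate it. The sketch below builds the same structure as the example above with the standard json module and rejects shapes that are not in the list of available shapes. The helper build_feature_styles is hypothetical; only the command name, its arguments, and the shape names come from this README.

```python
import json
import shlex

# Shape names copied from the "Available shapes" list above.
AVAILABLE_SHAPES = {
    "circle", "circle-open", "cross", "diamond",
    "diamond-open", "square", "square-open", "x",
}


def build_feature_styles(feature, colors, shapes):
    """Return the JSON string for one feature's colors and shapes."""
    bad = sorted(set(shapes.values()) - AVAILABLE_SHAPES)
    if bad:
        raise ValueError(f"unsupported shapes: {bad}")
    return json.dumps({feature: {"colors": colors, "shapes": shapes}})


styles = build_feature_styles(
    "feature_name",
    colors={"value1": "#FF0000", "value2": "#00FF00"},
    shapes={"value1": "circle", "value2": "square"},
)
command = [
    "protspace-feature-colors", "input.json", "output.json",
    "--feature_styles", styles,
]
# shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

The command list can also be passed directly to subprocess.run, which avoids shell quoting altogether.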
File Formats
Input
- UniProt Query (for protspace-query)
  - UniProt search query with exact syntax (e.g., 'insulin AND organism_id:9606 AND reviewed:true')
  - Automatically downloads FASTA sequences
  - Generates similarity matrix using pymmseqs
  - Fetches UniProt features automatically
- Local Embeddings/Similarity (for protspace-local)
  - HDF5 (.h5) for embeddings
  - CSV for similarity matrix
- Metadata (for protspace-local)
  - CSV with mandatory 'identifier' column matching IDs in embeddings/similarity data
  - Additional columns for features
- Structures (optional)
  - ZIP containing PDB/CIF files
  - Filenames match identifiers (dots replaced with underscores)
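The filename rule for the structure archive (identifier with dots replaced by underscores) can be sketched as follows. This is a minimal illustration using the standard zipfile module; the helper names are hypothetical, the structure text is a placeholder rather than a real PDB entry, and it assumes that only the dots in the identifier are replaced while the file extension is kept.

```python
import io
import zipfile


def structure_filename(identifier, extension=".pdb"):
    """Map a protein identifier to its expected archive filename."""
    # Assumption: only dots inside the identifier become underscores.
    return identifier.replace(".", "_") + extension


def build_structure_zip(structures):
    """Pack {identifier: file_text} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for identifier, text in structures.items():
            archive.writestr(structure_filename(identifier), text)
    return buffer.getvalue()


# Placeholder content, not a real structure.
data = build_structure_zip({"P01308.1": "HEADER placeholder\nEND\n"})
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())
```

Writing the returned bytes to a file such as pdb_files.zip gives an archive that can be passed to --pdb_zip.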
Output
protspace-query
- Directory of parquet files:
  - projections_data.parquet
  - projections_metadata.parquet
  - selected_features.parquet (the features selected with the -m option; without -m, it is identical to all_features.parquet)
  - With the --keep-tmp flag, the following files are also included:
    - all_features.parquet (fetched from UniProt)
    - sequences.fasta (fetched from UniProt)
    - similarity_matrix.csv (generated by PyMMseqs)
- With the --non-binary flag (legacy version):
  - selected_features_projections.json (contains the selected features and projection data)
  - With the --keep-tmp flag, the following files are also included:
    - all_features.csv
    - sequences.fasta
    - similarity_matrix.csv
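The file list for protspace-query can be turned into a quick post-run check. The sketch below encodes the filenames listed above and reports which ones are absent from an output directory. It assumes the files sit directly in that directory, which this README does not state explicitly; both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def expected_query_outputs(non_binary=False, keep_tmp=False):
    """Filenames listed in this README for a protspace-query run."""
    if non_binary:
        names = ["selected_features_projections.json"]
        tmp = ["all_features.csv", "sequences.fasta", "similarity_matrix.csv"]
    else:
        names = [
            "projections_data.parquet",
            "projections_metadata.parquet",
            "selected_features.parquet",
        ]
        tmp = ["all_features.parquet", "sequences.fasta", "similarity_matrix.csv"]
    return names + tmp if keep_tmp else names


def missing_outputs(output_dir, non_binary=False, keep_tmp=False):
    """Return expected files not found directly under output_dir."""
    root = Path(output_dir)
    expected = expected_query_outputs(non_binary, keep_tmp)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration on a temporary directory holding one of the three files.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "projections_data.parquet").touch()
    print(missing_outputs(tmp_dir))
```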
protspace-local
- Directory of parquet files:
  - projections_data.parquet
  - projections_metadata.parquet
  - selected_features.parquet (the features selected with the -m option; without -m, it is identical to all_features.parquet)
  - With the --keep-tmp flag, the following file is also included:
    - all_features.parquet (fetched from UniProt)
- With the --non-binary flag (legacy version):
  - selected_features_projections.json (contains the selected features and projection data)
  - With the --keep-tmp flag, the following file is also included:
    - all_features.csv
Citation
If you use ProtSpace in your research, please cite:
@article{SENONER2025168940,
title = {ProtSpace: A Tool for Visualizing Protein Space},
journal = {Journal of Molecular Biology},
pages = {168940},
year = {2025},
issn = {0022-2836},
doi = {https://doi.org/10.1016/j.jmb.2025.168940},
url = {https://www.sciencedirect.com/science/article/pii/S0022283625000063},
author = {Tobias Senoner and Tobias Olenyi and Michael Heinzinger and Anton Spannagl and George Bouras and Burkhard Rost and Ivan Koludarov}
}