A visualisation tool for protein embeddings from pLMs
ProtSpace
ProtSpace is a visualization tool for exploring protein embeddings and structures. It lets users interactively view high-dimensional protein language model (pLM) embeddings in 2D or 3D, color-code proteins by feature, and view protein structures when available.
Quick Start with Google Colab
Try ProtSpace instantly using our Google Colab notebook:
The notebook demonstrates:
- Installation and setup
- Data preparation
- Basic visualization
Example Outputs
2D Scatter Plot (SVG)
3D Interactive Plot
Installation
Using uv:
# Quick run
uvx protspace
# Permanent installation
uv tool install protspace
uv tool update-shell
# Latest GitHub version
uv tool install git+https://github.com/tsenoner/ProtSpace.git
uv tool update-shell
Usage
Data Preparation
uvx --from protspace protspace-json -i embeddings.h5 -m features.csv -o output.json --methods pca3 umap2 tsne2
Running ProtSpace
protspace output.json [--pdb_zip pdb_files.zip] [--port 8050]
Access the interface at http://localhost:8050
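If port 8050 is already in use, another port can be passed with --port. The following is a minimal Python sketch, not part of ProtSpace, that asks the operating system for a free port and assembles the launch command from the options shown above. The helper names `find_free_port` and `launch_command` are hypothetical.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is unused at the moment of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def launch_command(json_path, port, pdb_zip=None):
    """Build the argument list for the protspace command shown above."""
    args = ["protspace", json_path]
    if pdb_zip is not None:
        args += ["--pdb_zip", pdb_zip]
    args += ["--port", str(port)]
    return args
```

The returned list can be passed to `subprocess.run`. Another process could still claim the port between the check and the launch, so treat the result as a best guess.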
Features
- Interactive 2D/3D visualization (PCA, UMAP, t-SNE)
- Feature-based coloring and marker styling
- Protein structure visualization (with PDB files)
- Search and highlight functionality
- High-quality plot exports
- Responsive web interface
Data Preparation
The protspace-json command supports:
Required Arguments
- -i, --input: HDF file (.h5) or similarity matrix (.csv)
- -m, --metadata: CSV file with features
- -o, --output: Output JSON path
- --methods: Reduction methods (e.g., pca2, tsne3, umap2)
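The --methods examples (pca2, tsne3, umap2) read as a method name followed by the target dimensionality. The sketch below checks a spec against that pattern before a long run is started. The pattern is inferred from the examples only, and `parse_method_spec` is a hypothetical helper, not ProtSpace's own parser.

```python
import re

# Assumption: a spec is a lowercase method name followed by 2 or 3,
# as in the documented examples pca2, tsne3 and umap2.
_SPEC = re.compile(r"^([a-z]+)([23])$")


def parse_method_spec(spec):
    """Split a spec such as 'umap2' into ('umap', 2)."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognised method spec: {spec!r}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    for spec in ["pca3", "umap2", "tsne2"]:
        print(spec, "->", parse_method_spec(spec))
```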
Optional Arguments
- --custom_names: Custom projection names (e.g., pca2=PCA_2D)
- --verbose: Increase output verbosity
Method-Specific Parameters
- UMAP:
  - --n_neighbors: Number of neighbors (default: 15)
  - --min_dist: Minimum distance (default: 0.1)
- t-SNE:
  - --perplexity: Perplexity value (default: 30)
  - --learning_rate: Learning rate (default: 200)
- PaCMAP:
  - --mn_ratio: MN ratio (default: 0.5)
  - --fp_ratio: FP ratio (default: 2.0)
- MDS:
  - --n_init: Number of initializations (default: 4)
  - --max_iter: Maximum iterations (default: 300)
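When protspace-json is called from a script, the arguments above can be assembled programmatically. This is a minimal sketch that only builds the command line from the documented flags and does not run it. The function name `build_protspace_json_command` is hypothetical.

```python
import shlex


def build_protspace_json_command(input_path, metadata_path, output_path,
                                 methods, **params):
    """Return the argument list for a protspace-json call.

    Keyword arguments become '--name value' pairs, e.g. n_neighbors=30
    becomes '--n_neighbors 30'. Only flags listed above should be used.
    """
    args = ["protspace-json", "-i", input_path, "-m", metadata_path,
            "-o", output_path, "--methods", *methods]
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    return args


if __name__ == "__main__":
    cmd = build_protspace_json_command(
        "embeddings.h5", "features.csv", "output.json",
        ["pca2", "umap2"], n_neighbors=30, min_dist=0.05,
    )
    print(shlex.join(cmd))  # shell-quoted form for copy and paste
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming protspace-json is on the PATH.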
Custom Feature Styling
Use protspace-feature-colors to customize feature appearance:
protspace-feature-colors input.json output.json --feature_styles '{
  "feature_name": {
    "colors": {
      "value1": "#FF0000",
      "value2": "#00FF00"
    },
    "shapes": {
      "value1": "circle",
      "value2": "square"
    }
  }
}'
Available shapes: circle, circle-open, cross, diamond, diamond-open, square, square-open, x
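Writing the --feature_styles JSON by hand is error-prone, so it may help to generate it. The sketch below builds the same structure as the example above and rejects shapes that are not in the documented list. `make_feature_styles` is a hypothetical helper, and the feature and value names are placeholders.

```python
import json
import shlex

# Shapes listed in the documentation above.
AVAILABLE_SHAPES = {
    "circle", "circle-open", "cross", "diamond",
    "diamond-open", "square", "square-open", "x",
}


def make_feature_styles(feature, colors, shapes):
    """Return the JSON string for --feature_styles for a single feature."""
    unknown = sorted(set(shapes.values()) - AVAILABLE_SHAPES)
    if unknown:
        raise ValueError(f"unsupported shapes: {unknown}")
    return json.dumps({feature: {"colors": colors, "shapes": shapes}})


if __name__ == "__main__":
    styles = make_feature_styles(
        "feature_name",
        {"value1": "#FF0000", "value2": "#00FF00"},
        {"value1": "circle", "value2": "square"},
    )
    # shlex.join quotes the JSON so it survives the shell unchanged.
    print(shlex.join(["protspace-feature-colors", "input.json",
                      "output.json", "--feature_styles", styles]))
```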
File Formats
Input
- Embeddings/Similarity
  - HDF5 (.h5) for embeddings
  - CSV for similarity matrix
- Metadata
  - CSV with 'identifier' column
  - Additional columns for features
- Structures
  - ZIP containing PDB/CIF files
  - Filenames match identifiers (dots replaced with underscores)
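The metadata and structure conventions above can be sketched with the Python standard library. This is an illustration under stated assumptions, not ProtSpace code: the helper names are hypothetical, the 'identifier' column is placed first by choice, and the file extension is taken from each source file.

```python
import csv
import zipfile
from pathlib import Path


def write_metadata(path, rows, feature_names):
    """Write a metadata CSV with an 'identifier' column plus feature columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["identifier", *feature_names])
        writer.writeheader()
        writer.writerows(rows)


def structure_filename(identifier, suffix=".pdb"):
    """Map an identifier to a structure file name (dots become underscores)."""
    return identifier.replace(".", "_") + suffix


def zip_structures(zip_path, structures):
    """Store {identifier: path to structure file} in a ZIP under mapped names."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for identifier, source in structures.items():
            name = structure_filename(identifier, Path(source).suffix)
            archive.write(source, arcname=name)
```

For example, `structure_filename("P12345.1")` gives `P12345_1.pdb`, where the identifier is a made-up placeholder. The resulting ZIP can then be passed to protspace with --pdb_zip.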
Output
- JSON containing:
  - Protein features
  - Projection coordinates
  - Visualization state (colors, shapes)
  - Structure references