A visualization tool for protein embeddings from protein language models (pLMs)

Project description

ProtSpace

Python 3.10+ · License: GPL v3

ProtSpace is a visualization tool for exploring protein embeddings or similarity matrices. It projects high-dimensional protein language model data into 2D space, color-codes proteins by biological annotations, and exports publication-ready figures.

  • Multiple projections: PCA, UMAP, t-SNE, MDS, PaCMAP, LocalMAP
  • Automatic annotations: UniProt, InterPro, and Taxonomy
  • Structure viewer: Integrated protein structure visualization
  • Export: PNG, PDF, SVG, HTML
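To make the idea of "projecting high-dimensional embeddings into 2D" concrete, here is a minimal, dependency-free sketch of PCA, the simplest of the listed methods. It is an illustration only and not ProtSpace's implementation; the function names (`center`, `covariance`, `dominant_eigenpair`, `pca_2d`) and the toy data are assumptions made for this example.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every embedding vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: eigenvalue and unit eigenvector of largest magnitude."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # fixed, non-uniform start vector
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # matrix maps the vector to zero; nothing left to find
            break
        vec = [x / norm for x in nxt]
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)
    )
    return value, vec


def pca_2d(rows):
    """Project each row onto the two directions of highest variance."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        value, vec = dominant_eigenpair(cov)
        axes.append(vec)
        # Deflation: remove the component just found so the next
        # power iteration converges to the following direction.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
            for i in range(d)
        ]
    return [
        [sum(row[j] * axis[j] for j in range(d)) for axis in axes]
        for row in centered
    ]


# Toy 3-dimensional "embeddings" (made-up values, not real pLM output).
toy_embeddings = [
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
]
points_2d = pca_2d(toy_embeddings)
print(points_2d)
```

Real embeddings have hundreds or thousands of dimensions, where a linear-algebra library is the practical choice; the non-linear methods in the list (UMAP, t-SNE, PaCMAP, LocalMAP) work differently and are not covered by this sketch.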

🌐 Try Online

ProtSpace Web: Fast 2D explorer optimized for large datasets — drag & drop .parquetbundle files (source)

🚀 Google Colab Notebooks

Note: Use Chrome or Firefox for the best experience.

  1. Generate Protein Embeddings: Open Embeddings In Colab

  2. Prepare ProtSpace Bundle: Open Preparation In Colab

📦 Installation

pip install protspace

🎯 Quick Start

1. Prepare data

# From HDF5 embeddings
protspace prepare -i embeddings.h5 -m pca2,umap2 -o output

# From FASTA (auto-embeds via Biocentral API)
protspace prepare -i sequences.fasta -e prot_t5 -m pca2 -o output

# Multi-model comparison (12 pLMs supported)
protspace prepare -i sequences.fasta -e prot_t5,esm2_650m,ankh_base -m pca2,umap2 -o output

# Combine datasets (same embedding name → proteins are unioned)
protspace prepare -i species_a.h5:prot_t5 -i species_b.h5:prot_t5 -m umap2 -o output
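The "unioned" behavior in the last command can be pictured as merging two ID-to-vector mappings. The sketch below is a conceptual illustration, not ProtSpace code: the function `union_embeddings`, the protein IDs, and the vectors are placeholders, and the rule that the first occurrence of a duplicated ID wins is an assumption (the README does not say how duplicates are resolved).

```python
def union_embeddings(*datasets):
    """Merge several {protein_id: vector} mappings into one.

    Assumption for this sketch: when an ID appears in more than one
    dataset, the vector from the first dataset that contains it is kept.
    """
    merged = {}
    for dataset in datasets:
        for protein_id, vector in dataset.items():
            merged.setdefault(protein_id, vector)
    return merged


# Placeholder IDs and 2-dimensional vectors, for illustration only.
species_a = {"P12345": [0.1, 0.2], "Q67890": [0.3, 0.4]}
species_b = {"Q67890": [0.3, 0.4], "A0A000": [0.5, 0.6]}

combined = union_embeddings(species_a, species_b)
print(sorted(combined))  # IDs from both inputs, each listed once
```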

2. Explore results

Upload the generated .parquetbundle file at protspace.app/explore.

3. Power-user workflow (individual steps)

protspace embed -i sequences.fasta -e prot_t5 -e esm2_3b -o embeddings/
protspace project -i embeddings/prot_t5.h5 -i embeddings/esm2_3b.h5 -m pca2,umap2 -o projections/
protspace annotate -i embeddings/prot_t5.h5 -a default -o annotations.parquet
protspace bundle -p projections/ -a annotations.parquet -o output.parquetbundle
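If you want to script these four steps, one option is to assemble the argument lists in Python and inspect them before running anything. The sketch below only reuses the subcommands, flags, and paths shown above; the helper `build_pipeline` is hypothetical, and the assumption that `protspace embed` writes one `<embedder>.h5` file per model into the output directory is inferred from the file names in the commands above.

```python
import shlex


def build_pipeline(fasta, embedders, methods):
    """Return argv lists for the embed, project, annotate, bundle steps."""
    embed = ["protspace", "embed", "-i", fasta]
    for name in embedders:
        embed += ["-e", name]
    embed += ["-o", "embeddings/"]

    # Assumption: one HDF5 file per embedder, named after the embedder.
    h5_files = [f"embeddings/{name}.h5" for name in embedders]

    project = ["protspace", "project"]
    for path in h5_files:
        project += ["-i", path]
    project += ["-m", ",".join(methods), "-o", "projections/"]

    # As in the commands above, annotations are fetched via the first file.
    annotate = [
        "protspace", "annotate", "-i", h5_files[0],
        "-a", "default", "-o", "annotations.parquet",
    ]
    bundle = [
        "protspace", "bundle", "-p", "projections/",
        "-a", "annotations.parquet", "-o", "output.parquetbundle",
    ]
    return [embed, project, annotate, bundle]


commands = build_pipeline(
    "sequences.fasta", ["prot_t5", "esm2_3b"], ["pca2", "umap2"]
)
for argv in commands:
    print(shlex.join(argv))  # dry run: show each command, do not execute
```

To actually execute the steps, each list could be passed to `subprocess.run(argv, check=True)` so that a failing step stops the pipeline.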

📊 Example Output

2D Example

✨ Annotations

Use -a to color-code proteins by UniProt, InterPro, or Taxonomy annotations. Groups (default, all, uniprot, interpro, taxonomy) and individual names can be mixed freely. If -a is omitted, the default group is used.

protspace prepare -i data.h5 -m pca2                              # default annotations
protspace prepare -i data.h5 -a default,interpro,kingdom -m pca2  # mix groups + individual
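As a rough mental model of how a mixed `-a` value could be read, the sketch below splits a comma-separated specification into group names and individual annotation names. It is not ProtSpace's parser: the function `split_annotation_spec` is hypothetical, and only the five group names come from the text above.

```python
# Group names as listed in this README; everything else is treated
# as an individual annotation name.
GROUPS = {"default", "all", "uniprot", "interpro", "taxonomy"}


def split_annotation_spec(spec=None):
    """Split e.g. 'default,interpro,kingdom' into (groups, names).

    An empty or missing spec falls back to the 'default' group,
    mirroring the documented behavior when -a is omitted.
    """
    if not spec:
        return ["default"], []
    groups, names = [], []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        target = groups if token in GROUPS else names
        if token not in target:  # ignore repeated entries
            target.append(token)
    return groups, names


print(split_annotation_spec("default,interpro,kingdom"))
print(split_annotation_spec())
```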

📖 Documentation

📝 Citation

Senoner T, Olenyi T, Heinzinger M, Spannagl A, Bouras G, Rost B, Koludarov I. ProtSpace: A Tool for Visualizing Protein Space. Journal of Molecular Biology, 168940, 2025. doi:10.1016/j.jmb.2025.168940


Download files

Source Distribution

protspace-4.4.0.tar.gz (10.7 MB view details)

Uploaded Source

Built Distribution

protspace-4.4.0-py3-none-any.whl (708.9 kB view details)

Uploaded Python 3

File details

Details for the file protspace-4.4.0.tar.gz.

File metadata

  • Download URL: protspace-4.4.0.tar.gz
  • Upload date:
  • Size: 10.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for protspace-4.4.0.tar.gz
Algorithm Hash digest
SHA256 ed0d5de09fb80b496f08629b00b6a40f8650653c0258c1b8c97fdd76306661da
MD5 51ba62e04553da9776bf6577734fb058
BLAKE2b-256 8dfda221093b9ab4dd167e9e30e7fa6e2824491b8581a7fe327f14854ce4ad08
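A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest published above for protspace-4.4.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "ed0d5de09fb80b496f08629b00b6a40f8650653c0258c1b8c97fdd76306661da"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the archive was saved to the current directory):
# ok = matches_published_hash("protspace-4.4.0.tar.gz", EXPECTED_SDIST_SHA256)
```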

Provenance

The following attestation bundles were made for protspace-4.4.0.tar.gz:

Publisher: publish.yml on tsenoner/protspace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file protspace-4.4.0-py3-none-any.whl.

File metadata

  • Download URL: protspace-4.4.0-py3-none-any.whl
  • Upload date:
  • Size: 708.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for protspace-4.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 81b81eb4ea1a8521c1552f7e9a15fe4772134cb1050745ea8de20f2863a0f2e6
MD5 fe118c35426afad8c6e8c4975985fdd5
BLAKE2b-256 526a25e443f43a3d82982b130cc3c770b73434cc6d263de89ebf548f2e21f2ce

Provenance

The following attestation bundles were made for protspace-4.4.0-py3-none-any.whl:

Publisher: publish.yml on tsenoner/protspace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
