
Tree-based visualization for high-dimensional data

Requires Python 3.11+.

TMAP2

Tree-based visualization for high-dimensional data. Organizes similar items into interactive tree structures. Ideal for chemical space, protein embeddings, single-cell data, or any high-dimensional dataset.

[Figure: interactive HTML export of AlphaFold protein clusters]

Why Trees?

Most dimensionality reduction tools (UMAP, t-SNE) produce point clouds. TMAP produces a tree, a connected structure where every point is linked to its neighbors through branches. This makes the layout itself explorable: you can follow branches, trace paths between any two points, and discover how regions connect.

For example, in a TMAP of pet breed images, following the branch from terriers toward cats reveals that the bridge between the two groups runs through chihuahuas and sphynx cats (the hairless breed). That is both hilarious and logical: both are small, short-haired, and big-eyed. The tree doesn't just cluster similar things; it also shows how dissimilar things are connected.

[Figure: exploring the pet breed tree]

Because the layout is a tree, you get operations that point clouds can't support:

path = model.path(idx_a, idx_b)        # nodes along the tree path
d = model.distance(idx_a, idx_b)        # sum of edge weights along the path
pseudotime = model.distances_from(idx)  # tree distance from one point to all others
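To make the tree operations above concrete, here is a minimal, library-independent sketch of what a path and a path-length distance mean on a weighted tree. It does not use TMAP; the toy tree, the adjacency-dict representation, and the helper names (`build_adj`, `tree_path`, `tree_distance`, `distances_from`) are hypothetical illustrations, not the library's implementation.

```python
from collections import deque


def build_adj(edges):
    """Build an undirected adjacency dict {node: [(neighbor, weight), ...]}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def tree_path(adj, a, b):
    """Return the unique path from a to b (BFS, then walk parents back)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            break
        for v, _ in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])
    return path[::-1]


def distances_from(adj, src):
    """Sum of edge weights from src to every node, in one traversal."""
    dist = {src: 0.0}
    stack = [src]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist


def tree_distance(adj, a, b):
    """Sum of edge weights along the tree path from a to b."""
    return distances_from(adj, a)[b]


# Hypothetical toy tree: 0-1 (1.0), 1-2 (2.0), 1-3 (0.5), 3-4 (1.5)
edges = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (3, 4, 1.5)]
adj = build_adj(edges)
print(tree_path(adj, 2, 4))      # [2, 1, 3, 4]
print(tree_distance(adj, 2, 4))  # 4.0
```

Because a tree has exactly one path between any two nodes, a single traversal from one source yields the distance to every other node, which is why a pseudotime-style ordering from a chosen root point is cheap to compute.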

Installation

pip install tmap2

Optional extras:

pip install rdkit # chemistry helpers (fingerprints_from_smiles, molecular_properties)
pip install jupyter-scatter # notebook interactive widgets
pip install biopython # protein helpers (ProtParam properties, PDB parsing)

Note: The import name is tmap, not tmap2.

Quick Start

Binary Data (e.g. Chemical Fingerprints)

from tmap import TMAP
from tmap.utils import fingerprints_from_smiles  # requires rdkit

smiles = [...]  # your list of SMILES strings
# Binary fingerprints -> use the Jaccard metric
fps = fingerprints_from_smiles(smiles, fp_type="morgan", radius=2, n_bits=2048)
model = TMAP(metric="jaccard", n_neighbors=20, seed=42).fit(fps)
model.write_html("map.html")  # save as an HTML file
# model.show()  # display inline in a Jupyter notebook
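For binary fingerprints, Jaccard distance compares only the bits that are set: one minus the number of shared on-bits divided by the number of bits set in either vector. Shared zeros, which dominate sparse 2048-bit fingerprints, are ignored. A small NumPy sketch of the formula follows; the helper name and toy vectors are illustrative, TMAP computes this internally through its own backend, and treating two all-zero vectors as distance 0 is an assumption of this sketch.

```python
import numpy as np


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two binary vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumed convention: two empty fingerprints are identical.
        return 0.0
    intersection = np.logical_and(a, b).sum()
    return float(1.0 - intersection / union)


fp1 = np.array([1, 1, 0, 1, 0, 0], dtype=np.uint8)
fp2 = np.array([1, 0, 0, 1, 1, 0], dtype=np.uint8)
print(jaccard_distance(fp1, fp2))  # 0.5  (2 shared on-bits / 4 bits set in either)
```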

Continuous Vectors (e.g. Protein Embeddings)

import numpy as np
from tmap import TMAP

# Dense embeddings -> use the cosine or euclidean metric
X = np.random.random((1000, 128)).astype(np.float32)  # placeholder for real embeddings
model = TMAP(metric="cosine", n_neighbors=20).fit(X)
# model.write_html("tmap.html")  # save as an HTML file
model.show()  # display inline in a Jupyter notebook
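A quick way to choose between cosine and euclidean for embeddings: cosine distance ignores vector magnitude and compares only direction, while euclidean distance is sensitive to both. The NumPy sketch below uses illustrative helpers (not TMAP functions) to show that difference, plus the standard identity that for L2-normalized vectors the squared euclidean distance equals twice the cosine distance, so both metrics rank neighbors identically after normalization.

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cos(angle between u and v); ignores magnitude."""
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to magnitude."""
    return float(np.linalg.norm(u - v))


u = np.array([1.0, 2.0, 3.0])
v = 2.0 * u  # same direction, twice the magnitude
print(cosine_distance(u, v))     # approximately 0.0
print(euclidean_distance(u, v))  # sqrt(14), about 3.742

# For unit-length vectors: ||a - b||^2 == 2 * cosine_distance(a, b)
rng = np.random.default_rng(0)
a, b = rng.random(128), rng.random(128)
a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.isclose(euclidean_distance(a_n, b_n) ** 2, 2 * cosine_distance(a_n, b_n)))  # True
```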

Key Features

  • Tree structure: follow branches, trace paths, compute pseudotime
  • Deterministic: same input + seed = same output
  • Multiple metrics: jaccard, cosine, euclidean, precomputed
  • Incremental: add_points() and transform() to add new data to an existing TMAP
  • Model persistence: save() / load()
  • Three viz backends: interactive HTML, jupyter-scatter, matplotlib

Visualization (add colors, labels...)

Notebook widgets support color switching, categorical filtering, and lasso selection with pandas-backed metadata.

Add Colors & Labels

Adding colors is straightforward: pass a name for the color layout (e.g. Molecular Weight, Age, Protein Length), a list with one value per node, and, optionally, a matplotlib colormap name. If the data is categorical (e.g. Scaffold, or discrete counts such as Heavy Atom Count), pass categorical=True so that categorical colormaps like tab10 become available. To add labels (data shown for each node but not used for coloring), pass a name for the label and the list of values.

from tmap import TMAP

# X: fingerprints; mw (NumPy array), scaffolds, smiles_list: one entry per row of X
model = TMAP(metric="jaccard").fit(X)
viz = model.to_tmapviz()
viz.add_color_layout("Molecular Weight", mw.tolist(), categorical=False)
viz.add_color_layout("Scaffold", scaffolds, categorical=True, color="tab10")
viz.add_label("SMILES", smiles_list)
viz.show(width=1000, height=620, controls=True)  # display in a Jupyter notebook
# viz.write_html("mytmap.html")  # save as HTML and open in a browser

Here the SMILES are added as a plain label, so no 2D structure image is rendered. To display structures, add the SMILES with viz.add_smiles(smiles_list) instead.

Saving with viz.write_html("name.html") produces the interactive HTML view, which supports lasso selection, light/dark themes, filter and search panels, pinned metadata cards, and a binary mode for large datasets.

For static publication figures, render with matplotlib instead: model.plot_static(color_by=labels)
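The categorical flag matters because continuous and categorical values are mapped to colors differently. As a rough illustration of the general idea (not TMAP's internal code; the helper names and data are hypothetical), continuous values are typically rescaled to [0, 1] and looked up in a gradient colormap, while categories are assigned integer codes that index into a discrete palette such as tab10, which has 10 colors.

```python
import numpy as np


def continuous_to_unit(values):
    """Min-max scale numeric values to [0, 1] for a gradient colormap."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)  # constant input: every point gets the same color
    return (x - x.min()) / span


def categorical_codes(labels):
    """Map each distinct label to an integer code for a discrete palette."""
    categories, codes = np.unique(np.asarray(labels), return_inverse=True)
    return categories.tolist(), codes


mw = [180.2, 46.1, 342.3, 46.1]  # hypothetical molecular weights
print(continuous_to_unit(mw))

cats, codes = categorical_codes(["benzene", "pyridine", "benzene", "indole"])
print(cats, codes)  # ['benzene', 'indole', 'pyridine'] [0 2 0 1]
# A 10-color palette would index colors with codes % 10
```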

Domain Utilities

Built-in helpers for common scientific workflows:

from tmap.utils.chemistry import fingerprints_from_smiles, molecular_properties
from tmap.utils.proteins import fetch_uniprot, sequence_properties
from tmap.utils.singlecell import from_anndata

| Domain | Metric | Utilities |
| --- | --- | --- |
| Cheminformatics | jaccard | fingerprints_from_smiles, molecular_properties, murcko_scaffolds |
| Proteins | cosine / euclidean | fetch_uniprot, fetch_alphafold, read_fasta, sequence_properties |
| Single-cell | cosine / euclidean | from_anndata, cell_metadata, marker_scores |
| Generic embeddings | cosine / euclidean / precomputed | No domain utilities needed |

Notebooks

| Notebook | Topic |
| --- | --- |
| 01 Quickstart | Shortest end-to-end walkthrough on a small molecule table |
| 02 Cheminformatics | SMILES → fingerprints → interactive molecular map |
| 03 Continuous Embeddings | Cosine and euclidean on MNIST: when to use each |
| 04 What's New | add_points, transform, tree paths, save/load, external kNN |
| 05 Single-Cell | RNA-seq with PBMC 3k, pseudotime, UMAP comparison |
| 06 FAQ | Troubleshooting and common questions |
| 07 MinHash Deep Dive | Encoding methods and when to use each |
| 08 Notebook Widgets | Coloring, tooltips, lasso selection with jupyter-scatter |
| 09 Card Configuration | Pinned card layout, fields, and links |
| 10 Protein Analysis | FASTA, ESM embeddings, AlphaFold |
| 11 USearch Jaccard | Native binary Jaccard backend (high recall, low memory) |
| 12 Legacy LSH Pipeline | Lower-level MinHash + LSHForest + layout workflow |
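Notebooks 07 and 12 cover MinHash in depth. The core principle fits in a few lines: for each of many random hash functions, keep the minimum hash value over a set; the fraction of positions where two signatures agree estimates the Jaccard similarity of the sets. This is a self-contained sketch using a standard (a*x + b) mod p hash family over CRC32 token hashes, an assumption chosen for illustration; it is not TMAP's `MinHash` class or its encoding, and all helper names are hypothetical.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime used by the (a*x + b) mod p hash family


def make_hash_params(num_perm, seed=42):
    """Draw num_perm random (a, b) pairs; the seed makes signatures reproducible."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash_signature(items, params):
    """For each hash function, keep the minimum hash over the (non-empty) set."""
    base = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def estimate_jaccard(sig1, sig2):
    """Fraction of signature positions where the minima agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def exact_jaccard(s1, s2):
    return len(s1 & s2) / len(s1 | s2)


A = {f"tok{i}" for i in range(0, 60)}
B = {f"tok{i}" for i in range(30, 90)}
params = make_hash_params(256)
est = estimate_jaccard(minhash_signature(A, params), minhash_signature(B, params))
print(exact_jaccard(A, B), est)  # exact is 1/3; the estimate should land close to it
```

More hash functions shrink the estimate's variance at the cost of longer signatures; LSHForest then indexes these signatures so that approximate neighbors can be found without comparing every pair.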

Lower-Level Pipeline

For direct control over indexing, hashing, and layout, see the legacy pipeline notebook. The main building blocks:

from tmap.index import USearchIndex           # dense / binary kNN
from tmap import MinHash, LSHForest           # Jaccard on sets / strings
from tmap.layout import LayoutConfig, layout_from_lsh_forest

Your Data
   ├─→ Dense / binary matrix ─→ USearch        (Jaccard / cosine / euclidean)
   └─→ Sets / strings ───────→ MinHash → LSHForest
                ↓
             k-NN Graph → MST → OGDF Tree Layout → Interactive Visualization
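The pipeline above reduces the data to a k-NN graph and then to a minimum spanning tree (MST) before layout. A minimal sketch of those two steps follows, assuming brute-force euclidean k-NN and Kruskal's algorithm with union-find. TMAP itself uses USearch or LSHForest for the k-NN step and OGDF for the layout, neither of which is shown here, and its MST routine may differ; all helper names are hypothetical.

```python
import numpy as np


def knn_edges(X, k):
    """Brute-force k-NN graph as sorted, deduplicated (weight, i, j) edges with i < j."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-neighbors
    edges = set()
    for i in range(len(X)):
        for j in np.argsort(d[i])[:k]:
            a, b = min(i, int(j)), max(i, int(j))
            edges.add((float(d[a, b]), a, b))
    return sorted(edges)


def kruskal_mst(n, edges):
    """Minimum spanning forest: add cheapest edges that do not close a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            mst.append((a, b, w))
    return mst


rng = np.random.default_rng(42)
X = rng.random((50, 8)).astype(np.float32)
tree = kruskal_mst(len(X), knn_edges(X, k=5))
print(len(tree))  # at most 49 edges; exactly 49 if the k-NN graph is connected
```

If the k-NN graph is disconnected, Kruskal returns a spanning forest with fewer than n - 1 edges, one tree per connected component.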

Development

git clone https://github.com/afloresep/tmap2.git
cd tmap2
pip install ".[dev]"
pytest -v

License

MIT License - see LICENSE for details.

Based on the original TMAP by Daniel Probst and Jean-Louis Reymond.


Download files


Source Distribution

tmap2-0.2.2.tar.gz (4.4 MB)

Uploaded: Source

Built Distributions

tmap2-0.2.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

tmap2-0.2.2-cp313-cp313-macosx_11_0_arm64.whl (915.0 kB)

Uploaded: CPython 3.13, macOS 11.0+ ARM64

tmap2-0.2.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

tmap2-0.2.2-cp312-cp312-macosx_11_0_arm64.whl (914.9 kB)

Uploaded: CPython 3.12, macOS 11.0+ ARM64

tmap2-0.2.2-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

tmap2-0.2.2-cp311-cp311-macosx_11_0_arm64.whl (914.6 kB)

Uploaded: CPython 3.11, macOS 11.0+ ARM64

File details

Details for the file tmap2-0.2.2.tar.gz.

File metadata

  • Download URL: tmap2-0.2.2.tar.gz
  • Size: 4.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tmap2-0.2.2.tar.gz
Algorithm Hash digest
SHA256 cc05bf5c8c7c92405c3f215209b4680dd716585c20b7e750ea3f4131476412ad
MD5 a8da2e1eef6ed8adf03689e655738eab
BLAKE2b-256 b78efd5fad75cc3192a918580ed49579a50dcaaeb8d5c8293714f8f4623effff

Provenance

The following attestation bundles were made for tmap2-0.2.2.tar.gz:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for tmap2-0.2.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 542d0cb93ef0f5380ea52ead158f99144e823d9252e0fbfcd60b33e08d668f89
MD5 c1157cfad873cf3fd5a720222c11ea9c
BLAKE2b-256 331ffcbccaee6015305e392964c1f9908d8c4e9e4a141db3d18b705c697e68ac

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for tmap2-0.2.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0dc588ad1b7bb99160858af7c2e6587a5a3cda1ce43412c21f2ed06c860541ee
MD5 4b0e19efb6cd37b2223ac62173e59b09
BLAKE2b-256 ed179e233fcbfd3f4df97f45cc6cea52e8fd1fd17fb89d884487438c4599f567

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for tmap2-0.2.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6d2983cffb7092d161c86b80aa1912a2ba9e5b33fb7e15c66e613b477b472656
MD5 1cdf82e17931a782ca65017402a2e2aa
BLAKE2b-256 57f205546cf491d39b908e54aab5bd9a9ae2e1e0cd3b21e6fd033e982879b065

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tmap2-0.2.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 35686fe3e08b7b5e7500a6202e02e924b9ffd0de25ca545e94b1f4caf0c0990f
MD5 e90a0d24d1e78c8d892bdafa8a285cc5
BLAKE2b-256 5a28ec1bedf73828b1341a95e7549b7d51b4b27b84fdf5b6c320ab32a11114f9

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for tmap2-0.2.2-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 48abe8cf8d140e90c4b4db4f75968848a76c617a15360b1d2d6dbae8fac886ad
MD5 feec2ad55fc8e6e6cc2772eb9fb2970a
BLAKE2b-256 45919b620b1ca278b1d1b29472dbdabbf581e892b81871dc73d620bb39f17265

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on afloresep/tmap2


File details

Details for the file tmap2-0.2.2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tmap2-0.2.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5bd0bab2088c5defafbe8bc3973c327609fe9fe4353b2a0f6ee2cd255f589c60
MD5 b8372e7dc70267bba9dc0c415ee522f0
BLAKE2b-256 942fc4f9d3b627ac9dc1a0a30ce3db067fb389cebc0c34b1c64ba44f5dabfd6b

Provenance

The following attestation bundles were made for tmap2-0.2.2-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on afloresep/tmap2
