
Tree-based visualization for high-dimensional data


Requires Python 3.11+.

TMAP2

Tree-based visualization for high-dimensional data. Organizes similar items into interactive tree structures. Ideal for chemical space, protein embeddings, single-cell data, or any high-dimensional dataset.

Interactive HTML export AlphaFold protein clusters

Why Trees?

Most dimensionality reduction tools (UMAP, t-SNE) produce point clouds. TMAP produces a tree, a connected structure where every point is linked to its neighbors through branches. This makes the layout itself explorable: you can follow branches, trace paths between any two points, and discover how regions connect.

For example, in a TMAP of pet breed images, following the branch from terriers toward cats reveals that the bridge between the two groups runs through chihuahuas and sphynx cats (the bald ones). This is both hilarious and logical: both are small, short-haired, and big-eyed. The tree doesn't just cluster similar things; it also shows you how dissimilar things are connected.

Exploring pet breed tree

Because the layout is a tree, you get operations that point clouds can't support:

path = model.path(idx_a, idx_b) # nodes along the tree path
d = model.distance(idx_a, idx_b) # sum of edge weights along the path
pseudotime = model.distances_from(idx) # tree distance from one point to all others
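
To make these operations concrete, here is a minimal pure-Python sketch of what path and distance queries mean on a weighted tree. It is not TMAP's implementation: the toy edge list and the helper names `build_adjacency`, `distances_from`, and `tree_path` are assumptions made for illustration only.

```python
from collections import deque

# Toy weighted tree (hypothetical data): (node_a, node_b, edge_weight)
EDGES = [(0, 1, 0.2), (1, 2, 0.5), (1, 3, 0.1), (3, 4, 0.4)]


def build_adjacency(edges):
    """Return {node: [(neighbor, weight), ...]} for an undirected tree."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj


def distances_from(adj, source):
    """Tree distance from `source` to every node, plus parent pointers."""
    dist = {source: 0.0}
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, weight in adj[node]:
            if neighbor not in dist:  # a tree has exactly one route to each node
                dist[neighbor] = dist[node] + weight
                parent[neighbor] = node
                queue.append(neighbor)
    return dist, parent


def tree_path(adj, a, b):
    """Nodes along the unique path from a to b, and the summed edge weight."""
    dist, parent = distances_from(adj, a)
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])  # walk back from b toward a
    path.reverse()
    return path, dist[b]


adj = build_adjacency(EDGES)
path, d = tree_path(adj, 2, 4)
print(path, round(d, 3))  # [2, 1, 3, 4] 1.0
```

Because a tree has exactly one path between any two nodes, a single traversal from one node yields all distances at once, which is what makes a pseudotime-style ordering cheap to compute.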

Installation

pip install tmap2

Optional extras:

pip install rdkit # chemistry helpers (fingerprints_from_smiles, molecular_properties)
pip install jupyter-scatter # notebook interactive widgets

Note: The import name is tmap, not tmap2.

Quick Start

import numpy as np
from tmap import TMAP

# Binary fingerprints (Jaccard)
X = np.random.randint(0, 2, (1000, 2048), dtype=np.uint8)
model = TMAP(metric="jaccard", n_neighbors=20, seed=42).fit(X)
model.to_html("map.html")

# Dense embeddings (cosine / euclidean)
X = np.random.random((1000, 128)).astype(np.float32)
model = TMAP(metric="cosine", n_neighbors=20).fit(X)
new_coords = model.transform(X[:10])

# Interactive notebook widget
# (df is assumed to be a pandas DataFrame with one row per point and
# "label", "name", and "score" columns)
model.plot(color_by="label", data=df, tooltip_properties=["name", "score"])

Key Features

  • Tree structure: follow branches, trace paths, compute pseudotime
  • Deterministic: same input + seed = same output
  • Multiple metrics: jaccard, cosine, euclidean, precomputed
  • Incremental: add_points() and transform() for new data
  • Model persistence: save() / load()
  • Three viz backends: interactive HTML, jupyter-scatter, matplotlib
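
As a reference for the metric names above, the following sketch spells out the textbook definitions of Jaccard, cosine, and Euclidean distance in plain Python. The function names are hypothetical, and how TMAP itself handles edge cases (for example all-zero vectors) is not stated here, so treat this as an illustration of the formulas rather than the library's behavior.

```python
import math


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero vectors are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a, b):
    """Straight-line distance between two dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


fp_a = [1, 1, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1]
print(jaccard_distance(fp_a, fp_b))                # 2 shared bits of 4 set -> 0.5
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))     # orthogonal vectors -> 1.0
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 3-4-5 triangle -> 5.0
```

Jaccard only looks at which bits are set, which is why it suits binary fingerprints, while cosine and Euclidean operate on dense real-valued embeddings.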

Visualization

Interactive HTML: lasso selection, light/dark theme, filter and search panels, pinned metadata cards, binary mode for large datasets.

Notebook widgets: color switching, categorical filtering, and lasso selection with pandas-backed metadata:

viz = model.to_tmapviz()
viz.add_color_layout("Molecular Weight", mw.tolist(), categorical=False)
viz.add_color_layout("Scaffold", scaffolds, categorical=True, color="tab10")
viz.add_label("SMILES", smiles_list)
viz.show(width=1000, height=620, controls=True)

Static plots: matplotlib for publication figures, e.g. model.plot_static(color_by=labels)

Domain Utilities

Built-in helpers for common scientific workflows:

from tmap.utils.chemistry import fingerprints_from_smiles, molecular_properties
from tmap.utils.proteins import fetch_uniprot, sequence_properties
from tmap.utils.singlecell import from_anndata

Domain | Metric | Utilities
Chemoinformatics | jaccard | fingerprints_from_smiles, molecular_properties, murcko_scaffolds
Proteins | cosine / euclidean | fetch_uniprot, fetch_alphafold, read_fasta, sequence_properties
Single-cell | cosine / euclidean | from_anndata, cell_metadata, marker_scores
Generic embeddings | cosine / euclidean / precomputed | No domain utils needed

Notebooks

Notebook | Topic
01 Quick Start | End-to-end walkthrough
02 MinHash Deep Dive | Encoding methods and when to use each
03 Legacy LSH Pipeline | Lower-level MinHash + LSHForest + layout workflow
04 Notebook Widgets | Selection, filtering, zoom, export
05 Single-Cell | RNA-seq with PBMC 3k, pseudotime, UMAP comparison
06 Metric Guide | Choosing the right metric
07 FAQ | Troubleshooting and common questions
08 Cheminformatics | Molecules, fingerprints, SAR
09 Protein Analysis | FASTA, ESM embeddings, AlphaFold
10 Card Configuration | Pinned card layout, fields, and links
11 Default Params Benchmark | Defaults across dataset sizes and types
12 USearch Jaccard | Binary Jaccard with USearch backend

Lower-Level Pipeline

For direct control over indexing, hashing, and layout, see the legacy pipeline notebook. The main building blocks:

from tmap.index import USearchIndex           # dense / binary kNN
from tmap import MinHash, LSHForest           # Jaccard on sets / strings
from tmap.layout import LayoutConfig, layout_from_lsh_forest

Your Data
   ├─→ Binary matrix ─────────→ USearch        (Jaccard / cosine / euclidean)
   └─→ Sets / strings ───────→ MinHash → LSHForest
                ↓
             k-NN Graph → MST → OGDF Tree Layout → Interactive Visualization
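
The "k-NN Graph → MST" stage of the diagram above can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not the library's code: it uses a brute-force neighbor search instead of USearch or LSHForest, Kruskal's algorithm for the spanning tree, and it stops before the OGDF layout step. The point set and the function names `knn_edges` and `minimum_spanning_forest` are hypothetical.

```python
import math

# Hypothetical 2-D points: two tight groups plus one point in between.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (3.0, 3.0)]


def knn_edges(points, k):
    """Brute-force k-NN graph as a set of undirected (weight, i, j) edges."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for w, j in nearest:
            edges.add((w, min(i, j), max(i, j)))  # store each edge once
    return edges


def minimum_spanning_forest(n_nodes, edges):
    """Kruskal's algorithm with union-find; returns the kept (i, j, w) edges."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    kept = []
    for w, i, j in sorted(edges):  # cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components, so it cannot close a cycle
            parent[ri] = rj
            kept.append((i, j, w))
    return kept


tree = minimum_spanning_forest(len(POINTS), knn_edges(POINTS, k=3))
print(len(tree))  # 5 edges connect all 6 points
```

If k is too small, the k-NN graph may split into several components, in which case this routine returns a forest rather than a single tree; that is why the helper is named for the general case.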

Development

git clone https://github.com/afloresep/tmap2.git
cd tmap2
pip install ".[dev]"
pytest -v

License

MIT License - see LICENSE for details.

Based on the original TMAP by Daniel Probst and Jean-Louis Reymond.

Download files

Source distribution:

  • tmap2-0.2.0.tar.gz (4.4 MB)

Built distributions:

  • tmap2-0.2.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB): CPython 3.13, manylinux (glibc 2.27+ and 2.28+), x86-64
  • tmap2-0.2.0-cp313-cp313-macosx_11_0_arm64.whl (910.0 kB): CPython 3.13, macOS 11.0+, ARM64
  • tmap2-0.2.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB): CPython 3.12, manylinux (glibc 2.27+ and 2.28+), x86-64
  • tmap2-0.2.0-cp312-cp312-macosx_11_0_arm64.whl (909.9 kB): CPython 3.12, macOS 11.0+, ARM64
  • tmap2-0.2.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (1.3 MB): CPython 3.11, manylinux (glibc 2.27+ and 2.28+), x86-64
  • tmap2-0.2.0-cp311-cp311-macosx_11_0_arm64.whl (909.5 kB): CPython 3.11, macOS 11.0+, ARM64

