Ensemble package for all Tutte Institute libraries involved in data vectorization, dimension reduction, clustering and exploratory visualization

Project description

This package brings together the libraries that the Tutte Institute has built for exploratory analysis, unsupervised learning and interactive visualization of unstructured data. It includes the following individual packages.

Learn more at https://github.com/TutteInstitute

IMPORTANT: this package includes the libraries described below as dependencies without upper-bounding their versions. As such, newer versions of the package are mainly produced when editing the toolkit roster, and not necessarily when new versions of its dependencies are released. Thus, do not mistake the age of this package for dereliction.
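
Because the dependencies are not upper-bounded, the versions that actually get installed depend on when and where the toolkit is installed. A minimal sketch for listing them with the standard library follows. Apart from umap-learn, whose package name is stated in this description, the distribution names are assumed to match the library names listed here, and `installed_versions` is a hypothetical helper, not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed in this description; apart from umap-learn,
# each is ASSUMED to match the library name given in the text.
TOOLKIT_PACKAGES = [
    "vectorizers",
    "pynndescent",
    "umap-learn",
    "hdbscan",
    "fast_hdbscan",
    "evoc",
    "datamapplot",
    "toponymy",
]


def installed_versions(packages=TOOLKIT_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Recording this output alongside analysis results is one way to keep them reproducible when the toolkit release itself does not change.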


Vector space embedding

vectorizers

Embeds various types of data into high-dimensional vector spaces. This includes data consisting of distributions on vector spaces, which are embedded by approximately solving optimal transport problems.


Nearest neighbour network discovery

pynndescent

Builds the k-nearest neighbour graph of a set of high-dimensional vectors, expressed as either dense or sparse arrays, under a wide range of distances and pseudo-metrics. Doubles as an in-memory index for querying the neighbours of arbitrary vectors.
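
To make the notion of a k-nearest neighbour graph concrete, here is a brute-force sketch in plain Python. It is an illustration only: it computes the exact graph under Euclidean distance at quadratic cost, whereas pynndescent builds an approximate graph with a far more scalable algorithm. `knn_graph` is a hypothetical helper, not part of the pynndescent API.

```python
import math


def knn_graph(points, k):
    """Exact k-nearest-neighbour graph under Euclidean distance.

    Returns (indices, distances): row i lists the k nearest other points
    to points[i], closest first. Brute force, O(n^2) distance evaluations.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    indices, distances = [], []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        indices.append([j for _, j in ranked[:k]])
        distances.append([d for d, _ in ranked[:k]])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
indices, distances = knn_graph(points, k=2)
print(indices)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```

The graph is stored as two parallel arrays, one of neighbour indices and one of the matching distances, which is a common layout for this kind of structure.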


Dimension reduction

umap (package name is umap-learn)

Uniform Manifold Approximation and Projection is a manifold learning dimension reduction algorithm that preserves the local similarity structure of a set of vectors. It works on both dense and sparse vector arrays.


Clustering

hdbscan

Hierarchical Density-Based Spatial Clustering of Applications with Noise. This clustering algorithm partitions a set of vectors into groups based on mutual reachability distance, discarding outliers as noise.
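
The mutual reachability distance can be sketched in a few lines, assuming its usual definition: the largest of the two points' core distances and their direct distance, where a point's core distance is the distance to its k-th nearest neighbour. Conventions differ on whether a point counts among its own neighbours; this sketch excludes it. The function names are hypothetical and the code is not taken from the hdbscan library.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Matrix of max(core(a), core(b), d(a, b)) for every pair of points."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if a == b
            else max(cores[a], cores[b], math.dist(points[a], points[b]))
            for b in range(n)
        ]
        for a in range(n)
    ]


# Three points close together and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
mreach = mutual_reachability(points, k=2)
```

Points in sparse regions have large core distances, so the transformation pushes them away from everything else. This is what lets the clustering step treat them as noise rather than attach them to a nearby group.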

fast_hdbscan

A new implementation of HDBSCAN optimized for runtime efficiency by restricting computations to low-dimensional vectors under Euclidean distance.

evoc

Embedding Vector-Oriented Clustering is a new clustering algorithm that streamlines and approximates the combined UMAP-HDBSCAN approach to clustering, so as to compute high-quality clusterings of high-dimensional vector sets at a fraction of the computational cost.
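
For reference, the UMAP-then-HDBSCAN approach that EVoC streamlines might be sketched as follows, using the umap-learn and hdbscan packages described above. The wrapper name and all parameter values are illustrative assumptions rather than recommendations, and this is not EVoC's own algorithm.

```python
def umap_hdbscan_clusters(vectors, n_components=5, n_neighbors=15, min_cluster_size=10):
    """Reduce `vectors` with UMAP, then cluster the result with HDBSCAN.

    Returns one integer label per input vector; HDBSCAN labels noise as -1.
    Parameter defaults here are illustrative assumptions.
    """
    # Imported here so that defining the function does not require the libraries.
    import hdbscan
    import umap

    reduced = umap.UMAP(
        n_components=n_components,
        n_neighbors=n_neighbors,
        min_dist=0.0,  # allow similar points to pack tightly before density clustering
    ).fit_transform(vectors)
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)
```

Calling `umap_hdbscan_clusters(X)` on an array `X` of shape `(n_samples, n_features)` returns an array of `n_samples` cluster labels.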


Interactive visualization

datamapplot

Creates static plots and interactive views of 2D vectors and metadata, with an emphasis on presentation aesthetics and interactive exploration for insight discovery.

toponymy

Generates a multiresolution hierarchy of annotation labels for text embeddings by querying a large language model with representative, distinctive and contrastive characterizations of data clusters. These labels are then useful for annotating data maps produced with datamapplot.

Download files

Source Distribution

timc_vector_toolkit-20250919.tar.gz (2.2 kB)

Uploaded Source

Built Distribution

timc_vector_toolkit-20250919-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file timc_vector_toolkit-20250919.tar.gz.

File hashes

Hashes for timc_vector_toolkit-20250919.tar.gz
Algorithm Hash digest
SHA256 1827319f91fedcb4fb5970626d5e5bf5955d9947c017b08aca86c8c96e16839c
MD5 d0300e0b8807979a9c56310efe489c67
BLAKE2b-256 0f96a2ac99bf0bfd7cd1fd4f7f32e1201b19361787e2f401d66658e153c1866e

File details

Details for the file timc_vector_toolkit-20250919-py3-none-any.whl.

File hashes

Hashes for timc_vector_toolkit-20250919-py3-none-any.whl
Algorithm Hash digest
SHA256 1414cced385f9853265a157517723e4795ad182d82eda13d61f0c9b0b7c348f9
MD5 5bb025a5e265d6006c8bf935ed2a87fc
BLAKE2b-256 b21b69c76e894214dbfb7062bc59177530cef4f2804148e46809980ffbfead68
