Ensemble package for all Tutte Institute libraries involved in data vectorization, dimension reduction, clustering and exploratory visualization
Project description
This package brings together the libraries that the Tutte Institute has built for exploratory analysis, unsupervised learning and interactive visualization of unstructured data. It includes the following individual packages.
Learn more at https://github.com/TutteInstitute
IMPORTANT: this package declares the libraries described below as dependencies without upper bounds on their versions. New releases of this package are therefore produced mainly when the toolkit roster changes, and not necessarily when its dependencies publish new versions. Do not mistake the age of this package for dereliction.
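Because the dependencies are not upper-bounded, the versions actually installed depend on when the environment was resolved rather than on the toolkit's own release date. A minimal sketch for checking what is installed, using only the standard library. The distribution names are taken from this description; apart from umap-learn, which the text names explicitly, they are assumed to match the package names listed below.

```python
from importlib import metadata

# Distribution names as listed in this description. Only "umap-learn" is
# stated explicitly in the text; the others are assumed to match the
# package names given below.
TOOLKIT_DISTRIBUTIONS = [
    "vectorizers",
    "pynndescent",
    "umap-learn",
    "hdbscan",
    "fast_hdbscan",
    "evoc",
    "datamapplot",
    "toponymy",
]


def installed_versions(distributions=TOOLKIT_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this after installation shows which dependency versions the resolver picked, which is more informative than the release date of the toolkit itself.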
Vector space embedding
vectorizers
Embeds various types of data into high-dimensional vector spaces. This includes data consisting of distributions on vector spaces, which are embedded by approximately solving optimal transport problems.
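To give a feel for what an optimal transport problem measures, here is a sketch of its simplest case: the Wasserstein-1 distance between two equal-size samples on the real line, where the optimal plan simply pairs points in sorted order. This is an illustration only, not how vectorizers works; the function name `wasserstein_1d` is hypothetical, and the library handles distributions on higher-dimensional spaces with approximate solvers.

```python
def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D samples with uniform weights."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan matches points in sorted
    # order, so the cost is the mean absolute gap between sorted samples.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Shifting every point by 1 costs exactly 1 unit of transport per point.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])

# Reordering a sample does not change the distribution, so the cost is 0.
permuted = wasserstein_1d([0.0, 2.0], [2.0, 0.0])
```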
Nearest neighbour network discovery
pynndescent
Builds the k-nearest neighbour graph of a set of high-dimensional vectors, expressed as either dense or sparse arrays, under a large set of distances and pseudo-metrics. Doubles as an in-memory index for querying the neighbours of arbitrary vectors.
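For reference, the object being built can be sketched by brute force. The example below computes an exact k-nearest neighbour graph under Euclidean distance in pure Python; it is not pynndescent's algorithm or API, and the helper name `knn_graph` is hypothetical. It compares every pair of points, which is the quadratic cost that a dedicated library is meant to avoid on large datasets.

```python
import math


def knn_graph(points, k):
    """Exact k-nearest-neighbour graph by brute force (all pairwise distances).

    Returns a list where entry i holds the (distance, index) pairs of the k
    points closest to points[i], nearest first, excluding i itself.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    graph = []
    for i, p in enumerate(points):
        candidates = [
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        ]
        graph.append(sorted(candidates)[:k])
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(points, k=2)
# graph[0] lists the two points nearest to (0.0, 0.0), nearest first.
```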
Dimension reduction
umap (package name is umap-learn)
Uniform Manifold Approximation and Projection is a manifold learning dimension reduction algorithm that preserves the local similarity structure of a set of vectors. It works on both dense and sparse vector arrays.
Clustering
hdbscan
Hierarchical Density-Based Spatial Clustering of Applications with Noise. This clustering algorithm partitions a set of vectors into groups based on mutual reachability distance, discarding outliers as noise.
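The mutual reachability distance mentioned above can be sketched directly from its usual definition: the distance between two points is the largest of their two core distances and their ordinary distance. The example below is a brute-force illustration in pure Python, not the hdbscan implementation. The function names are hypothetical, and it assumes the core distance of a point is the distance to its k-th nearest other point; conventions on whether the point counts itself vary.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Matrix of max(core(a), core(b), d(a, b)), with zeros on the diagonal."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0
            if a == b
            else max(cores[a], cores[b], math.dist(points[a], points[b]))
            for b in range(n)
        ]
        for a in range(n)
    ]


# Three points close together on a line and one far away.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
mreach = mutual_reachability(points, k=2)
```

In this toy dataset the isolated point has a large core distance, so its mutual reachability distance to every other point stays large. That is the effect that lets a density-based method treat such points as noise.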
fast_hdbscan
A new implementation of HDBSCAN optimized for runtime efficiency by restricting computations to low-dimensional vectors in Euclidean geometry.
evoc
Embedding Vector-Oriented Clustering is a new clustering algorithm that streamlines and approximates the UMAP-HDBSCAN combo approach to clustering, computing high-quality clusterings of high-dimensional vector sets at a fraction of the computational cost.
Interactive visualization
datamapplot
Creates static plots and interactive views of 2D vectors and metadata, with an emphasis on presentation aesthetics and interactive exploration for insight discovery.
toponymy
Generates a multiresolution hierarchy of annotation labels for text embeddings by querying a large language model with representative, distinctive and contrastive characterizations of data clusters. These labels are then useful for annotating data maps produced with datamapplot.
Project details
File details
Details for the file timc_vector_toolkit-20250919.tar.gz.
File metadata
- Download URL: timc_vector_toolkit-20250919.tar.gz
- Upload date:
- Size: 2.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1827319f91fedcb4fb5970626d5e5bf5955d9947c017b08aca86c8c96e16839c |
| MD5 | d0300e0b8807979a9c56310efe489c67 |
| BLAKE2b-256 | 0f96a2ac99bf0bfd7cd1fd4f7f32e1201b19361787e2f401d66658e153c1866e |
File details
Details for the file timc_vector_toolkit-20250919-py3-none-any.whl.
File metadata
- Download URL: timc_vector_toolkit-20250919-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1414cced385f9853265a157517723e4795ad182d82eda13d61f0c9b0b7c348f9 |
| MD5 | 5bb025a5e265d6006c8bf935ed2a87fc |
| BLAKE2b-256 | b21b69c76e894214dbfb7062bc59177530cef4f2804148e46809980ffbfead68 |
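The published digests can be used to check a downloaded file before installing it. A minimal sketch using the standard library's hashlib; the helper names are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, one would call `matches_published_digest("timc_vector_toolkit-20250919.tar.gz", "1827319f91fedcb4fb5970626d5e5bf5955d9947c017b08aca86c8c96e16839c")`, assuming the archive sits in the current directory. A result of `False` means the file differs from the one described here and should not be installed.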