Ensemble package for all Tutte Institute libraries involved in data vectorization, dimension reduction, clustering and exploratory visualization

Project description

This package brings together the libraries that the Tutte Institute has built for exploratory analysis, unsupervised learning and interactive visualization of unstructured data. It includes the individual packages described below.

Learn more at https://github.com/TutteInstitute

IMPORTANT: this package declares the libraries described below as dependencies without upper bounds on their versions. New releases of this package are therefore produced mainly when the toolkit roster changes, and not necessarily when new versions of its dependencies are released. Do not mistake the age of this package for dereliction.
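Because the dependencies are not upper-bounded, the versions actually installed depend on when the environment was resolved. The following is a minimal sketch for checking them, using only the Python standard library. The distribution names are assumed to match the package names listed on this page (with umap-learn for umap), and the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as listed on this page (assumption: they match the
# names under which the packages are installed).
TOOLKIT_DISTRIBUTIONS = [
    "vectorizers",
    "pynndescent",
    "umap-learn",
    "hdbscan",
    "fast_hdbscan",
    "evoc",
    "datamapplot",
    "toponymy",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(TOOLKIT_DISTRIBUTIONS).items():
        print(f"{name}: {ver or 'not installed'}")
```

Recording this output alongside analysis results makes it easier to reproduce them later, since reinstalling the toolkit may resolve to newer dependency versions.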


Vector space embedding

vectorizers

Embeds various types of data into high-dimensional vector spaces. This includes data consisting of distributions on vector spaces, which are embedded by approximately solving optimal transport problems.


Nearest neighbour network discovery

pynndescent

Builds the k-nearest neighbour graph of a set of high-dimensional vectors expressed as either dense or sparse arrays, under a large set of distances and pseudo-metrics. Doubles as an in-memory index for querying the neighbours of arbitrary vectors.
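To make the notion of a k-nearest neighbour graph concrete, here is a small exact, brute-force sketch in pure Python. It is not how pynndescent works internally and does not use its API; it only illustrates the structure being built, under the assumption of Euclidean distance. The function name knn_graph is hypothetical.

```python
import math


def knn_graph(points, k):
    """Return, for each point, the indices of its k nearest other points.

    Exact brute-force search under Euclidean distance: quadratic in the
    number of points, so only suitable for tiny illustrative inputs.
    """
    graph = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with its index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph


points = [(0, 0), (1, 0), (0, 3), (10, 10)]
print(knn_graph(points, k=2))
```

The quadratic cost of this exact search is what makes a dedicated nearest neighbour library worthwhile on large, high-dimensional datasets.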


Dimension reduction

umap (package name is umap-learn)

Uniform Manifold Approximation and Projection is a manifold learning dimension reduction algorithm that preserves the local similarity structure of a set of vectors. It works on both dense and sparse vector arrays.


Clustering

hdbscan

Hierarchical Density-Based Spatial Clustering of Applications with Noise. This clustering algorithm partitions a set of vectors into groups based on mutual reachability distance, discarding outliers as noise.
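As an illustration of mutual reachability distance, the sketch below computes it by brute force in pure Python. It assumes Euclidean distance and defines the core distance of a point as the distance to its k-th nearest other point; implementations differ on whether the point itself is counted, so treat this convention as an assumption. The function names are hypothetical and this is not the hdbscan API.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest other point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Pairwise matrix of max(core(a), core(b), d(a, b)), with a zero diagonal."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three points close together on a line, plus one far away.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
for row in mutual_reachability(points, k=2):
    print(row)
```

Points in sparse regions have large core distances, so their mutual reachability distance to everything else is inflated. This is what allows isolated points to be set aside as noise rather than forced into a cluster.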

fast_hdbscan

A new implementation of HDBSCAN optimized for runtime efficiency by restricting computations to low-dimensional vectors in Euclidean geometry.

evoc

Embedding Vector-Oriented Clustering is a new clustering algorithm that streamlines and approximates the combined UMAP-HDBSCAN approach to clustering, so as to compute high-quality clusterings of high-dimensional vector sets at a fraction of the computational cost.


Interactive visualization

datamapplot

Creates static plots and interactive views of 2D vectors and metadata, with an emphasis on presentation aesthetics and interactive exploration for insight discovery.

toponymy

Generates a multiresolution hierarchy of annotation labels for text embeddings by querying a large language model with representative, distinctive and contrastive characterizations of data clusters. These labels are then useful for annotating data maps produced with datamapplot.

Download files

Source Distribution

timc_vector_toolkit-20260611.tar.gz (2.2 kB view details)

Uploaded Source

Built Distribution

timc_vector_toolkit-20260611-py3-none-any.whl (4.6 kB view details)

Uploaded Python 3

File details

Details for the file timc_vector_toolkit-20260611.tar.gz.

File metadata

  • Download URL: timc_vector_toolkit-20260611.tar.gz
  • Upload date:
  • Size: 2.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 (Ubuntu 24.04)

File hashes

Hashes for timc_vector_toolkit-20260611.tar.gz
Algorithm Hash digest
SHA256 f69911ea61ee085d39d493526a242d2d03b12b95110cc04cb14705d24f83976e
MD5 9525cd6f0672998fafbcfb92c00fd467
BLAKE2b-256 f7841258e376f9df7acb68a17141d252db5ea52733b4264e537e2f849a47794b

File details

Details for the file timc_vector_toolkit-20260611-py3-none-any.whl.

File metadata

  • Download URL: timc_vector_toolkit-20260611-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.11 (Ubuntu 24.04)

File hashes

Hashes for timc_vector_toolkit-20260611-py3-none-any.whl
Algorithm Hash digest
SHA256 7245f6b76f3bcd7381c1a4e7a111d08b10667d87f8e11389ca3bc1a256f8f917
MD5 70f37880bb50a760b777f9880bed6292
BLAKE2b-256 61d1ebb6f17d512a8345810877fe6b7d6bb0ed5fcadb885c74ff3b51f2c09e0e
