Topological Hyperparameter Evaluation and Mapping Algorithm, by Krv Labs

Project description

THEMA 🔮


By Krv Analytics.


Welcome to Thema, our Topological Hyperparameter Evaluation and Mapping Algorithm! 🌟


Thema explores hyperparameter spaces for unsupervised learning using topological data analysis. Instead of tuning preprocessing and embedding parameters by hand, Thema generates candidate models across a parameter grid and uses curvature-based graph distances to identify diverse, high-quality representatives.

Thema examines the distribution of representations that emerge from different preprocessing and hyperparameter choices, helping you identify the most salient patterns and features in your data. 🧠🔍

Architecture

Thema is organized into distinct modules. Two are described here:

Multiverse - Core Data Processing Pipeline

The foundational system that transforms raw data into topological representations:

  • Planet (Preprocessing): Generates multiple clean data versions with different imputation, scaling, and encoding strategies
  • Oort (Embeddings): Creates low-dimensional projections across parameter grids (t-SNE, PCA)
  • Galaxy (Graph Construction): Builds Mapper graphs, computes topological distances, and selects representatives

🚀 Expansion - Advanced Analytics Extensions

Specialized tools for extended analysis capabilities:

  • Realtor: Real estate and geographic data analysis tools
  • Utils: Utility functions for specialized data processing workflows

Installation

Install Thema using pip:

pip install thema

Verify the installation:

pip show thema

Quick Start

Get started with Thema in just a few lines of code! See params.yaml.sample as a template for defining your own representation grid search.

import thema
from thema import Thema

# Enable logging to see progress
thema.enable_logging()

# Initialize Thema with your configuration
my_thema = Thema(YAML_PATH='path/to/custom.yaml')

# Run the complete pipeline
my_thema.genesis()

# Access the selected representative model files
print(my_thema.selected_model_files)

That's it! Thema will systematically process your data through preprocessing, embedding, and graph construction stages, automatically selecting the most representative models.


Pipeline Components

Step 1: Preprocessing with Planet

Clean, encode, and impute your raw data with multiple strategies:

from thema.multiverse import Planet

# Initialize Planet with your configuration
planet = Planet(YAML_PATH='path/to/params.yaml')

# Generate multiple cleaned datasets
planet.fit()

Planet creates various versions of your cleaned data with different:

  • Scaling methods (standard, minmax, robust)
  • Encoding strategies (one_hot, label, ordinal)
  • Imputation methods (mean, median, mode, sampleNormal)
  • Random seeds for reproducible sampling
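
To make the imputation options above concrete, here is a minimal standard-library sketch of what each strategy could do to one numeric column. It is not Planet's implementation: the `impute` helper is hypothetical, and treating `sampleNormal` as "draw from a normal distribution fitted to the observed values" is an assumption based on the name.

```python
import random
import statistics


def impute(column, method, seed=42):
    """Return a copy of `column` with None entries filled using `method`."""
    observed = [v for v in column if v is not None]
    rng = random.Random(seed)  # seeded so sampled fills are reproducible

    if method == "mean":
        draw = lambda: statistics.fmean(observed)
    elif method == "median":
        draw = lambda: statistics.median(observed)
    elif method == "mode":
        draw = lambda: statistics.mode(observed)
    elif method == "sampleNormal":
        # Assumption: sample from a normal fitted to the observed values.
        mu = statistics.fmean(observed)
        sigma = statistics.stdev(observed)
        draw = lambda: rng.gauss(mu, sigma)
    else:
        raise ValueError(f"unknown imputation method: {method}")

    return [draw() if v is None else v for v in column]


column = [1.0, 2.0, None, 2.0, 5.0]
methods = ("mean", "median", "mode", "sampleNormal")
# One cleaned version of the column per imputation strategy.
versions = {m: impute(column, m) for m in methods}
```

Because the sampled strategy depends on the seed, running it with several seeds yields several distinct cleaned datasets, which is one plausible reason the grid includes random seeds.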

Step 2: Embedding with Oort ☄️

Generate low-dimensional projections from your cleaned data:

from thema.multiverse import Oort

# Create embeddings across parameter grids
oort = Oort(YAML_PATH='path/to/params.yaml')
oort.fit()

Oort produces embeddings using:

  • t-SNE: With various perplexity values and dimensions
  • PCA: With different dimensionality settings
  • Multiple random seeds for robustness
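
The idea of a parameter grid can be sketched as follows. The grid values, the `expand` and `label` helpers, and the dictionary layout are all hypothetical and do not reflect Thema's YAML schema; only the label pattern is modelled on the projection file names shown under Output Structure.

```python
from itertools import product

# Hypothetical grid: the values are illustrative only.
grid = {
    "tsne": {"perplexity": [15, 30], "dims": [2], "seed": [42, 43]},
    "pca": {"dims": [2, 3], "seed": [42]},
}


def expand(grid):
    """Yield (method, params) for every parameter combination in the grid."""
    for method, options in grid.items():
        keys = list(options)
        for combo in product(*(options[k] for k in keys)):
            yield method, dict(zip(keys, combo))


def label(method, params):
    """Build a name such as 'tsne_perplexity30_dims2_seed42'."""
    return "_".join([method] + [f"{k}{v}" for k, v in params.items()])


configs = list(expand(grid))
names = [label(method, params) for method, params in configs]
```

Each entry in `configs` corresponds to one embedding to compute per cleaned dataset, so the total number of projections grows multiplicatively with the grid.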

Step 3: Graph Construction with Galaxy 🌌

Build Mapper graphs and select representatives:

from thema.multiverse import Galaxy

# Generate graph models across hyperparameter space
galaxy = Galaxy(YAML_PATH='path/to/params.yaml')
galaxy.fit()

# Cluster and select representative models
representatives = galaxy.collapse()

Galaxy creates and analyzes:

  • Mapper graphs: Using various cover resolutions and overlap parameters
  • Topological distances: Computing curvature-based similarity metrics
  • Representative selection: Choosing diverse, high-quality models using clustering
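
For readers new to Mapper, the following is a minimal sketch of the general construction, not Galaxy's implementation. It assumes a one-dimensional lens, a cover of equal-width intervals widened by a fractional `overlap` (one of several conventions in use), and clustering by connected components within a distance `eps`; all function and parameter names are hypothetical.

```python
from itertools import combinations
from math import dist


def cover(lo, hi, n_cubes, overlap):
    """Split [lo, hi] into n_cubes intervals, widened so neighbours overlap."""
    width = (hi - lo) / n_cubes
    pad = width * overlap / 2
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_cubes)]


def components(indices, points, eps):
    """Group indices into connected components of the eps-neighbourhood graph."""
    remaining, groups = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            group |= near
            stack.extend(near)
        groups.append(frozenset(group))
    return groups


def mapper(points, lens, n_cubes, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, and an edge
    joins two nodes whenever they share at least one point."""
    nodes = []
    for a, b in cover(min(lens), max(lens), n_cubes, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# Ten points on a line, using the x-coordinate as the lens.
points = [(float(x), 0.0) for x in range(10)]
lens = [p[0] for p in points]
nodes, edges = mapper(points, lens, n_cubes=3, overlap=0.5, eps=1.5)
```

On this toy input the three overlapping intervals produce three nodes joined in a chain, which is the expected Mapper summary of a line segment. Changing `n_cubes` or `overlap` changes the graph, which is why these parameters are worth exploring on a grid.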

Coordinate Space Generation

Generate a 2D embedding space of your models for analysis:

# Get 2D coordinates of all models in the galaxy
coordinates = galaxy.get_galaxy_coordinates()

# Access the selected representatives
for cluster_id, info in galaxy.selection.items():
    print(f"Cluster {cluster_id}: {info['star']} ({info['cluster_size']} models)")
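
One simple way to pick a representative per cluster is to take the medoid, the model with the smallest total distance to the other members of its cluster. The sketch below illustrates that idea on a made-up distance matrix. It is an assumption, not a description of how Galaxy selects its representatives; the `select_representatives` helper and the file names are hypothetical, and only the `star` and `cluster_size` keys mirror the selection dictionary used above.

```python
def select_representatives(names, distances, labels):
    """Pick, for each cluster, the member with the smallest summed distance
    to the other members of that cluster (the medoid)."""
    clusters = {}
    for index, cluster_id in enumerate(labels):
        clusters.setdefault(cluster_id, []).append(index)

    selection = {}
    for cluster_id, members in clusters.items():
        medoid = min(members,
                     key=lambda i: sum(distances[i][j] for j in members))
        selection[cluster_id] = {
            "star": names[medoid],
            "cluster_size": len(members),
        }
    return selection


# Made-up pairwise distances between four graph models.
names = ["star_a.pkl", "star_b.pkl", "star_c.pkl", "star_d.pkl"]
distances = [
    [0, 1, 2, 9],
    [1, 0, 1, 9],
    [2, 1, 0, 9],
    [9, 9, 9, 0],
]
labels = [0, 0, 0, 1]  # cluster assignment for each model

selection = select_representatives(names, distances, labels)
```

Here `star_b.pkl` is chosen for cluster 0 because it sits closest to both of its cluster mates, and `star_d.pkl` represents its own singleton cluster.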

Key Features

✨ Systematic Exploration: Automatically explores preprocessing and embedding parameter combinations

🎯 Representative Selection: Uses topological distance metrics to identify diverse, high-quality models

📊 Robust Analysis: Generates multiple models per configuration for statistical reliability

🔧 Flexible Configuration: YAML-based configuration for easy parameter management

🚀 Parallel Processing: Efficient multiprocessing for large parameter grids

📈 Topological Insights: Leverage graph topology and curvature for model comparison
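
As a rough illustration of how curvature can distinguish graphs, the sketch below computes the simplest combinatorial form of Forman-Ricci curvature for an unweighted graph, where an edge (u, v) has curvature 4 - deg(u) - deg(v). This text does not say which curvature notion or distance Thema uses, so treat the choice of Forman curvature and the `curvature_profile` helper as assumptions made for the example.

```python
from collections import Counter


def forman_curvature(edges):
    """Map each edge (u, v) of an unweighted graph to 4 - deg(u) - deg(v)."""
    degree = Counter(node for edge in edges for node in edge)
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}


def curvature_profile(edges):
    """Sorted edge curvatures: a crude signature for comparing two graphs."""
    return sorted(forman_curvature(edges).values())


# Two graphs with four nodes and three edges each.
path = [(0, 1), (1, 2), (2, 3)]  # a chain
star = [(0, 1), (0, 2), (0, 3)]  # a hub with three leaves

path_profile = curvature_profile(path)
star_profile = curvature_profile(star)
```

The two graphs have the same number of nodes and edges, yet their curvature profiles differ, which is the kind of structural signal a curvature-based comparison can pick up.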


Output Structure

Thema organizes outputs hierarchically:

{outDir}/{runName}/
├── clean/                  # Preprocessed datasets (Moon files)
│   ├── moon_42_0.pkl
│   ├── moon_42_1.pkl
│   └── ...
├── projections/            # Low-dimensional embeddings (Comet files)
│   ├── tsne_perplexity30_dims2_seed42_moon_42_0.pkl
│   ├── pca_dims2_seed42_moon_42_0.pkl
│   └── ...
└── models/                 # Mapper graphs (Star files)
    ├── star_tsne_perplexity30_nCubes10_overlap0.6.pkl
    ├── star_pca_dims2_nCubes10_overlap0.6.pkl
    └── ...
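
Given this layout, a small helper can summarize what a run has produced so far. The sketch below uses only the standard library and the three directory names shown above; the `list_outputs` function is hypothetical and not part of Thema.

```python
from pathlib import Path

STAGES = ("clean", "projections", "models")


def list_outputs(run_dir):
    """Map each stage directory to the sorted names of its .pkl files."""
    run_dir = Path(run_dir)
    outputs = {}
    for stage in STAGES:
        stage_dir = run_dir / stage
        # A stage that has not been run yet simply has no directory.
        files = stage_dir.glob("*.pkl") if stage_dir.is_dir() else []
        outputs[stage] = sorted(p.name for p in files)
    return outputs
```

Calling `list_outputs` on a `{outDir}/{runName}` directory returns one list per stage, and an empty list for any stage that has not been run, which gives a quick check of how far the pipeline got.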

When to Use Thema

✅ Good Use Cases:

  • Exploring preprocessing choices for unsupervised learning
  • Comparing embedding methods systematically
  • Finding robust data representations across hyperparameter grids
  • Identifying diverse graph topologies in your data
  • Validating clustering stability across multiple configurations

โŒ Not Ideal For:

  • Supervised learning (Thema focuses on unsupervised tasks)
  • Workflows with a single, fixed preprocessing pipeline
  • Real-time inference (Thema generates models offline)

Documentation

For comprehensive guides and tutorials, visit our documentation.


Transform the way you explore and interpret your data with Thema - where the topology of your analysis reveals the hidden stories in your data! 🌠✨

Download files

Source Distribution

thema-0.1.3.tar.gz (55.6 kB)

Uploaded Source

Built Distribution

thema-0.1.3-py3-none-any.whl (63.9 kB)

Uploaded Python 3

File details

Details for the file thema-0.1.3.tar.gz.

File metadata

  • Download URL: thema-0.1.3.tar.gz
  • Upload date:
  • Size: 55.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thema-0.1.3.tar.gz
Algorithm    Hash digest
SHA256       0fb1498c8d14b9388f137984af1a67392bc1e6cecc6d09098228ea95501db6a7
MD5          d17ea267d2941687aaa39e9649068d5a
BLAKE2b-256  5dd4d1ad1f6cc00d8b551387020b420c05a88caa8731521a033fa0227389ef21

Provenance

The following attestation bundles were made for thema-0.1.3.tar.gz:

Publisher: publish.yaml on Krv-Analytics/Thema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file thema-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: thema-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 63.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thema-0.1.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       050f87aeb40528a1839dea66a31837f9f9013d27c75bd4b59cce396c4268a98a
MD5          a63b19912356341349fa9a89d5d10d97
BLAKE2b-256  d99ae7f3715e238c1666f7650be5303c7dc578b1c2908a23a51671cdcd9500fb

Provenance

The following attestation bundles were made for thema-0.1.3-py3-none-any.whl:

Publisher: publish.yaml on Krv-Analytics/Thema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
