Topological Hyperparameter Evaluation and Mapping Algorithm, by Krv Labs
THEMA
By Krv Analytics.
Welcome to Thema, our Topological Hyperparameter Evaluation and Mapping Algorithm!
Thema systematically explores hyperparameter spaces for unsupervised learning through topological data analysis. Instead of manually tuning preprocessing and embedding parameters, Thema generates candidate models systematically and uses curvature-based graph distances to identify diverse, high-quality representatives.
By analyzing the distribution of representations that emerge from different preprocessing and hyperparameter choices, Thema helps you navigate hyperparameter optimization for unsupervised tasks and identify the most salient patterns and features in your data.
Architecture
Thema is organized into the following modules:
Multiverse - Core Data Processing Pipeline
The foundational system that transforms raw data into topological representations:
- Planet (Preprocessing): Generates multiple clean data versions with different imputation, scaling, and encoding strategies
- Oort (Embeddings): Creates low-dimensional projections across parameter grids (t-SNE, PCA)
- Galaxy (Graph Construction): Builds Mapper graphs, computes topological distances, and selects representatives
Expansion - Advanced Analytics Extensions
Specialized tools for extended analysis capabilities:
- Realtor: Real estate and geographic data analysis tools
- Utils: Utility functions for specialized data processing workflows
Installation
Install Thema using pip:
pip install thema
Verify the installation:
pip show thema
Quick Start
Get started with Thema in just a few lines of code! See params.yaml.sample as a template for defining your own representation grid search.
import thema
from thema import Thema
# Enable logging to see progress
thema.enable_logging()
# Initialize Thema with your configuration
my_thema = Thema(YAML_PATH='path/to/custom.yaml')
# Run the complete pipeline
my_thema.genesis()
# Access the selected representative model files
print(my_thema.selected_model_files)
That's it! Thema will systematically process your data through preprocessing, embedding, and graph construction stages, automatically selecting the most representative models.
Pipeline Components
Step 1: Preprocessing with Planet
Clean, encode, and impute your raw data with multiple strategies:
from thema.multiverse import Planet
# Initialize Planet with your configuration
planet = Planet(YAML_PATH='path/to/params.yaml')
# Generate multiple cleaned datasets
planet.fit()
Planet creates various versions of your cleaned data with different:
- Scaling methods (standard, minmax, robust)
- Encoding strategies (one_hot, label, ordinal)
- Imputation methods (mean, median, mode, sampleNormal)
- Random seeds for reproducible sampling
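The combinations listed above can be pictured as a Cartesian product of the option lists. The sketch below does not use Thema's API and is not how Planet is implemented; it only enumerates such a grid with the standard library. The seed values and the `PreprocessingChoice` type are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Option names come from the lists above; the seed values are hypothetical.
SCALERS = ["standard", "minmax", "robust"]
ENCODINGS = ["one_hot", "label", "ordinal"]
IMPUTERS = ["mean", "median", "mode", "sampleNormal"]
SEEDS = [42, 43]


@dataclass(frozen=True)
class PreprocessingChoice:
    """One point in the preprocessing grid (hypothetical helper type)."""

    scaler: str
    encoding: str
    imputer: str
    seed: int


def preprocessing_grid(scalers, encodings, imputers, seeds):
    """Return every combination of the four option lists."""
    return [
        PreprocessingChoice(scaler, encoding, imputer, seed)
        for scaler, encoding, imputer, seed in product(
            scalers, encodings, imputers, seeds
        )
    ]


grid = preprocessing_grid(SCALERS, ENCODINGS, IMPUTERS, SEEDS)
print(len(grid))  # 3 * 3 * 4 * 2 = 72 combinations
```

Even a small set of options multiplies quickly, which is worth keeping in mind when sizing a run.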
Step 2: Embedding with Oort
Generate low-dimensional projections from your cleaned data:
from thema.multiverse import Oort
# Create embeddings across parameter grids
oort = Oort(YAML_PATH='path/to/params.yaml')
oort.fit()
Oort produces embeddings using:
- t-SNE: With various perplexity values and dimensions
- PCA: With different dimensionality settings
- Multiple random seeds for robustness
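To get a feel for how many embeddings a grid like this implies, the following back-of-the-envelope sketch counts them under the assumption that every t-SNE and PCA setting is applied to every cleaned dataset. The function name and the example values are hypothetical, and Thema may prune or schedule the grid differently.

```python
def count_projections(n_cleaned, perplexities, tsne_dims, pca_dims, seeds):
    """Count embeddings when each setting is applied to each cleaned dataset."""
    tsne_runs = len(perplexities) * len(tsne_dims) * len(seeds)
    pca_runs = len(pca_dims) * len(seeds)
    return n_cleaned * (tsne_runs + pca_runs)


total = count_projections(
    n_cleaned=2,               # e.g. two cleaned datasets from the previous step
    perplexities=[15, 30, 50],
    tsne_dims=[2],
    pca_dims=[2, 3],
    seeds=[42],
)
print(total)  # 2 * (3 * 1 * 1 + 2 * 1) = 10
```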
Step 3: Graph Construction with Galaxy
Build Mapper graphs and select representatives:
from thema.multiverse import Galaxy
# Generate graph models across hyperparameter space
galaxy = Galaxy(YAML_PATH='path/to/params.yaml')
galaxy.fit()
# Cluster and select representative models
representatives = galaxy.collapse()
Galaxy creates and analyzes:
- Mapper graphs: Using various cover resolutions and overlap parameters
- Topological distances: Computing curvature-based similarity metrics
- Representative selection: Choosing diverse, high-quality models using clustering
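The role of the cover resolution and overlap parameters can be illustrated with a deliberately simplified, one-dimensional Mapper-style construction. This is a sketch under stated assumptions, not Galaxy's implementation: `overlap` is taken to be the fraction of each interval's length shared with its neighbour (libraries differ on this convention), the lens values are made up, and the clustering step inside each interval is skipped, so each interval becomes a single node.

```python
def interval_cover(lo, hi, n_cubes, overlap):
    """Cover [lo, hi] with n_cubes equal intervals; neighbours share `overlap` of their length."""
    if n_cubes < 1 or not 0 <= overlap < 1:
        raise ValueError("need n_cubes >= 1 and 0 <= overlap < 1")
    length = (hi - lo) / (1 + (n_cubes - 1) * (1 - overlap))
    step = length * (1 - overlap)
    cover = [(lo + k * step, lo + k * step + length) for k in range(n_cubes)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point drift
    return cover


def toy_mapper_edges(values, n_cubes, overlap):
    """Treat each interval as one node and link nodes that share points."""
    cover = interval_cover(min(values), max(values), n_cubes, overlap)
    members = [
        {i for i, v in enumerate(values) if a <= v <= b} for a, b in cover
    ]
    edges = [
        (j, k)
        for j in range(len(members))
        for k in range(j + 1, len(members))
        if members[j] & members[k]
    ]
    return cover, members, edges


lens_values = [0, 1, 3, 5, 7, 9, 10]  # hypothetical 1D lens values
cover, members, edges = toy_mapper_edges(lens_values, n_cubes=4, overlap=0.5)
print(cover)  # [(0.0, 4.0), (2.0, 6.0), (4.0, 8.0), (6.0, 10)]
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

Points that fall in two overlapping intervals are what create edges, so changing the resolution or the overlap changes the resulting graph.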
Coordinate Space Generation
Generate a 2D embedding space of your models for analysis:
# Get 2D coordinates of all models in the galaxy
coordinates = galaxy.get_galaxy_coordinates()
# Access the selected representatives
for cluster_id, info in galaxy.selection.items():
    print(f"Cluster {cluster_id}: {info['star']} ({info['cluster_size']} models)")
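One common way to pick a representative from a cluster of models is to take its medoid, the member with the smallest total distance to the others. The sketch below shows that idea with a made-up distance matrix and precomputed cluster labels. It is an assumption for illustration, not Thema's selection rule; the result only mirrors the `star` and `cluster_size` keys used in the loop above, and the file names are hypothetical.

```python
def select_medoids(names, distances, labels):
    """Pick, per cluster, the model with the smallest summed distance to its cluster mates."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    selection = {}
    for label, members in clusters.items():
        # Ties are resolved in favour of the first member encountered.
        medoid = min(members, key=lambda i: sum(distances[i][j] for j in members))
        selection[label] = {"star": names[medoid], "cluster_size": len(members)}
    return selection


# Hypothetical model names, pairwise distances, and cluster labels.
names = ["star_a.pkl", "star_b.pkl", "star_c.pkl", "star_d.pkl"]
distances = [
    [0, 1, 4, 9],
    [1, 0, 2, 8],
    [4, 2, 0, 7],
    [9, 8, 7, 0],
]
labels = [0, 0, 0, 1]

selection = select_medoids(names, distances, labels)
for cluster_id, info in selection.items():
    print(f"Cluster {cluster_id}: {info['star']} ({info['cluster_size']} models)")
# Cluster 0: star_b.pkl (3 models)
# Cluster 1: star_d.pkl (1 models)
```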
Key Features
- Systematic Exploration: Automatically explores preprocessing and embedding parameter combinations
- Representative Selection: Uses topological distance metrics to identify diverse, high-quality models
- Robust Analysis: Generates multiple models per configuration for statistical reliability
- Flexible Configuration: YAML-based configuration for easy parameter management
- Parallel Processing: Efficient multiprocessing for large parameter grids
- Topological Insights: Leverages graph topology and curvature for model comparison
Output Structure
Thema organizes outputs hierarchically:
{outDir}/{runName}/
├── clean/                # Preprocessed datasets (Moon files)
│   ├── moon_42_0.pkl
│   ├── moon_42_1.pkl
│   └── ...
├── projections/          # Low-dimensional embeddings (Comet files)
│   ├── tsne_perplexity30_dims2_seed42_moon_42_0.pkl
│   ├── pca_dims2_seed42_moon_42_0.pkl
│   └── ...
└── models/               # Mapper graphs (Star files)
    ├── star_tsne_perplexity30_nCubes10_overlap0.6.pkl
    ├── star_pca_dims2_nCubes10_overlap0.6.pkl
    └── ...
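Given the layout above, a short standard-library sketch can inventory what a run produced. The `list_outputs` helper and the temporary demo directory are assumptions for illustration; only the three subdirectory names and the `.pkl` extension come from the tree shown here.

```python
import tempfile
from pathlib import Path


def list_outputs(run_dir):
    """Return the sorted .pkl file names found in each output subdirectory."""
    run_dir = Path(run_dir)
    return {
        sub: sorted(p.name for p in (run_dir / sub).glob("*.pkl"))
        for sub in ("clean", "projections", "models")
    }


# Demo on a throwaway directory that mimics {outDir}/{runName}/.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "my_run"
    demo_files = [
        ("clean", "moon_42_0.pkl"),
        ("clean", "moon_42_1.pkl"),
        ("projections", "pca_dims2_seed42_moon_42_0.pkl"),
    ]
    for sub, name in demo_files:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
        (run_dir / sub / name).touch()

    outputs = list_outputs(run_dir)
    for sub, files in outputs.items():
        print(f"{sub}: {len(files)} file(s)")
```

In a real run, point `list_outputs` at your own `{outDir}/{runName}` directory instead of the temporary one.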
When to Use Thema
Good Use Cases:
- Exploring preprocessing choices for unsupervised learning
- Comparing embedding methods systematically
- Finding robust data representations across hyperparameter grids
- Identifying diverse graph topologies in your data
- Validating clustering stability across multiple configurations
Not Ideal For:
- Supervised learning (Thema focuses on unsupervised tasks)
- Workflows with a single, fixed preprocessing pipeline
- Real-time inference (Thema generates models offline)
Documentation
For comprehensive guides and tutorials, visit our documentation.
Transform the way you explore and interpret your data with Thema - where the topology of your analysis reveals the hidden stories in your data!