🐟iTuna
Tune machine learning models for empirical identifiability and consistency
Why 🐟iTuna?
Applying machine learning to scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.
Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.
🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:
- Train multiple instances of a model with different random seeds
- Align their embeddings under the appropriate indeterminacy class
- Measure how consistent the learned representations are
Think of it as a unit test for reproducibility of learned embeddings.
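The three steps above can be sketched without iTuna itself. This is a minimal illustration under stated assumptions, not iTuna's implementation: `toy_fit_transform` is a hypothetical stand-in for a model that recovers the true latents only up to a seed-dependent linear map, alignment is done by ordinary least squares, and consistency is reported as the R² of each aligned run against a reference run.

```python
import numpy as np


def toy_fit_transform(latents, seed):
    """Hypothetical model: latents mixed by a seed-dependent linear map, plus small noise."""
    rng = np.random.default_rng(seed)
    dim = latents.shape[1]
    rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    scales = rng.uniform(0.5, 2.0, size=dim)
    mixing = rotation * scales  # equivalent to rotation @ diag(scales)
    noise = 0.01 * rng.normal(size=latents.shape)
    return latents @ mixing + noise


def align_linear(source, target):
    """Apply the least-squares linear map that takes `source` onto `target`."""
    weights, *_ = np.linalg.lstsq(source, target, rcond=None)
    return source @ weights


def r2_score(pred, target):
    """Coefficient of determination, pooled over all dimensions."""
    residual = np.sum((target - pred) ** 2)
    total = np.sum((target - target.mean(axis=0)) ** 2)
    return 1.0 - residual / total


latents = np.random.default_rng(42).normal(size=(500, 3))

# 1. Train several instances with different seeds.
embeddings = [toy_fit_transform(latents, seed) for seed in range(5)]

# 2. Align every other run to the first one under a linear indeterminacy.
reference = embeddings[0]
aligned = [align_linear(emb, reference) for emb in embeddings[1:]]

# 3. Measure how well the runs agree, before and after alignment.
raw_scores = [r2_score(emb, reference) for emb in embeddings[1:]]
aligned_scores = [r2_score(emb, reference) for emb in aligned]
print("R^2 before alignment:", np.round(raw_scores, 3))
print("R^2 after alignment: ", np.round(aligned_scores, 3))
```

Without alignment the runs look unrelated even though they encode the same latents, which is why the comparison has to be made under the model's indeterminacy class.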
Features
- sklearn-compatible: Works with any transformer implementing fit, transform, and standard sklearn conventions
- Built-in indeterminacy classes:
  - Identity: no transformation needed (model is already fully identifiable)
  - Permutation: handles sign flips and component reordering (e.g., FastICA)
  - Linear: linear transformation alignment (e.g., PCA)
  - Affine: linear transformation with intercept (e.g., CEBRA)
- Consistency scoring: Quantifies how stable embeddings are across runs
- Embedding alignment: Returns aligned embeddings for downstream analysis
- Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
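To make the Permutation class concrete, the sketch below aligns two embeddings that differ only by component order and sign. It is an illustration of the idea, not iTuna's code: `align_permutation` is a hypothetical helper, and it searches all permutations by brute force, which is only practical for a handful of components (an assignment solver such as the Hungarian algorithm would be the usual choice at larger sizes).

```python
from itertools import permutations

import numpy as np


def align_permutation(source, target):
    """Reorder and sign-flip the columns of `source` to best match `target`."""
    k = source.shape[1]
    # corr[i, j] = Pearson correlation between source column i and target column j.
    corr = np.corrcoef(source.T, target.T)[:k, k:]
    # Choose the assignment (target column j <- source column perm[j])
    # that maximizes the total absolute correlation.
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(abs(corr[perm[j], j]) for j in range(k)),
    )
    order = np.array(best)
    # A negative correlation means the matched component is sign-flipped.
    signs = np.sign(corr[order, np.arange(k)])
    return source[:, order] * signs


rng = np.random.default_rng(0)
run_a = rng.laplace(size=(1000, 4))  # components from one run
# The same components, shuffled and partly sign-flipped, as a second run might return them.
run_b = run_a[:, [2, 0, 3, 1]] * np.array([1, -1, 1, -1])

aligned_b = align_permutation(run_b, run_a)
print("Recovered run_a exactly:", np.allclose(aligned_b, run_a))
```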
Installation
pip install git+https://github.com/dynamical-inference/ituna.git
Optional extras:
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[datajoint]" # DataJoint backend for database-backed caching
pip install "git+https://github.com/dynamical-inference/ituna.git#egg=ituna[dev]" # Development dependencies (pytest, etc.)
Quickstart
import numpy as np
from sklearn.decomposition import FastICA
from ituna import ConsistencyEnsemble, metrics
# Generate sample data
X = np.random.randn(1000, 64)
# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)
# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))
# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)
Documentation
Full documentation is available at dynamical-inference.github.io/ituna.
- Quickstart notebook: docs/tutorials/quickstart.ipynb (minimal working example)
- Core concepts: docs/tutorials/core.ipynb (in-depth walkthrough)
- Backends: docs/tutorials/backends.ipynb (caching and distributed execution)
Backends
🐟iTuna supports different backends for caching and distributed computation:
import numpy as np
from sklearn.decomposition import FastICA
from ituna import ConsistencyEnsemble, config, metrics
# Sample data, as in the Quickstart
X = np.random.randn(1000, 64)
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)
# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)
# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)
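The idea behind disk caching, skipping a fit whose inputs have already been seen, can be sketched as follows. This is an assumption-laden illustration and does not reflect how iTuna's backends are implemented: `cache_key`, `fit_cached`, and `toy_fit` are hypothetical names, and the cache key is simply a hash of the parameters, the seed, and the data.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

import numpy as np


def cache_key(params, seed, X):
    """Hash everything that determines the fitted model."""
    digest = hashlib.sha256()
    digest.update(repr(sorted(params.items())).encode())
    digest.update(str(seed).encode())
    digest.update(str((X.shape, X.dtype)).encode())
    digest.update(np.ascontiguousarray(X).tobytes())
    return digest.hexdigest()


def fit_cached(fit_fn, params, seed, X, cache_dir):
    """Run `fit_fn` only if no cached result exists; return (result, cache_hit)."""
    path = Path(cache_dir) / f"{cache_key(params, seed, X)}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes()), True
    result = fit_fn(X, seed=seed, **params)
    path.write_bytes(pickle.dumps(result))
    return result, False


def toy_fit(X, seed, n_components):
    """Hypothetical stand-in for an expensive fit: returns a random projection matrix."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(X.shape[1], n_components))


X = np.random.default_rng(0).normal(size=(100, 8))
params = {"n_components": 3}
with tempfile.TemporaryDirectory() as cache_dir:
    first, hit_first = fit_cached(toy_fit, params, 0, X, cache_dir)
    second, hit_second = fit_cached(toy_fit, params, 0, X, cache_dir)  # identical request
    _, hit_other_seed = fit_cached(toy_fit, params, 1, X, cache_dir)   # new seed

print("Cache hits:", hit_first, hit_second, hit_other_seed)
```

Only the repeated request is served from disk; changing the seed, the parameters, or the data produces a new key and therefore a fresh fit.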
CLI Commands
For large-scale experiments, use the command-line tools:
# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache
# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema
Development
# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e ".[dev]"
# Run tests
pytest tests -v
# Setup pre-commit hooks
pre-commit install
For the full development guide — branching conventions, code style, building docs, and the release process — see CONTRIBUTING.md.
Citation
If you use 🐟iTuna in your research, please cite:
@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}
License
🐟iTuna is released under the MIT License. If you re-use parts of the iTuna code in your own package, please make sure to copy & paste the contents of the LICENSE file into a NOTICE in your repository.