Tune machine learning models for empirical identifiability and consistency

🐟iTuna

Why 🐟iTuna?

Applying machine learning to scientific data analysis often suffers from an identifiability gap: many models along the data-to-analysis pipeline lack statistical guarantees about the uniqueness of their learned representations. This means that re-running the same algorithm can yield different embeddings, making downstream interpretation unreliable without manual verification.

Identifiable representation learning addresses this by ensuring models recover representations that are unique up to a known class of transformations (permutation, linear, affine, etc.). However, even theoretically identifiable models need empirical validation to confirm they behave consistently in practice.

🐟iTuna closes this gap by providing a lightweight, model-agnostic framework to:

  1. Train multiple instances of a model with different random seeds
  2. Align their embeddings under the appropriate indeterminacy class
  3. Measure how consistent the learned representations are

Think of it as a unit test for reproducibility of learned embeddings.
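
The three steps above can be sketched without iTuna itself. The following is a minimal illustration, not iTuna's implementation: it assumes a permutation indeterminacy, uses absolute correlation with Hungarian matching as the alignment, and reports the mean matched correlation as the score. The helper name `align_permutation` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.decomposition import FastICA


def align_permutation(reference, other):
    """Reorder and sign-flip the columns of `other` to best match `reference`."""
    d = reference.shape[1]
    # Cross-correlation between every reference column and every column of `other`.
    corr = np.corrcoef(reference.T, other.T)[:d, d:]
    # Hungarian matching on |correlation| gives the best one-to-one assignment.
    rows, cols = linear_sum_assignment(-np.abs(corr))
    signs = np.sign(corr[rows, cols])
    aligned = other[:, cols] * signs
    score = float(np.mean(np.abs(corr[rows, cols])))
    return aligned, score


# Synthetic data (an assumption for this sketch): non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2000, 4))
X = sources @ rng.normal(size=(4, 4))

# Step 1: train several instances with different seeds.
embeddings = [
    FastICA(n_components=4, max_iter=1000, random_state=seed).fit_transform(X)
    for seed in range(3)
]

# Steps 2 and 3: align each run to the first one and score the agreement.
scores = [align_permutation(embeddings[0], emb)[1] for emb in embeddings[1:]]
print("Mean consistency:", np.mean(scores))
```

A score near 1 means the runs agree up to reordering and sign; a low score suggests the model is not recovering a unique representation on this data.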

Features

  • sklearn-compatible: Works with any transformer implementing fit, transform, and standard sklearn conventions
  • Built-in indeterminacy classes:
    • Identity - no transformation needed (model is already fully identifiable)
    • Permutation - handles sign flips and component reordering (e.g., FastICA)
    • Linear - linear transformation alignment (e.g., PCA)
    • Affine - linear transformation with intercept (e.g., CEBRA)
  • Consistency scoring: Quantifies how stable embeddings are across runs
  • Embedding alignment: Returns aligned embeddings for downstream analysis
  • Flexible backends: In-memory, disk caching, distributed execution, and DataJoint support
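
To make the Linear and Affine indeterminacy classes concrete, here is a rough sketch of what alignment under each could look like. It is an assumption-laden illustration rather than iTuna's code: the alignment is an ordinary least-squares fit, the score is R², and `align_linear` is a hypothetical helper.

```python
import numpy as np


def align_linear(reference, other, fit_intercept=False):
    """Least-squares map from `other` onto `reference`; returns aligned data and R^2."""
    inputs = other
    if fit_intercept:
        # Affine case: append a constant column so the fit also learns an offset.
        inputs = np.hstack([other, np.ones((other.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(inputs, reference, rcond=None)
    aligned = inputs @ weights
    residual = np.sum((reference - aligned) ** 2)
    total = np.sum((reference - reference.mean(axis=0)) ** 2)
    return aligned, 1.0 - residual / total


rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 3))
# A second "run" that differs from the first by a linear map plus an offset.
other = reference @ rng.normal(size=(3, 3)) + np.array([1.0, -2.0, 0.5])

_, r2_linear = align_linear(reference, other)
_, r2_affine = align_linear(reference, other, fit_intercept=True)
print(f"linear R^2: {r2_linear:.3f}, affine R^2: {r2_affine:.3f}")
```

Because the second embedding includes an offset, only the affine fit can explain it fully. This is why choosing the indeterminacy class that matches the model matters for the resulting score.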

Installation

pip install git+https://github.com/dynamical-inference/ituna.git

Optional extras:

pip install "ituna[datajoint] @ git+https://github.com/dynamical-inference/ituna.git"  # DataJoint backend for database-backed caching
pip install "ituna[dev] @ git+https://github.com/dynamical-inference/ituna.git"        # Development dependencies (pytest, etc.)

Quickstart

import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, metrics

# Generate sample data
X = np.random.randn(1000, 64)

# Create a consistency ensemble
ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),  # FastICA is identifiable up to permutation
        symmetric=False,
        include_diagonal=True,
    ),
    random_states=5,  # Train 5 instances with different seeds
)

# Fit and evaluate
ensemble.fit(X)
print("Consistency score:", ensemble.score(X))

# Get aligned embeddings
emb = ensemble.transform(X)
print("Embedding shape:", emb.shape)

Documentation

Full documentation is available at dynamical-inference.github.io/ituna.

Backends

🐟iTuna supports different backends for caching and distributed computation:

import numpy as np
from sklearn.decomposition import FastICA

from ituna import ConsistencyEnsemble, config, metrics

# Sample data (same shape as in the Quickstart)
X = np.random.randn(1000, 64)

ensemble = ConsistencyEnsemble(
    estimator=FastICA(n_components=16, max_iter=500),
    consistency_transform=metrics.PairwiseConsistency(
        indeterminacy=metrics.Permutation(),
    ),
    random_states=10,
)

# Enable disk caching (avoids re-fitting identical models)
with config.config_context(DEFAULT_BACKEND="disk_cache"):
    ensemble.fit(X)

# Distributed execution with multiple workers
with config.config_context(
    DEFAULT_BACKEND="disk_cache_distributed",
    BACKEND_KWARGS={"trigger_type": "auto", "num_workers": 4},
):
    ensemble.fit(X)
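
The idea behind disk caching (skip the fit when an identical model has already been trained on identical data) can be sketched as follows. This is only an illustration of the concept; how iTuna builds its cache keys and what it stores on disk are not described here, so the hashing scheme, the pickle format, and the `fit_cached` helper are all assumptions.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.base import clone
from sklearn.decomposition import PCA


def fit_cached(estimator, X, cache_dir):
    """Fit `estimator` on `X`, reusing a pickled copy if the same fit was done before."""
    # Key the cache on the estimator class, its hyperparameters, and the raw data bytes.
    key = hashlib.sha256()
    key.update(type(estimator).__name__.encode())
    key.update(repr(sorted(estimator.get_params().items())).encode())
    key.update(np.ascontiguousarray(X).tobytes())
    path = Path(cache_dir) / f"{key.hexdigest()}.pkl"

    if path.exists():
        return pickle.loads(path.read_bytes()), True  # cache hit
    fitted = clone(estimator).fit(X)
    path.write_bytes(pickle.dumps(fitted))
    return fitted, False  # cache miss


X_demo = np.random.default_rng(0).normal(size=(200, 8))
with tempfile.TemporaryDirectory() as cache_dir:
    _, hit_first = fit_cached(PCA(n_components=2, random_state=0), X_demo, cache_dir)
    _, hit_second = fit_cached(PCA(n_components=2, random_state=0), X_demo, cache_dir)
    _, hit_other_seed = fit_cached(PCA(n_components=2, random_state=1), X_demo, cache_dir)

print(hit_first, hit_second, hit_other_seed)
```

The second call reuses the stored model, while changing the seed produces a different key and triggers a fresh fit. Since the seed is part of the key, an ensemble trained with several seeds only pays for each fit once.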

CLI Commands

For large-scale experiments, use the command-line tools:

# Local distributed backend
ituna-fit-distributed --sweep-name <sweep-uuid> --cache-dir ./cache

# DataJoint backend
ituna-fit-distributed-datajoint --sweep-name <sweep-uuid> --schema-name myschema

Development

# Clone and install in development mode
git clone https://github.com/dynamical-inference/ituna.git
cd ituna
pip install -e ".[dev]"

# Run tests
pytest tests -v

# Setup pre-commit hooks
pre-commit install

For the full development guide (branching conventions, code style, building docs, and the release process), see CONTRIBUTING.md.

Citation

If you use 🐟iTuna in your research, please cite:

@software{ituna,
  author = {Schmidt, Tobias and Schneider, Steffen},
  title = {iTuna: Tune machine learning models for empirical identifiability and consistency},
  url = {https://github.com/dynamical-inference/ituna},
  version = {0.1.0},
}

License

🐟iTuna is released under the MIT License. If you reuse parts of the iTuna code in your own package, please copy the contents of the LICENSE file into a NOTICE file in your repository.
