
Intrinsic Green Learning: task-conditioned intrinsic-dimensionality discovery via a learned encoder and a multi-scale Green's-function kernel.


Intrinsic Green Learning


Task-conditioned intrinsic-dimensionality discovery for high-dimensional data. IGL pairs a learned encoder with a multi-scale Green's-function kernel and trains the system end-to-end via Variable Projection with random Matryoshka truncation. The model fits the task and simultaneously reveals how many dimensions the task actually needs — usually far fewer than the ambient input.
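The two ingredients named above can be illustrated with a small standard-library sketch. It is not the library's implementation: it assumes that Matryoshka truncation means keeping a random-length prefix of the latent vector, and that a Gaussian response against a set of anchors at several bandwidths is one reasonable multi-scale kernel. The helper names `truncate_prefix` and `multiscale_gaussian_features` are hypothetical, and the Variable Projection step (solving the linear readout in closed form) is left out.

```python
import math
import random


def truncate_prefix(z, k):
    """Keep the first k latent coordinates and zero out the rest."""
    return [v if i < k else 0.0 for i, v in enumerate(z)]


def multiscale_gaussian_features(z, anchors, scales):
    """Return one Gaussian response per (scale, anchor) pair."""
    feats = []
    for s in scales:
        for a in anchors:
            sq_dist = sum((zi - ai) ** 2 for zi, ai in zip(z, a))
            feats.append(math.exp(-sq_dist / (2.0 * s * s)))
    return feats


rng = random.Random(0)
max_dim = 4
z = [0.5, -1.0, 0.25, 2.0]                      # stand-in for an encoder output
anchors = [[0.0] * max_dim, [0.5, -1.0, 0.0, 0.0]]

k = rng.randint(1, max_dim)                     # truncation level drawn for this step
z_k = truncate_prefix(z, k)                     # nested prefix of the latent code
phi = multiscale_gaussian_features(z_k, anchors, scales=(0.5, 1.0, 2.0))
print(k, len(phi))
```

Because every training step sees a different prefix length, the leading coordinates are pushed to carry the most task-relevant information, which is what makes a per-dimension loss curve meaningful afterwards.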

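Note on the import name. The distribution is intrinsic-green-learning; the import name is igl. This collides with the libigl Python bindings, which also import as igl, so the two packages cannot share one environment as published. If you need both, keep them in separate environments or repackage one of them under a different module name.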

Why IGL?

For the same input data, a classifier usually needs no more latent dimensions than a regressor, which in turn needs no more than a full autoencoder. IGL discovers this hierarchy automatically:

$$ d_{\text{eff}}(\text{classification}) \;\le\; d_{\text{eff}}(\text{regression}) \;\le\; d_{\text{eff}}(\text{reconstruction}) $$

The library ships an examples/synthetic/moons_xor.py script that fits all three estimators on the same data and reports the discovered dimensions — the hierarchy holds out of the box.
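The inequality chain is easy to check by hand once the three effective dimensions are known. The sketch below is independent of the library; the function name `hierarchy_violations` and the plain-dict input are assumptions made for illustration.

```python
def hierarchy_violations(d_effs, order=("cls", "reg", "recon")):
    """Return adjacent task pairs whose d_eff breaks the non-decreasing order."""
    return [(a, b) for a, b in zip(order, order[1:]) if d_effs[a] > d_effs[b]]


print(hierarchy_violations({"cls": 1, "reg": 2, "recon": 2}))  # no violations
print(hierarchy_violations({"cls": 3, "reg": 2, "recon": 2}))  # cls exceeds reg
```

Reporting the offending pairs rather than a single boolean makes it clearer which task needed more dimensions than expected.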

Installation

pip install intrinsic-green-learning

Optional extras:

| Extra | Adds | Use case |
| [viz] | matplotlib | Plot dimension curves via igl.viz.plot_dimension_curve. |
| [eeg] | mne + moabb + pyriemann | Future EEG / clinical loaders (placeholder for v0.2). |
| [nlp] | transformers + datasets | Future NLP loaders. |
| [elbow] | kneed | Alternative elbow detector. |
| [all] | all of the above | One-shot install for development. |

Quickstart

The library exposes three sklearn-compatible estimators plus an SPD extension. All accept numpy arrays at the API boundary.

Classification

import numpy as np
import igl
from igl.data import embed_in_high_dim, make_moons

x_2d, y = make_moons(400, noise=0.1, seed=0)
x = embed_in_high_dim(x_2d, target_dim=16, seed=0).numpy()

clf = igl.IGLClassifier(max_dim=8, random_state=0).fit(x, y.numpy())
print(f"accuracy = {clf.score(x, y.numpy()):.3f}")
print(f"discovered d_eff = {clf.effective_dimension_}")  # ~ 1 on moons

Regression and reconstruction

from igl.data import make_swiss_roll

x, params = make_swiss_roll(800, seed=0)
x_np = x.numpy()
params_np = params.numpy()

reg = igl.IGLRegressor(max_dim=8, random_state=0).fit(x_np, params_np)
ae = igl.IGLAutoencoder(max_dim=8, random_state=0).fit(x_np)

print(reg.effective_dimension_)   # ~ 2 on swiss roll (intrinsic dim)
print(ae.effective_dimension_)    # ~ 2 on swiss roll
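The expected value of about 2 follows from how a Swiss roll is built: every point in R³ is determined by two parameters. The sketch below uses the common textbook parameterization, which may differ in scaling or axis order from igl.data.make_swiss_roll; `swiss_roll_point` is a hypothetical helper.

```python
import math
import random


def swiss_roll_point(t, h):
    """Map two intrinsic parameters (angle t, height h) to a point in R^3."""
    return (t * math.cos(t), h, t * math.sin(t))


rng = random.Random(0)
points = []
for _ in range(5):
    t = rng.uniform(1.5 * math.pi, 4.5 * math.pi)   # position along the spiral
    h = rng.uniform(0.0, 21.0)                      # position along the roll's axis
    points.append(swiss_roll_point(t, h))

for p in points:
    print(tuple(round(c, 3) for c in p))
```

Three ambient coordinates come out, but only (t, h) went in, so a model that recovers the manifold should need roughly two latent dimensions.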

Cross-task hierarchy check

report = igl.compare_d_eff(
    cls=clf.dimension_curve_,
    reg=reg.dimension_curve_,
    recon=ae.dimension_curve_,
)
print(report.d_effs)            # {'cls': 1, 'reg': 2, 'recon': 2}
print(report.hierarchy_holds)   # True

SPD / Riemannian extension

For covariance-valued data (EEG, clinical signals, …), igl.spd ships an AIRM-based reconstruction classifier:

from igl.data import make_spd_dataset
from igl.spd import IGLReconSPDClassifier, LogEigVectorizer

spd, y = make_spd_dataset(400, d=8, n_classes=3, seed=0)
x = LogEigVectorizer().fit(spd.numpy()).transform(spd.numpy())

clf = IGLReconSPDClassifier(
    latent_dim=8, max_dim=12,
    orthogonality_weight=0.1,   # plug-in via the ExtraLoss seam
    random_state=0,
).fit(x, y.numpy())
print(clf.effective_dimension_)

Custom training loop

If sklearn's surface is too high-level, use the bare PyTorch entry points directly:

import torch
import igl

module = igl.IGLModule(
    input_dim=16, max_dim=8, output_dim=2,
    config=igl.IGLConfig(
        encoder=igl.EncoderConfig(hidden=(128, 64)),  # pyramidal MLP
        kernel=igl.KernelConfig(n_anchors=64, operator=igl.OperatorName.GAUSSIAN),
    ),
)

trainer = igl.MatryoshkaTrainer(
    loss=igl.CrossEntropyLoss(n_classes=2),
    config=igl.MatryoshkaConfig(epochs=500),
)
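# x_train_t, y_train_t, x_val_t, y_val_t are torch tensors supplied by the caller
# (16 input features and 2 classes, matching input_dim and n_classes above).
history = trainer.fit(module, x_train_t, y_train_t, x_val=x_val_t, y_val=y_val_t)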
curve = igl.eval_dimension_curve(module, x_val_t, y_val_t, loss=igl.CrossEntropyLoss(n_classes=2))
print("d_eff =", igl.detect_elbow(curve))
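For intuition about what an elbow on a dimension curve means, here is a minimal threshold-based detector. It is a stand-in, not the criterion used by igl.detect_elbow (which may differ, for instance when the [elbow] extra is installed). The dict-of-losses curve format, the function name `detect_elbow_sketch`, and the example numbers are all assumptions.

```python
def detect_elbow_sketch(curve, rel_tol=0.05):
    """Return the smallest dimension whose loss is within rel_tol of the best.

    curve maps a truncation dimension to the validation loss at that dimension;
    rel_tol is measured as a fraction of the curve's total loss range.
    """
    best = min(curve.values())
    span = max(curve.values()) - best
    threshold = best + rel_tol * span
    for dim in sorted(curve):
        if curve[dim] <= threshold:
            return dim
    raise ValueError("curve is empty")


# Hypothetical curve: the loss stops improving after two dimensions.
example_curve = {1: 0.69, 2: 0.21, 3: 0.20, 4: 0.20}
print(detect_elbow_sketch(example_curve))
```

Tightening rel_tol pushes the reported dimension up; loosening it trades a little loss for a smaller latent space.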

Documentation

Local build:

uv sync --group doc
uv run mkdocs serve

Published at https://hotherio.github.io/intrinsic-green-learning/latest/ after the first release.

Examples

Three runnable scripts under examples/synthetic/:

| Script | Manifold | Tasks | Expected d_eff |
| torus_classification.py | T² ⊂ R⁴ → R³² | XOR cls + sin/cos reg | ≈ 2 |
| moons_xor.py | Moons ⊂ R² → R¹⁶ | cls + reg + recon | d_cls ≤ d_reg ≤ d_recon |
| swiss_roll_recon.py | Swiss roll ⊂ R³ | autoencoder + reg | ≈ 2 |

Run with python -m examples.synthetic.<name>; outputs land in results/<name>/<git_short_sha>/. Install [viz] for PNG plots.
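To locate the output of a run programmatically, the directory layout described above can be rebuilt from the script name and the current short commit hash. This is a sketch under the assumption that the hash matches `git rev-parse --short HEAD`; the helper names `short_sha` and `results_dir` are hypothetical and not part of the library.

```python
import subprocess
from pathlib import Path


def short_sha(default="unknown"):
    """Return the short hash of HEAD, or a default outside a git checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return out.stdout.strip() or default


def results_dir(name, sha, root="results"):
    """Build results/<name>/<git_short_sha>/ as a Path."""
    return Path(root) / name / sha


print(results_dir("moons_xor", short_sha()))
```

Keying results by commit keeps outputs from different code versions side by side instead of overwriting each other.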

Development

uv sync --all-groups
uv run lefthook install

Verify your environment:

uv run pytest                                # tests + 100% coverage
uv run basedpyright src                      # strict type check
uv run lefthook run pre-commit --all-files   # full pre-commit pass

Conventions

The library follows the Hother Python guidelines under docs/guidelines/:

  • basedpyright strict type checking; Any is not allowed in public signatures.
  • __all__ exhaustive at every module surface.
  • Google-style docstrings on every public symbol.
  • Single base exception igl.IGLError, one level deep.
  • Conventional Commits: commit subjects drive python-semantic-release (feat: → minor, fix: / perf: / refactor: → patch, BREAKING CHANGE: → major).
  • String-valued type aliases are enum.StrEnum classes with a companion Literal mirror; public APIs accept either form.
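The last convention can be pictured with a small standalone example. The names `Operator`, `OperatorLiteral`, and `resolve_operator`, the member values, and the `LAPLACIAN` member are hypothetical; the README only confirms that an igl.OperatorName.GAUSSIAN member exists. The fallback branch exists only so the sketch also runs on interpreters older than Python 3.11.

```python
from typing import Literal, Union

try:
    from enum import StrEnum            # Python 3.11+
except ImportError:                     # older interpreters: str-valued Enum stand-in
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class Operator(StrEnum):
    """Hypothetical string-valued option set."""

    GAUSSIAN = "gaussian"
    LAPLACIAN = "laplacian"


# Companion Literal mirror, kept in sync with the enum by hand.
OperatorLiteral = Literal["gaussian", "laplacian"]


def resolve_operator(value: Union[Operator, OperatorLiteral]) -> Operator:
    """Accept either form at the API boundary and normalize to the enum."""
    return Operator(value)


print(resolve_operator("gaussian") is Operator.GAUSSIAN)
print(resolve_operator(Operator.LAPLACIAN) is Operator.LAPLACIAN)
```

Callers can pass a plain string and still get static checking from the Literal, while library code works with a single normalized enum type. An unknown string raises ValueError at the boundary.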

Release process

Releases are fully automated by python-semantic-release on every push to main via .github/workflows/semantic-release.yml. See docs/security.md for the supply-chain posture (OIDC, sigstore attestations, GPG-signed checksums, pip-audit).
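The mapping from commit messages to version bumps listed under Conventions can be sketched as a small classifier. This is only an illustration of that mapping, not python-semantic-release's parser; `bump_for` is a hypothetical name, and treating a `!` after the type as a breaking change comes from the Conventional Commits specification rather than from this README.

```python
import re

# type, optional (scope), optional "!", then ": " as in Conventional Commits
_SUBJECT = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")
_PATCH_TYPES = {"fix", "perf", "refactor"}


def bump_for(message):
    """Return 'major', 'minor', 'patch', or None for one commit message."""
    if "BREAKING CHANGE:" in message:
        return "major"
    match = _SUBJECT.match(message)
    if match is None:
        return None
    if match.group("bang"):          # spec-level breaking marker (assumption here)
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") in _PATCH_TYPES:
        return "patch"
    return None


for msg in ("feat: add kernel option", "fix(trainer): guard against NaN", "docs: fix typo"):
    print(msg, "->", bump_for(msg))
```

Commits that return None, such as docs or chore changes, do not trigger a release on their own under this mapping.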

License

MIT. See LICENSE and REUSE.toml.
