Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna

arrowspace_tuner

Hyperparameter discovery for ArrowSpace — automatically finds the best eps, k, and tau for your corpus using a query-free spectral objective.

Why

ArrowSpace's retrieval quality depends on three parameters, two for graph construction (eps, k) and one for search (tau):

Parameter   What it controls
eps         Neighbourhood radius for graph edges
k           Number of nearest neighbours per node
tau         Search temperature (exploration vs. exploitation)

Setting these by hand is tedious and corpus-dependent. arrowspace_tuner uses Optuna and a label-free spectral MRR proxy to find them automatically in minutes.

Install

# Core (no pandas/plotly)
pip install arrowspace-tuner

# With HTML/CSV reporting
pip install "arrowspace-tuner[report]"   # quotes keep shells such as zsh from globbing the brackets

Quickstart

import numpy as np
import arrowspace_tuner as arrowspace

embeddings = np.load("corpus.npy")   # shape (N, D) float64

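# One-liner: auto-discover eps, k, tau (~15 min on a 50k-vector corpus)
aspace, gl = arrowspace.optuna(embeddings)

# Search as normal
# Placeholder query: reuses a corpus row (assumes queries are length-D vectors)
query_embedding = embeddings[0]
results = aspace.search(query_embedding, gl, tau=0.8)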

Power-user API

from arrowspace_tuner import EpsTuner

tuner = EpsTuner(
    n_trials  = 15,
    sample_n  = 5_000,    # 33x faster: explore on 5k, final build on full corpus
    eps_low   = 0.8,      # narrow bounds if you know your corpus geometry
    eps_high  = 2.5,
    k_low     = 15,
    k_high    = 40,
    tau_low   = 0.05,
    tau_high  = 0.5,
    n_probe   = 50,
    storage   = "sqlite:///tune.db",   # resume interrupted runs
)

aspace, gl = tuner.fit(embeddings)

print(tuner.best_params)    # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score)     # 2.138
print(tuner.best_fiedler)   # 0.718  — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896  — retrieval coherence proxy

# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")
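
best_fiedler above is reported as a connectivity health signal. In spectral graph theory, the Fiedler value is the second-smallest eigenvalue of the graph Laplacian L = D - A, and it is zero when the graph is disconnected. The sketch below illustrates that textbook definition with NumPy on two tiny graphs. How ArrowSpace builds or normalizes its Laplacian is not stated in this README, so fiedler_value is a hypothetical helper and not part of the package.

```python
import numpy as np


def fiedler_value(adjacency: np.ndarray) -> float:
    """Second-smallest eigenvalue of the unnormalized Laplacian L = D - A.

    Hypothetical helper for illustration; not part of arrowspace_tuner.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return float(eigenvalues[1])


# Path graph 0 - 1 - 2: connected, so the Fiedler value is positive.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

# Same nodes with the 1 - 2 edge removed: node 2 is isolated.
split = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]])

print(fiedler_value(path))
print(fiedler_value(split))
```

In general, a value near zero means the graph is close to splitting into separate components, which is the kind of failure a too-small eps or k can cause.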

Speed

The dominant cost is building the ArrowSpace graph on N vectors. Setting sample_n runs the exploration trials on a subsample of the corpus instead:

Setting            Per trial   15 trials   Notes
sample_n=50_000    ~23 min     ~5.8 h      baseline
sample_n=5_000     ~1.5 min    ~27 min     33x faster, same best params

The final build after the study always uses the full corpus.
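
The sample_n idea can be sketched as: explore on a random subset of rows, then rebuild on everything. The snippet assumes uniform sampling without replacement. The strategy arrowspace_tuner actually uses is not documented in this README, and subsample is a hypothetical helper.

```python
import numpy as np


def subsample(embeddings: np.ndarray, sample_n: int, seed: int = 0) -> np.ndarray:
    """Return up to sample_n rows drawn uniformly without replacement.

    Hypothetical helper; arrowspace_tuner may sample differently.
    """
    n_rows = embeddings.shape[0]
    if sample_n >= n_rows:
        return embeddings  # corpus is already small enough
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_rows, size=sample_n, replace=False)
    return embeddings[np.sort(idx)]  # keep the original row order


# Synthetic stand-in for a corpus: 50_000 vectors of dimension 8.
corpus = np.random.default_rng(1).standard_normal((50_000, 8))
explore = subsample(corpus, 5_000)
print(explore.shape)  # (5000, 8)
```

Fixing the seed keeps the subsample identical across trials, so differences in score come from the parameters rather than from the sample.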

Objective

The objective is a weighted composite of three spectral signals — no ground-truth labels required:

score = 0.70 * mrr_top0_spectral   # retrieval coherence
      + 0.20 * log1p(fiedler)      # graph connectivity health
      + 0.10 * log1p(var_lambda)   # spectral richness
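
As a worked check of the formula, the sketch below evaluates it in plain Python. The function name composite_score is hypothetical. The mrr and fiedler inputs are the sample values printed in the Power-user section, and var_lambda is not reported there, so 0.0 is used as a placeholder.

```python
import math


def composite_score(mrr_top0_spectral: float, fiedler: float, var_lambda: float) -> float:
    """Weighted sum from the formula above, with weights 0.70 / 0.20 / 0.10."""
    return (
        0.70 * mrr_top0_spectral
        + 0.20 * math.log1p(fiedler)
        + 0.10 * math.log1p(var_lambda)
    )


# var_lambda=0.0 is a placeholder, so this is the score without the richness term.
partial = composite_score(2.896, 0.718, 0.0)
print(round(partial, 3))
```

If those sample values come from the same trial, the first two terms give about 2.135, leaving roughly 0.003 of the reported best_score of 2.138 for the var_lambda term. The log1p on the last two terms damps large values, so the retrieval term dominates the score.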

Parallel runs

Optuna + SQLite lets you run multiple workers simultaneously:

# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

Requirements

  • Python >= 3.12
  • arrowspace >= 0.26.0
  • optuna >= 4.8.0
  • scipy >= 1.17.1
  • numpy >= 2.4.4

License

Apache-2.0 — see LICENSE.
