Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna.

arrowspace_tuner

Hyperparameter discovery for ArrowSpace — automatically finds the best eps, k, and tau for your corpus using a query-free spectral objective.

Why

ArrowSpace's retrieval quality depends on three parameters: two that shape the graph (eps, k) and one that governs search (tau).

Parameter   What it controls
eps         Neighbourhood radius for graph edges
k           Number of nearest neighbours per node
tau         Search temperature (exploration vs. exploitation)
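
To make the roles of eps and k concrete, here is a minimal pure-Python sketch of one common way the two can interact when building a neighbourhood graph: each node keeps at most k of its nearest neighbours, and only those within distance eps. This is an illustrative assumption, not ArrowSpace's actual construction, and `build_edges` is a hypothetical helper.

```python
import math

def build_edges(points, eps, k):
    """Hypothetical helper: link each point to at most k nearest neighbours within radius eps."""
    edges = {}
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        # Keep the k closest, then drop any that lie outside the eps radius.
        edges[i] = [j for d, j in dists[:k] if d <= eps]
    return edges

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
tight = build_edges(points, eps=1.5, k=2)   # the far point stays isolated
loose = build_edges(points, eps=10.0, k=2)  # a larger eps connects it
```

Too small an eps leaves nodes disconnected, while too large an eps or k adds edges between unrelated items, which is why the useful range depends on the corpus.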

Setting these by hand is tedious and corpus-dependent. arrowspace_tuner uses Optuna and a label-free spectral MRR proxy to find them automatically in minutes.

Install

# Core (no pandas/plotly)
pip install arrowspace-tuner

# With HTML/CSV reporting
pip install "arrowspace-tuner[report]"

Quickstart

import numpy as np
import arrowspace_tuner as arrowspace

embeddings = np.load("corpus.npy")   # shape (N, D) float64

# One-liner: auto-discover eps, k, tau (about 15 min on a 50k-vector corpus)
aspace, gl = arrowspace.optuna(embeddings)

# Search as normal; query_embedding is a query vector you supply (same dimension D as the corpus)
results = aspace.search(query_embedding, gl, tau=0.8)

Power-user API

from arrowspace_tuner import EpsTuner

tuner = EpsTuner(
    n_trials  = 15,
    sample_n  = 5_000,    # 33x faster: explore on 5k, final build on full corpus
    eps_low   = 0.8,      # narrow bounds if you know your corpus geometry
    eps_high  = 2.5,
    k_low     = 15,
    k_high    = 40,
    tau_low   = 0.05,
    tau_high  = 0.5,
    n_probe   = 50,
    storage   = "sqlite:///tune.db",   # resume interrupted runs
)

aspace, gl = tuner.fit(embeddings)

print(tuner.best_params)    # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score)     # 2.138
print(tuner.best_fiedler)   # 0.718  — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896  — retrieval coherence proxy

# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")

Speed

The dominant cost is building the ArrowSpace graph on N vectors. Setting sample_n runs each trial on a subsample of that size, which cuts the per-trial cost:

Setting           Per trial   15 trials   Notes
sample_n=50_000   ~23 min     ~5.8 h      baseline
sample_n=5_000    ~1.5 min    ~27 min     33x faster, same best params

The final build after the study always uses the full corpus.
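
The explore-on-a-sample, build-on-everything pattern described above can be sketched as follows. This is a minimal illustration that assumes uniform random subsampling without replacement; how arrowspace_tuner actually selects the sample_n vectors is not stated here, and `subsample_indices` is a hypothetical helper.

```python
import random

def subsample_indices(n_total, sample_n, seed=0):
    """Hypothetical helper: pick sample_n distinct row indices out of n_total."""
    if sample_n >= n_total:
        # Nothing to gain from sampling: use the full corpus.
        return list(range(n_total))
    rng = random.Random(seed)  # fixed seed so every trial sees the same subset
    return sorted(rng.sample(range(n_total), sample_n))

trial_rows = subsample_indices(n_total=50_000, sample_n=5_000)
final_rows = subsample_indices(n_total=50_000, sample_n=50_000)  # final build: all rows
```

Reusing one fixed subset across trials keeps scores comparable between trials, since differences then come from the parameters and not from the sample.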

Objective

The objective is a weighted composite of three spectral signals — no ground-truth labels required:

score = 0.70 * mrr_top0_spectral   # retrieval coherence
      + 0.20 * log1p(fiedler)      # graph connectivity health
      + 0.10 * log1p(var_lambda)   # spectral richness
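
As a runnable restatement of the formula above, the sketch below computes the composite from its three inputs. The weights come from the formula; the function name `composite_score` and the sample inputs are illustrative assumptions, and how the library computes each signal is not shown here.

```python
import math

def composite_score(mrr_top0_spectral, fiedler, var_lambda):
    """Weighted sum of the three spectral signals, using the weights quoted above."""
    return (
        0.70 * mrr_top0_spectral          # retrieval coherence
        + 0.20 * math.log1p(fiedler)      # graph connectivity health
        + 0.10 * math.log1p(var_lambda)   # spectral richness
    )

# Illustrative inputs only, not measurements from a real run.
score = composite_score(mrr_top0_spectral=2.0, fiedler=0.5, var_lambda=0.1)
```

Because log1p grows slowly, a very large fiedler or var_lambda adds little to the total, so the MRR proxy stays the dominant term.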

Parallel runs

Optuna + SQLite lets you run multiple workers simultaneously:

# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15
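
If you prefer to start the workers from a script instead of separate terminals, a minimal sketch using only the standard library could look like this. It reuses the `--storage` and `--trials` flags shown above; `worker_command` and `launch_workers` are hypothetical helpers, and whether `--trials` counts per worker or across the shared study is not specified here.

```python
import subprocess
import sys

def worker_command(storage, trials):
    """Build the CLI invocation shown above for one worker."""
    return [
        sys.executable, "-m", "arrowspace_tuner",
        "--storage", storage,
        "--trials", str(trials),
    ]

def launch_workers(n_workers, storage="sqlite:///tune.db", trials=15):
    """Start n_workers processes against the same storage and wait for all of them."""
    procs = [
        subprocess.Popen(worker_command(storage, trials))
        for _ in range(n_workers)
    ]
    return [p.wait() for p in procs]  # exit codes, one per worker

# Example (requires arrowspace_tuner to be installed):
# exit_codes = launch_workers(n_workers=2)
```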

Requirements

  • Python >= 3.12
  • arrowspace >= 0.26.0
  • optuna >= 4.8.0
  • scipy >= 1.17.1
  • numpy >= 2.4.4

License

Apache-2.0 — see LICENSE.
