
Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna.


arrowspace_tuner


Hyperparameter discovery for ArrowSpace — automatically finds the best eps, k, and tau for your corpus using a query-free spectral objective.

Why

ArrowSpace's retrieval quality depends on three parameters: two that shape graph construction (eps, k) and one that shapes search (tau):

| Parameter | What it controls |
|-----------|------------------|
| eps | Neighbourhood radius for graph edges |
| k | Number of nearest neighbours per node |
| tau | Search temperature (exploration vs. exploitation) |
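
To make the roles of eps and k concrete, here is a minimal sketch of a generic neighbourhood graph in which an edge is kept only if the neighbour lies within radius eps, and each node keeps at most its k nearest such neighbours. This illustrates the two parameters as described in the table; it is an assumption, not ArrowSpace's actual graph construction, and the function name build_neighbour_graph is hypothetical.

```python
import math


def build_neighbour_graph(points, eps, k):
    """Return {node: [neighbour, ...]} with at most k neighbours inside radius eps."""
    graph = {}
    for i, p in enumerate(points):
        # Euclidean distance from node i to every other node.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        # Keep only candidates inside the eps radius, nearest first.
        within = sorted(d for d in dists if d[0] <= eps)
        # Cap the neighbour list at k entries.
        graph[i] = [j for _, j in within[:k]]
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

tight = build_neighbour_graph(points, eps=1.1, k=2)   # small radius
loose = build_neighbour_graph(points, eps=10.0, k=2)  # large radius

print("tight:", tight)
print("loose:", loose)
```

With the small radius the outlying point (5, 5) ends up with no neighbours at all, while the large radius connects it to the rest of the graph. That trade-off between a fragmented graph and an over-connected one is why eps and k tend to be corpus-dependent.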

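Setting these by hand is tedious and corpus-dependent. arrowspace_tuner uses Optuna and a label-free spectral objective, dominated by an MRR proxy, to find them automatically. With subsampled trials, a 15-trial study on a 50k corpus takes roughly half an hour (see Speed).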

Install

# Core (no pandas/plotly)
pip install arrowspace-tuner

# With HTML/CSV reporting (quoted so shells such as zsh do not expand the brackets)
pip install "arrowspace-tuner[report]"

Quickstart

import numpy as np
import arrowspace_tuner as arrowspace

embeddings = np.load("corpus.npy")   # shape (N, D) float64

# One-liner: auto-discover eps, k, tau (runs in ~15 min on a 50k corpus)
aspace, gl = arrowspace.optuna(embeddings)

# Search as normal; the first corpus vector stands in for a real query here
query_embedding = embeddings[0]
results = aspace.search(query_embedding, gl, tau=0.8)

Power-user API

from arrowspace_tuner import EpsTuner

tuner = EpsTuner(
    n_trials  = 15,
    sample_n  = 5_000,    # 33x faster: explore on 5k, final build on full corpus
    eps_low   = 0.8,      # narrow bounds if you know your corpus geometry
    eps_high  = 2.5,
    k_low     = 15,
    k_high    = 40,
    tau_low   = 0.05,
    tau_high  = 0.5,
    n_probe   = 50,
    storage   = "sqlite:///tune.db",   # resume interrupted runs
)

aspace, gl = tuner.fit(embeddings)

print(tuner.best_params)    # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score)     # 2.138
print(tuner.best_fiedler)   # 0.718  — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896  — retrieval coherence proxy

# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")

Speed

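The dominant cost is building the ArrowSpace graph on N vectors. Setting sample_n runs the exploratory trials on a subsample of the corpus instead, which shortens each trial considerably: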

| Setting | Per trial | 15 trials | Notes |
|---------|-----------|-----------|-------|
| sample_n = 50k | ~23 min | ~5.8 h | baseline |
| sample_n = 5_000 | ~1.5 min | ~27 min | 33x faster, same best params |

The final build after the study always uses the full corpus.
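
As a quick sanity check on the table, total study time is roughly the per-trial time multiplied by the number of trials. The sketch below only redoes that arithmetic with the rounded figures quoted above; the helper name study_minutes is hypothetical.

```python
def study_minutes(per_trial_min, n_trials):
    """Estimate study wall-clock time in minutes, ignoring any fixed overhead."""
    return per_trial_min * n_trials


full = study_minutes(23.0, 15)  # trials on the full 50k corpus
sub = study_minutes(1.5, 15)    # trials on a 5k subsample

per_trial_speedup = 23.0 / 1.5

print(f"full corpus: {full:.0f} min ({full / 60:.2f} h)")
print(f"subsample:   {sub:.1f} min")
print(f"per-trial speedup from these figures: {per_trial_speedup:.1f}x")
```

The full-corpus estimate lines up with the ~5.8 h in the table. The subsample estimate comes out slightly below the quoted ~27 min, presumably because of overhead outside the trials themselves. The per-trial ratio from these rounded figures is also lower than the quoted 33x, so treat the speedup as approximate and measure on your own corpus.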

Objective

The objective is a weighted composite of three spectral signals — no ground-truth labels required:

score = 0.70 * mrr_top0_spectral   # retrieval coherence
      + 0.20 * log1p(fiedler)      # graph connectivity health
      + 0.10 * log1p(var_lambda)   # spectral richness
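
The composite can be reproduced in a few lines. This is a sketch of the formula as written above, not the library's own code: the function name composite_score is hypothetical, and the three inputs are assumed to be already-computed non-negative scalars.

```python
import math


def composite_score(mrr_top0_spectral, fiedler, var_lambda):
    """Weighted composite of the three spectral signals (weights from the formula above)."""
    return (
        0.70 * mrr_top0_spectral        # retrieval coherence
        + 0.20 * math.log1p(fiedler)    # graph connectivity health
        + 0.10 * math.log1p(var_lambda) # spectral richness
    )


# Using the example values printed in the Power-user API section,
# with var_lambda set to 0 because it is not reported there.
partial = composite_score(mrr_top0_spectral=2.896, fiedler=0.718, var_lambda=0.0)
print(f"score without the var_lambda term: {partial:.4f}")
```

The log1p transform compresses the connectivity and variance terms, so a very large Fiedler value or eigenvalue variance cannot swamp the retrieval proxy, which carries 70% of the weight.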

Parallel runs

Optuna's SQLite storage lets several workers contribute trials to the same study at once. Point each process at the same storage URL:

# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

Requirements

  • Python >= 3.12
  • arrowspace >= 0.26.0
  • optuna >= 4.8.0
  • scipy >= 1.17.1
  • numpy >= 2.4.4

License

Apache-2.0 — see LICENSE.
