Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna.

arrowspace_tuner

Hyperparameter discovery for ArrowSpace: automatically searches for the best-scoring eps, k, and tau for your corpus using a query-free spectral objective.

Why

ArrowSpace's retrieval quality depends on three parameters that govern graph construction and search:

Parameter  What it controls
eps        Neighbourhood radius for graph edges
k          Number of nearest neighbours per node
tau        Search temperature (exploration vs. exploitation)

Setting these by hand is tedious, and good values differ from corpus to corpus. arrowspace_tuner uses Optuna and a label-free spectral MRR proxy to find them automatically, typically in tens of minutes when trials run on a subsample (see Speed).
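To make the roles of eps and k from the table above concrete, here is a minimal sketch of a neighbourhood graph that keeps at most k neighbours per node within radius eps. It is an illustration under stated assumptions, not ArrowSpace's actual graph construction: the function name build_neighbour_graph, the Euclidean metric, and the toy points are all hypothetical.

```python
import math

def build_neighbour_graph(points, eps, k):
    """Return {node: [neighbour indices]}, keeping at most k neighbours within radius eps."""
    graph = {}
    for i, p in enumerate(points):
        # Distance from node i to every other node (Euclidean is an assumption).
        candidates = [
            (math.dist(p, q), j)
            for j, q in enumerate(points)
            if j != i
        ]
        # Keep only candidates inside the eps radius, nearest first, capped at k.
        within = sorted((d, j) for d, j in candidates if d <= eps)
        graph[i] = [j for _, j in within[:k]]
    return graph

# Three nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
graph = build_neighbour_graph(points, eps=1.5, k=2)
print(graph)
```

In this toy example the far-away point ends up with no neighbours because nothing lies within eps of it. A disconnected graph has a Fiedler value of zero, which is one reason a connectivity term is useful in a tuning objective.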

Install

# Core (no pandas/plotly)
pip install arrowspace-tuner

# With HTML/CSV reporting
pip install "arrowspace-tuner[report]"   # quotes keep shells such as zsh from expanding the brackets

Quickstart

import numpy as np
import arrowspace_tuner as arrowspace

embeddings = np.load("corpus.npy")   # shape (N, D) float64

# One-liner: auto-discover eps, k, and tau (runs in ~15 min on a 50k-vector corpus)
aspace, gl = arrowspace.optuna(embeddings)

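# Search as normal. query_embedding is any vector of shape (D,);
# a corpus row is used here only as a placeholder.
query_embedding = embeddings[0]
results = aspace.search(query_embedding, gl, tau=0.8)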

Power-user API

from arrowspace_tuner import EpsTuner

tuner = EpsTuner(
    n_trials  = 15,
    sample_n  = 5_000,    # 33x faster: explore on 5k, final build on full corpus
    eps_low   = 0.8,      # narrow bounds if you know your corpus geometry
    eps_high  = 2.5,
    k_low     = 15,
    k_high    = 40,
    tau_low   = 0.05,
    tau_high  = 0.5,
    n_probe   = 50,
    storage   = "sqlite:///tune.db",   # resume interrupted runs
)

aspace, gl = tuner.fit(embeddings)

print(tuner.best_params)    # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score)     # 2.138
print(tuner.best_fiedler)   # 0.718  — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896  — retrieval coherence proxy

# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")

Speed

The dominant cost is building the ArrowSpace graph on N vectors. With sample_n, each trial builds the graph on a subsample of sample_n vectors instead of the full corpus:

Setting            Per trial  15 trials  Notes
Full corpus (50k)  ~23 min    ~5.8 h     baseline
sample_n=5_000     ~1.5 min   ~27 min    33x faster, same best params

The final build after the study always uses the full corpus.
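The explore-on-a-sample, build-on-everything pattern described above can be sketched in a few lines. This is a rough illustration, not the tuner's implementation: explore_then_build, toy_score, the candidate list, and the synthetic one-dimensional "corpus" are hypothetical stand-ins, and uniform random sampling is an assumption.

```python
import random
import statistics

def explore_then_build(corpus, sample_n, candidates, score_fn, seed=0):
    """Pick the best candidate on a subsample; report the sizes used in each phase."""
    rng = random.Random(seed)
    # Exploration phase: score every candidate on at most sample_n items.
    sample = corpus if len(corpus) <= sample_n else rng.sample(corpus, sample_n)
    best = max(candidates, key=lambda params: score_fn(sample, params))
    # Final phase: the winning parameters would be applied to the full corpus.
    return best, len(sample), len(corpus)

def toy_score(sample, params):
    # Hypothetical stand-in objective: prefers eps close to the sample mean.
    return -abs(params["eps"] - statistics.fmean(sample))

data_rng = random.Random(1)
corpus = [data_rng.gauss(1.6, 0.3) for _ in range(50_000)]
candidates = [{"eps": 0.8}, {"eps": 1.6}, {"eps": 2.5}]

best, n_explore, n_final = explore_then_build(corpus, 5_000, candidates, toy_score)
print(best, n_explore, n_final)
```

The sketch only works as intended when the subsample is representative of the corpus, so that a parameter set that scores well on 5k items is likely to score well on all of them.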

Objective

The objective is a weighted composite of three spectral signals — no ground-truth labels required:

score = 0.70 * mrr_top0_spectral   # retrieval coherence
      + 0.20 * log1p(fiedler)      # graph connectivity health
      + 0.10 * log1p(var_lambda)   # spectral richness
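As a worked example, the formula above can be written as a small function. The name composite_score is hypothetical, and feeding it the best_mrr_proxy and best_fiedler values printed in the Power-user API example assumes those attributes correspond to mrr_top0_spectral and fiedler. var_lambda is not reported there, so it is set to 0 here, which makes the last term vanish.

```python
import math

def composite_score(mrr_top0_spectral, fiedler, var_lambda):
    """Weighted composite of the three spectral signals, using the weights listed above."""
    return (
        0.70 * mrr_top0_spectral          # retrieval coherence
        + 0.20 * math.log1p(fiedler)      # graph connectivity health
        + 0.10 * math.log1p(var_lambda)   # spectral richness
    )

# Values from the example output; var_lambda=0.0 is a placeholder, not a reported value.
partial = composite_score(2.896, 0.718, 0.0)
print(round(partial, 3))
```

Under these assumptions the first two terms alone come to about 2.135, just below the reported best_score of 2.138, which suggests the spectral-richness term made only a small contribution in that run.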

Parallel runs

With a shared SQLite storage URL, Optuna lets you run multiple workers against the same study simultaneously:

# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15

Requirements

  • Python >= 3.12
  • arrowspace >= 0.26.0
  • optuna >= 4.8.0
  • scipy >= 1.17.1
  • numpy >= 2.4.4

License

Apache-2.0 — see LICENSE.
