Hyperparameter discovery (eps auto-tuning) for ArrowSpace via Optuna.
arrowspace_tuner
Hyperparameter discovery for ArrowSpace — automatically finds the best eps, k, and tau for your corpus using a query-free spectral objective.
Why
ArrowSpace's retrieval quality depends on three graph-construction parameters:
| Parameter | What it controls |
|---|---|
| eps | Neighbourhood radius for graph edges |
| k | Number of nearest neighbours per node |
| tau | Search temperature (exploration vs. exploitation) |
Setting these by hand is tedious and corpus-dependent. arrowspace_tuner uses Optuna and a label-free spectral MRR proxy to find them automatically in minutes.
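To make the roles of eps and k concrete, here is a minimal sketch of a radius-capped nearest-neighbour graph. It is an illustration only: the function `eps_knn_edges` is hypothetical, it uses brute-force Euclidean distances, and it is not ArrowSpace's actual graph construction. tau affects search rather than edge selection and is not modelled here.

```python
import numpy as np


def eps_knn_edges(points, eps, k):
    """Hypothetical helper: link each node to at most k nearest neighbours
    that also lie within distance eps. Returns undirected edges (i, j), i < j."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # a node is never its own neighbour

    edges = set()
    for i in range(len(points)):
        nearest = np.argsort(dists[i])[:k]  # k closest candidates
        for j in nearest:
            if dists[i, j] <= eps:  # eps caps the neighbourhood radius
                j = int(j)
                edges.add((min(i, j), max(i, j)))
    return edges


# Three points close together on a line plus one far-away outlier.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])

tight = eps_knn_edges(points, eps=1.5, k=2)   # outlier stays isolated
loose = eps_knn_edges(points, eps=20.0, k=3)  # every pair gets connected
print(sorted(tight))
print(len(loose))
```

A small eps leaves outliers disconnected, while a large eps with a large k approaches a complete graph. Both extremes tend to hurt graph quality, which is why the right values depend on the corpus.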
Install
# Core (no pandas/plotly)
pip install arrowspace-tuner
# With HTML/CSV reporting
pip install "arrowspace-tuner[report]"
Quickstart
import numpy as np
import arrowspace_tuner as arrowspace
embeddings = np.load("corpus.npy") # shape (N, D) float64
# One-liner: auto-discover eps, k, tau — runs in ~15 min on 50k corpus
aspace, gl = arrowspace.optuna(embeddings)
# Search as normal; query_embedding is a query vector you supply,
# embedded with the same model as the corpus
results = aspace.search(query_embedding, gl, tau=0.8)
Power-user API
from arrowspace_tuner import EpsTuner
tuner = EpsTuner(
    n_trials = 15,
    sample_n = 5_000, # 33x faster: explore on 5k, final build on full corpus
    eps_low = 0.8, # narrow bounds if you know your corpus geometry
    eps_high = 2.5,
    k_low = 15,
    k_high = 40,
    tau_low = 0.05,
    tau_high = 0.5,
    n_probe = 50,
    storage = "sqlite:///tune.db", # resume interrupted runs
)
aspace, gl = tuner.fit(embeddings)
print(tuner.best_params) # {"eps": 1.615, "k": 38, "tau": 0.114}
print(tuner.best_score) # 2.138
print(tuner.best_fiedler) # 0.718 — graph connectivity health
print(tuner.best_mrr_proxy) # 2.896 — retrieval coherence proxy
# Save CSV + HTML plots (requires [report] extra)
tuner.save_report(out_dir="results")
Speed
The dominant cost is building the ArrowSpace graph on N vectors. With sample_n:
| Setting | Per trial | 15 trials | Notes |
|---|---|---|---|
| Full corpus (50k) | ~23 min | ~5.8h | baseline |
| sample_n=5_000 | ~1.5 min | ~27 min | 33x faster, same best params |
The final build after the study always uses the full corpus.
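The explore-on-a-sample idea can be sketched as follows. This assumes uniform random sampling without replacement; the helper `subsample_rows` and its `seed` argument are hypothetical, and the package may select its sample differently.

```python
import numpy as np


def subsample_rows(embeddings, sample_n, seed=0):
    """Hypothetical helper: pick sample_n rows uniformly at random.
    Returns the input unchanged when no subsampling is needed."""
    n = embeddings.shape[0]
    if sample_n is None or sample_n >= n:
        return embeddings
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=sample_n, replace=False)
    return embeddings[np.sort(idx)]  # keep the original row order


# Synthetic stand-in for a 50k corpus of 8-dimensional embeddings.
corpus = np.random.default_rng(42).normal(size=(50_000, 8))

explore = subsample_rows(corpus, 5_000)  # used for every tuning trial
final = corpus                           # used once, for the final build
print(explore.shape, final.shape)
```

Fixing the seed keeps the sample identical across trials, so differences in score reflect the parameters rather than the draw.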
Objective
The objective is a weighted composite of three spectral signals — no ground-truth labels required:
score = (
    0.70 * mrr_top0_spectral    # retrieval coherence
    + 0.20 * log1p(fiedler)     # graph connectivity health
    + 0.10 * log1p(var_lambda)  # spectral richness
)
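As a worked example of the formula above, the sketch below computes the two graph terms for a toy three-node path graph. It assumes that `fiedler` is the second-smallest eigenvalue of the unnormalised graph Laplacian and that `var_lambda` is the variance of the Laplacian eigenvalues; the source does not define either precisely, so both are assumptions. The value passed for `mrr_top0_spectral` is a placeholder, since its computation is not described here.

```python
import numpy as np


def laplacian_spectrum(adjacency):
    """Eigenvalues of the unnormalised Laplacian L = D - A, ascending."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return np.linalg.eigvalsh(degree - adjacency)


def composite_score(mrr_top0_spectral, fiedler, var_lambda):
    """Weighted composite with the weights stated in the formula above."""
    return (
        0.70 * mrr_top0_spectral
        + 0.20 * np.log1p(fiedler)
        + 0.10 * np.log1p(var_lambda)
    )


# Path graph 0 - 1 - 2; its Laplacian eigenvalues are 0, 1 and 3.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]

eigenvalues = laplacian_spectrum(path)
fiedler = eigenvalues[1]         # > 0 only when the graph is connected
var_lambda = eigenvalues.var()   # assumed meaning of "spectral richness"

score = composite_score(1.0, fiedler, var_lambda)  # 1.0 is a placeholder MRR proxy
print(fiedler, var_lambda, score)
```

A disconnected graph has a Fiedler value of zero, so its connectivity term contributes nothing. The `log1p` transforms damp the two graph terms so that the retrieval proxy dominates the score.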
Parallel runs
Optuna + SQLite lets you run multiple workers simultaneously:
# Terminal 1
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15
# Terminal 2 (simultaneously)
python -m arrowspace_tuner --storage sqlite:///tune.db --trials 15
Requirements
- Python ≥ 3.12
- arrowspace >= 0.26.0
- optuna >= 4.8.0
- scipy >= 1.17.1
- numpy >= 2.4.4
License
Apache-2.0 — see LICENSE.