Python client + typed contract for the BenchHub benchmarking platform
Project description
BenchHub
BenchHub is an open-source benchmarking platform: pick a dataset, define metrics in Python, upload predictions, and see how your model ranks. Live at https://runbenchhub.com.
Originally built as a private dTOF SPAD pipeline benchmarking tool, then generalized into a public, multi-tenant web app.
Features
- Passwordless sign-in — GitHub, Google, or a one-time email code.
- Strict typed contract — every field has a kind (image, mask, depth,
audio, label, bboxes, scalar, json, sequence/video, …) shared by the
dataset, the
benchhub-client, and the metric engine (benchhub/types.py). - Self-service HuggingFace import — one button: the tabular importer (Croissant/parquet, inferred + editable) falls back to a file-tree mapper for repos of paired files / packed archives / video clips. The mapper has a "describe the structure" role wizard, loaders for file/npz/json/csv/parquet/hdf5/zip/tar/gz/token/sequence, a decode preview, variant fan-out, and draft autosave.
- Two-tier storage — datasets cache as a cheap preview tier; each leaderboard materializes a chosen sample subset at full resolution.
- Datasets and leaderboards are global; per-row visibility
(
public/unlisted/private) on datasets, leaderboards, and metric/visualization library entries. - User-defined metrics & visualizations in Python — typed signatures, per-sample + aggregated, pooling, dependency chaining. All user code runs in a hardened, network-isolated, read-only sandbox container (one short-lived sandbox per job) — never in-process on the server.
- User-registered data types — declare a new
kind(its storage + avisualize(blob, params)that runs in the sandbox) viaclient.create_datatype(...); it joins the global kind namespace. benchhub-client+ dev kit —iter_samples(decoded typed inputs incl. iterable video clips) → predict → submit; programmatic dataset creation;client.create_metric/create_visualization/create_datatype; andbenchhub.author.test_metric/test_visualizationto iterate locally before uploading.- Asynchronous processing with Celery (Redis broker).
- Split-bucket quotas — 50 GB public + 10 GB private per user by default.
- API tokens (
/settings/api_tokens), account deletion with cascading cleanup, public landing (/), catalog (/leaderboards,/datasets), profiles (/u/<id>).
Documentation
Full user docs live in-app at /docs
(templates under templates/docs/): overview, core concepts, importing data,
data types, leaderboards, writing metrics & visualizations, submitting
predictions, the API/client reference, and step-by-step tutorials. A
high-level pipeline diagram is in
docs/ARCHITECTURE.md (editable drawio source under
docs/diagrams/). Architecture/dev notes are in
CLAUDE.md; the session-by-session dev history is under
docs/ (e.g. SESSION_NOTES_2026-05.md).
Feature requests + bug reports → GitHub issues.
Prerequisites
- Python 3.10+
- Redis (broker + result backend, default port 6379)
Installation
git clone <repository-url>
cd BenchHub
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Running
Three terminals:
# 1. Redis
redis-server
# 2. Celery worker
celery -A app.celery worker --loglevel=info
# 3. Flask app
python app.py
Then open http://localhost:6060.
Data lives outside the repo at ~/.dtofbenchmarking/ (database + uploads).
Override with BENCHHUB_DATA_DIR=/some/path.
Tests
pytest tests/
~1000+ tests. (Run pytest tests/, not bare pytest, so it skips the
ad-hoc root-level test_chain*.py experiments.)
Datasets & the typed contract
Datasets are typed: a directory with a manifest.json declaring
fields[] ({name, kind, role, params}) plus one folder per field holding
<sample>.<ext>. You rarely build this by hand — the HuggingFace
importers and the client's BHDatasetCreator produce it for you. The
kinds and the import flows are documented in-app at
/docs (Data Types, Importing Data). The
legacy folder-name-prefix ZIP path has been removed.
DLP-safe code uploads
Some networks block .py uploads. The metric editor encodes user code
as BASE64:<...> client-side; the server decodes. Standalone helpers:
scripts/obfuscator.html— portable browser toolscripts/obfuscator_gui.py— Tkinter GUI
Deployment
The production app is self-hosted on a home Ubuntu 24.04 box (RTX 5090, 128 GB RAM, 8 TB) reachable at https://runbenchhub.com. gunicorn + celery
- redis run directly under systemd; nginx + certbot terminate TLS; the
domain is on Cloudflare in DNS-only mode (no proxy) with
ddclientkeeping the A record pointed at the home WAN IP.
Operational runbook: docs/SELFHOST_RUNBOOK.md
— code-push procedure, .env keys, log tailing, DDNS, TLS renewal,
rollback, and the breakages we've already hit.
Fly.io is deprecated: the app was destroyed after the cutover to the home
box. The Fly artifacts (fly.toml, Dockerfile, DEPLOY.md, …) are
archived under archive/fly/ for the case where a future
Fly redeploy needs to be reconstructed.
License
(Choose and add a license file — repository currently has no LICENSE.)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file benchhub_client-0.1.8.tar.gz.
File metadata
- Download URL: benchhub_client-0.1.8.tar.gz
- Upload date:
- Size: 237.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0588adb702202d32ef15282a26cf8d92f633babc5ad49a39a5382b084cb2e63a
|
|
| MD5 |
746fccdc181330e31d38948474ccae9c
|
|
| BLAKE2b-256 |
8974a117fe6888f8f761565492ec6979e98af3d6b65cf4d72bf729622861bb59
|
File details
Details for the file benchhub_client-0.1.8-py3-none-any.whl.
File metadata
- Download URL: benchhub_client-0.1.8-py3-none-any.whl
- Upload date:
- Size: 75.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ba99121a334e8fb1ac9f7b80c2c2a7202f8649e84577f88969773cb505c883d9
|
|
| MD5 |
96f9a14ce8208b615fc1995521fbbec3
|
|
| BLAKE2b-256 |
1c8ca2a9becbf188c1e0e958cec08910311d95e3abfea19f5a4b2256e090647d
|