High-efficiency local/self-hosted ML experiment tracking system

KohakuBoard

High-performance ML experiment tracking with zero training overhead.

Part of KohakuHub - Self-hosted AI Infrastructure

Quick Start

pip install -e .

from kohakuboard.client import Board

board = Board(name="my-experiment", config={"lr": 0.001, "batch_size": 32})

# Training loop
for epoch in range(10):
    for data, target in train_loader:
        loss = train_step(data, target)

        board.step()  # Once per optimizer step
        board.log(loss=loss.item())  # Non-blocking, <0.1ms
        # Alternative: move board.step() after board.log() for 0-indexed steps

# logs are stored under ./kohakuboard using KohakuVault column stores + SQLite metadata
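
Whether board.step() comes before or after board.log() decides if the first logged point lands on step 1 or step 0. The sketch below illustrates this with a hypothetical StepCounter stand-in (it is not KohakuBoard's Board class) and assumes the counter starts at 0, which is what the 0-indexed comment above implies.

```python
class StepCounter:
    """Hypothetical stand-in for Board; records the step of every log call."""

    def __init__(self):
        self.global_step = 0  # assumption: the counter starts at 0
        self.records = []

    def step(self):
        self.global_step += 1

    def log(self, **metrics):
        self.records.append((self.global_step, metrics))


def run(step_first, n_steps=3):
    """Return the step index attached to each log call."""
    counter = StepCounter()
    for i in range(n_steps):
        if step_first:
            counter.step()
            counter.log(loss=float(i))
        else:
            counter.log(loss=float(i))
            counter.step()
    return [step for step, _ in counter.records]


print(run(step_first=True))   # [1, 2, 3]
print(run(step_first=False))  # [0, 1, 2]
```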

Join our community: https://discord.gg/xWYrkyvJ2s

Why KohakuBoard?

KohakuBoard's Advantages

Zero Training Overhead - Non-blocking logging returns in <0.1ms
Local-First - No server required during training, view results instantly
High Throughput - 20,000+ metrics/second sustained
Rich Data Types - Scalars, images, videos, tables, histograms
WebGL Visualization - Handle 100K+ datapoints smoothly
Self-Hosted - Your data stays on your infrastructure

Features

Non-Blocking Architecture

A background writer process ensures training never waits:

Training Script          Background Process
     │                          │
     ├─ board.log(loss=0.5)     │
     │  └─> Queue.put()         │
     │      (<0.1ms return!)    │
     │                          ├─ Queue.get()
     ├─ Continue training...    ├─ Batch write
     │                          └─ Flush to disk

Performance:

Log call latency: <0.1ms
Throughput: 20,000+ metrics/sec
Queue capacity: 50,000 messages
Memory overhead: ~100-200 MB
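
The queue-and-writer pattern in the diagram above can be sketched with the standard library. This is a simplified illustration, not KohakuBoard's implementation: a thread stands in for the background process, and AsyncLogger, sink, and batch_size are hypothetical names. Only the 50,000-message capacity is taken from the figures above.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the writer to drain and exit


class AsyncLogger:
    """Illustrative only: a thread stands in for the writer process."""

    def __init__(self, sink, capacity=50_000, batch_size=256):
        self._queue = queue.Queue(maxsize=capacity)
        self._sink = sink  # callable that receives a list of messages
        self._batch_size = batch_size
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, step, **metrics):
        # Only enqueues; the caller never touches the disk.
        # put() would block only if the queue were completely full.
        self._queue.put((step, metrics))

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            batch.append(item)
            # Write when the batch is full or the queue is momentarily empty.
            if len(batch) >= self._batch_size or self._queue.empty():
                self._sink(batch)
                batch = []
        if batch:  # flush whatever was collected before the sentinel
            self._sink(batch)

    def finish(self):
        self._queue.put(_STOP)
        self._thread.join()


written = []
logger = AsyncLogger(sink=written.extend)
for step in range(1000):
    logger.log(step, loss=1.0 / (step + 1))
logger.finish()
print(len(written))  # 1000
```

Putting the sentinel on the same queue as the data means every message logged before finish() is written before the writer exits, which is the property a graceful shutdown needs.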

Rich Data Types

Unified API for all data types - no step inflation:

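from kohakuboard.client import Histogram, Media, Table

# image, results, and grads are assumed to be defined by the training code
board.log(
    loss=0.5,                           # Scalar
    sample_img=Media(image),            # Image
    predictions=Table(results),         # Table
    gradients=Histogram(grads)          # Histogram
)
# All logged at the SAME step with one queue message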

Supported Types:

Scalars - Metrics, learning rates, accuracies
Media - Images (PNG/JPG), videos (MP4), audio (WAV)
Tables - Structured data with embedded images
Histograms - Weight/gradient distributions with compression (99.8% size reduction)

Three-Tier SQLite Storage Architecture

Powered by KohakuVault - A high-performance storage library with dual interfaces over SQLite:

Three Specialized SQLite Implementations:

1. KohakuVault KVault        2. KohakuVault ColumnVault     3. Standard SQLite
   (K-V Store)                   (Columnar Storage)             (Relational)
   ├─ Media blobs                ├─ Metrics                     ├─ Media metadata
   ├─ B+Tree index on K          ├─ Histograms                  ├─ Tables
   ├─ Content-addressable        ├─ Blob-based columnar         └─ Step info
   └─ .cache() for bulk ops      └─ Dynamic chunk growth

Why KohakuVault?

Zero dependencies - Single SQLite file, no external services
Simple deployment - Just .db files, no infrastructure
Dual-interface design - Dict-like for blobs, list-like for sequences
High performance - Native speed with Pythonic API
Memory efficient - Streaming support, dynamic chunk growth
True SWMR - Multiple readers, single writer via SQLite WAL
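
The single-writer, multiple-reader behaviour comes from SQLite's write-ahead log. The sketch below shows the effect with plain sqlite3 rather than KohakuVault; the table name and temporary path are made up for the demonstration.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file, so use a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "metrics.db")

writer = sqlite3.connect(path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE metrics (step INTEGER, value REAL)")
writer.commit()

reader = sqlite3.connect(path)

# The writer holds an open, uncommitted transaction...
writer.execute("INSERT INTO metrics VALUES (1, 0.5)")
# ...and the reader still gets a consistent snapshot without blocking.
before = reader.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]

writer.commit()
after = reader.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]

print(mode, before, after)
reader.close()
writer.close()
```

On a local filesystem this should report WAL mode, zero rows visible before the commit, and one row after it: readers see the last committed state while the writer keeps appending.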

Why Three Tiers?

KVault: Optimized for blob storage with B+Tree index, content-addressable
ColumnVault: Optimized for append-heavy time-series with columnar layout
Standard SQLite: Optimized for structured metadata with ACID guarantees
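
One way to picture the split is as a routing step applied to each log() payload. The sketch below follows the tier assignments in the diagram above, but the FakeMedia, FakeTable, and FakeHistogram classes and the route function are hypothetical stand-ins, not the library's internals.

```python
from dataclasses import dataclass
from numbers import Real


# Hypothetical stand-ins for the library's Media / Table / Histogram wrappers.
@dataclass
class FakeMedia:
    data: bytes


@dataclass
class FakeTable:
    rows: list


@dataclass
class FakeHistogram:
    values: list


def route(payload):
    """Group the keys of one log() payload by the tier that would store them."""
    tiers = {"kvault": [], "columnvault": [], "sqlite": []}
    for key, value in payload.items():
        if isinstance(value, FakeMedia):
            tiers["kvault"].append(key)   # blob bytes
            tiers["sqlite"].append(key)   # media metadata row
        elif isinstance(value, FakeTable):
            tiers["sqlite"].append(key)
        elif isinstance(value, (FakeHistogram, Real)):
            tiers["columnvault"].append(key)
        else:
            raise TypeError(f"unsupported value for {key!r}: {type(value).__name__}")
    return tiers


routed = route({
    "loss": 0.5,
    "sample_img": FakeMedia(b"\x89PNG"),
    "predictions": FakeTable([{"label": "cat"}]),
    "gradients": FakeHistogram([0.1, -0.2]),
})
print(routed)
# {'kvault': ['sample_img'], 'columnvault': ['loss', 'gradients'], 'sqlite': ['sample_img', 'predictions']}
```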

Advanced Visualization

WebGL-Based Charts powered by Plotly.js:

Handle 100K+ datapoints smoothly
Configurable smoothing (EMA, MA, Gaussian)
X-axis selection (step, global_step, any metric)
Multi-metric overlays
Dark/light mode
Responsive design
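
For readers unfamiliar with the smoothing options, here is a minimal sketch of two of them in plain Python. The exact parameterization the UI uses is not documented here, so alpha and window below are assumptions chosen for illustration.

```python
def ema(values, alpha=0.6):
    """Exponential moving average; alpha is the weight kept from the previous output."""
    smoothed = []
    last = None
    for v in values:
        last = v if last is None else alpha * last + (1 - alpha) * v
        smoothed.append(last)
    return smoothed


def moving_average(values, window=3):
    """Trailing mean over up to `window` most recent points."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


print(ema([1.0, 0.0, 0.0], alpha=0.5))        # [1.0, 0.5, 0.25]
print(moving_average([3.0, 6.0, 9.0, 12.0]))  # [3.0, 4.5, 6.0, 9.0]
```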

Rich Viewers:

Histogram Navigator - Step-by-step distribution exploration
Media Viewer - Image grids, video playback
Table Viewer - Structured data with embedded images
Dashboard - Customizable metric layouts

Local-First Workflow

# Train locally
python train.py              # Logs to ./kohakuboard/

# View results (no server required!)
kobo open ./kohakuboard --browser

# Optional server for team sharing (requires kohakuboard_server)
kobo-serve --port 48889

No server setup, no configuration, no hassle.

Quick Start

Installation

pip install -e .

Basic Usage

from kohakuboard.client import Board

# Create board - automatically saves on program exit
board = Board(name="my-experiment", config={"lr": 0.001, "batch_size": 32})

# Training loop
for epoch in range(10):
    for batch_idx, (data, target) in enumerate(train_loader):
        loss = train_step(data, target)

        # Increment step once per optimizer step (not per epoch!)
        board.step()

        # Log metrics (non-blocking, returns in <0.1ms)
        board.log(loss=loss.item(), lr=optimizer.param_groups[0]['lr'])

    # Log validation at end of epoch
    val_loss = validate(model, val_loader)
    board.log(**{"val/loss": val_loss})

# That's it! No .finish() needed - auto-cleanup via atexit

View Results

# Local viewer (no server)
kobo open ./kohakuboard --browser

# Or launch the authenticated server (requires kohakuboard_server)
kobo-serve --port 48889
# Drop/copy board folders into the configured data dir to share runs

Complete Example

from kohakuboard.client import Board, Histogram, Table, Media
import torch

# Create board with hyperparameters
board = Board(
    name="cifar10-resnet18",
    config={"lr": 0.001, "batch_size": 128, "epochs": 100, "optimizer": "AdamW"}
)

# Training loop
for epoch in range(100):
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        # Step once per optimizer step
        board.step()

        # Log scalars (non-blocking, <0.1ms)
        board.log(loss=loss.item(), lr=optimizer.param_groups[0]['lr'])

    # Validation
    model.eval()
    val_loss, correct, predictions_table = 0, 0, []

    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_loader):
            output = model(data)
            val_loss += criterion(output, target).item()
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()

            # Sample predictions for table (first batch only)
            if batch_idx == 0:
                for i in range(min(8, len(data))):
                    predictions_table.append({
                        "image": Media(data[i].cpu().numpy()),
                        "true": class_names[target[i]],
                        "pred": class_names[pred[i]],
                        "correct": "✓" if pred[i] == target[i] else "✗"
                    })

    # Log validation (scalars + table + histograms - all at same step!)
    hist_data = {f"grad/{n}": Histogram(p.grad) for n, p in model.named_parameters() if p.grad is not None}
    board.log(**{
        "val/loss": val_loss / len(val_loader),
        "val/accuracy": correct / len(val_loader.dataset),
        "val/predictions": Table(predictions_table),
        **hist_data
    })

# No .finish() needed - automatic cleanup when script exits

Architecture

Client (Training Script)

Main Process (Training)          Background Writer Process
       │                                   │
       ├─ board.log(loss=0.5)              │
       │  └─> Queue.put()                  │
       │      (returns instantly!)         │
       │                                   ├─ Queue.get()
       │                                   ├─ Process batch
       ├─ Continue training...             ├─ Write to storage
       │                                   └─ Flush to disk

Key Features:

Non-blocking: log() returns in <0.1ms
Message Queue: 50,000 message capacity
Writer Process: Background process drains queue
Storage Layer: Three-tier SQLite architecture (KohakuVault KVault + ColumnVault + Standard SQLite)
Graceful Shutdown: atexit hooks + signal handlers ensure no data loss

Backend (Visualization Server)

FastAPI Backend (Port 48889)
    ↓ Read-only connections
Board Files (./kohakuboard/)
    ├── {board_id}/
    │   ├── metadata.json
    │   ├── data/           ← SQL/columnar queries here
    │   │   ├── metrics/    ← KohakuVault DB files
    │   │   └── metadata.db ← SQLite database
    │   └── media/
    │       └── *.png, *.mp4
        ↓ REST API
Vue 3 Frontend (WebGL Charts)

Key Features:

Zero-copy serving: Reads board files directly (no separate database service)
Concurrent reads: Multiple connections supported
Fast queries: Columnar storage for metrics
Static serving: Media files served directly

Data Model

Directory Structure

./kohakuboard/
└── {board_id_timestamp}/
    ├── metadata.json           # Board info, config, timestamps
    ├── data/                   # Storage backend files
    │   ├── metrics/            # (hybrid) KohakuVault columnar files
    │   │   ├── train__loss.db
    │   │   ├── val__accuracy.db
    │   │   └── ...
    │   ├── metadata.db         # (hybrid) SQLite metadata
    │   └── histograms/
    │       ├── gradients_i32.db  # int32 precision
    │       └── params_u8.db      # uint8 precision (compact)
    ├── media/                  # Content-addressed storage
    │   ├── {name}_{idx}_{step}_{sha256}.png
    │   ├── {name}_{idx}_{step}_{sha256}.mp4
    │   └── {name}_{idx}_{step}_{sha256}.wav
    └── logs/
        ├── output.log          # Captured stdout/stderr
        └── writer.log          # Writer process logs
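
The media naming pattern above embeds a hash of the file content, so identical bytes map to the same name. A rough sketch of how such a name could be built follows; whether the library uses the full hex digest, and how it sanitizes names containing "/", are assumptions (the "__" replacement mirrors the train__loss.db file name shown in the tree).

```python
import hashlib


def media_filename(name, idx, step, payload, ext):
    """Build a name following the {name}_{idx}_{step}_{sha256}.{ext} pattern."""
    digest = hashlib.sha256(payload).hexdigest()  # assumption: full hex digest
    safe_name = name.replace("/", "__")           # assumption: mirrors train__loss.db
    return f"{safe_name}_{idx}_{step}_{digest}.{ext}"


a = media_filename("val/sample", 0, 10, b"same bytes", "png")
b = media_filename("val/sample", 0, 10, b"same bytes", "png")
c = media_filename("val/sample", 0, 10, b"other bytes", "png")
print(a == b, a == c)  # True False
```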

Metadata Schema

{
  "board_id": "20250129_150423_abc123",
  "name": "cifar10-resnet18",
  "config": {
    "lr": 0.001,
    "batch_size": 128,
    "epochs": 100
  },
  "created_at": "2025-01-29T15:04:23",
  "finished_at": "2025-01-29T18:32:45",
  "status": "finished",
  "version": "0.0.1"
}
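
Because the metadata is plain JSON, it can be inspected without KohakuBoard installed. The sketch below parses the example document above and derives the run duration; treating finished_at as optional for runs still in progress is an assumption.

```python
import json
from datetime import datetime

raw = """
{
  "board_id": "20250129_150423_abc123",
  "name": "cifar10-resnet18",
  "config": {"lr": 0.001, "batch_size": 128, "epochs": 100},
  "created_at": "2025-01-29T15:04:23",
  "finished_at": "2025-01-29T18:32:45",
  "status": "finished",
  "version": "0.0.1"
}
"""

meta = json.loads(raw)
started = datetime.fromisoformat(meta["created_at"])
# Assumption: a run that is still in progress may lack finished_at.
finished = meta.get("finished_at")
duration = datetime.fromisoformat(finished) - started if finished else None

print(meta["name"], meta["status"], duration)  # cifar10-resnet18 finished 3:28:22
```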

Manual Sync / Remote Sharing

Both the training-side package (kohakuboard) and the optional server (kohakuboard_server) read the exact same directory layout. To move a run between machines:

Copy the entire board folder ({base_dir}/{project}/{board_id}) using cp, rsync, or any file transfer tool.
Drop it into the destination data directory (the folder you pass to kobo open ... or the directory configured via KOHAKU_BOARD_DATA_DIR / --data-dir on kobo-serve).
Restart the viewer or refresh the UI. The new run is immediately available.

No export/import step is required because metrics, metadata, tensors, and media already live in KohakuVault + SQLite files. The legacy kobo sync command still expects a DuckDB export and will fail on modern boards, so use manual copy until the new sync API lands.
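
The manual copy can also be scripted. The sketch below uses only the standard library; copy_board is a hypothetical helper, and the temporary directories stand in for the source and destination data directories. It is probably safest to copy after the run has finished so the SQLite files are not captured mid-write.

```python
import shutil
import tempfile
from pathlib import Path


def copy_board(board_dir, dest_data_dir):
    """Copy one board folder into a destination data directory."""
    board_dir = Path(board_dir)
    target = Path(dest_data_dir) / board_dir.name
    # copytree raises FileExistsError if the run was already copied.
    shutil.copytree(board_dir, target)
    return target


# Demo with throwaway directories standing in for the two machines.
src_root = Path(tempfile.mkdtemp())
dst_root = Path(tempfile.mkdtemp())
board = src_root / "20250129_150423_abc123"
(board / "data").mkdir(parents=True)
(board / "metadata.json").write_text('{"name": "demo"}')

copied = copy_board(board, dst_root)
print(sorted(p.name for p in copied.iterdir()))  # ['data', 'metadata.json']
```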

CLI Tool

# Open local viewer (no server)
kobo open ./kohakuboard --browser

# Start authenticated server (kohakuboard_server package)
kobo-serve --port 48889 --host 0.0.0.0

# Manual sync (recommended today): copy the entire board folder into the server's data dir
# (kobo sync is still wired to the legacy DuckDB exporter and will error on modern boards)

Configuration

Basic Usage

# All boards use KohakuVault + SQLite (no backend parameter needed)
board = Board(name="my-experiment", project="vision")

Advanced Options

board = Board(
    name="experiment",
    board_id="custom-id",           # Auto-generated if not provided
    config={"lr": 0.001},           # Hyperparameters
    project="custom-project",       # Sub-directory inside base_dir
    base_dir="./my-boards",         # Custom directory
    capture_output=True,            # Capture stdout/stderr
    remote_url="https://board.example.com",  # Optional future sync target
    remote_token="...",             # Token for remote sync (WIP)
    sync_enabled=False,             # Enable when remote endpoints are ready
    memory_mode=False,              # Keep data in RAM (requires sync to persist)
    annotation="debug-run",         # Suffix appended to run directory name
)

Storage Architecture:

KohakuVault KVault: Media blobs (K-V table with B+Tree index)
KohakuVault ColumnVault: Metrics/histograms (blob-based columnar)
Standard SQLite: Metadata (traditional relational tables)

Context Manager

with Board(name="experiment") as board:
    board.log(loss=0.5)
    # Automatic flush() and finish() on exit
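
The cleanup behaviour described here (a context manager plus an atexit fallback) is a common Python pattern. The sketch below shows its general shape with a hypothetical ManagedLogger class; it is not Board's actual code.

```python
import atexit


class ManagedLogger:
    """Hypothetical stand-in showing the cleanup pattern, not Board itself."""

    def __init__(self):
        self.pending = []
        self.flushed = []
        self.finished = False
        atexit.register(self.finish)  # safety net if the caller never cleans up

    def log(self, **metrics):
        self.pending.append(metrics)

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()

    def finish(self):
        if self.finished:  # idempotent: safe to call from both paths
            return
        self.flush()
        self.finished = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.finish()
        return False  # never swallow exceptions from the with-block


with ManagedLogger() as logger:
    logger.log(loss=0.5)

print(logger.finished, logger.flushed)  # True [{'loss': 0.5}]
```

Making finish() idempotent matters because both the with-block exit and the atexit hook will call it.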

API Reference

Board

Board(
    name: str | None = None,
    board_id: str | None = None,
    config: dict | None = None,
    project: str | None = None,
    base_dir: str | Path | None = None,
    capture_output: bool = True,
    remote_url: str | None = None,
    remote_token: str | None = None,
    remote_project: str | None = None,
    sync_enabled: bool = False,
    sync_interval: int = 10,
    memory_mode: bool = False,
    *,
    annotation: str | None = None,
)

Methods:

board.step() - Increment global_step

for batch_idx, batch in enumerate(train_loader):
    loss = train_step(batch)
    board.step()  # Increment ONCE per optimizer step
    board.log(**{"train/loss": loss, "train/lr": scheduler.get_last_lr()[0]})

board.log(**metrics) - Log data (non-blocking)

board.log(
    loss=0.5,
    accuracy=0.95,
    learning_rate=0.001,            # Scalars
    sample=Media(image_array),      # Images/video/audio
    predictions=Table(rows),        # Tables (optionally with Media)
    grad_norm=Histogram(values),    # Histograms
)

# Namespaces (creates tabs in UI)
board.log(**{
    "train/loss": 0.5,
    "val/accuracy": 0.95
})

# Tensor + KDE payloads (specialized viewers)
board.log(
    attention_tensor=TensorLog(tensor),
    density=KernelDensity(values, grid_size=256),
)

board.flush() - Force flush (blocks until complete)

board.flush()  # Wait for all pending writes

board.finish() - Manual cleanup (auto-called on exit)

board.finish()  # Flush buffers, close connections

Data Types

Media

from kohakuboard.client import Media

# Images
board.log(
    sample_img=Media(image_array),  # numpy, PIL, torch tensor
    prediction=Media(pred_img, caption="Predicted: cat")
)

# Video
board.log(
    training_video=Media("output.mp4", media_type="video")
)

# Audio (if supported)
board.log(
    audio_sample=Media("sample.wav", media_type="audio")
)

Table

from kohakuboard.client import Table

# From list of dicts
results = Table([
    {"name": "Alice", "score": 95, "pass": True},
    {"name": "Bob", "score": 87, "pass": True},
])
board.log(student_results=results)

# Tables with embedded images
predictions = Table([
    {"image": Media(img), "label": "cat", "confidence": 0.95},
    {"image": Media(img2), "label": "dog", "confidence": 0.87},
])
board.log(val_predictions=predictions)

Histogram

from kohakuboard.client import Histogram

# Log gradient distributions
board.log(
    gradients=Histogram(param.grad),
    weights=Histogram(param.data)
)

# Precompute for efficiency (optional)
hist = Histogram(gradients).compute_bins()
board.log(grad_distribution=hist)

# Compact precision (75% size reduction, ~1% accuracy loss)
hist = Histogram(gradients, precision="compact")
board.log(grad_distribution=hist)
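
The 75% figure matches what you get from storing each bin in one byte (uint8) instead of four (int32). The sketch below illustrates that arithmetic with plain Python; the binning and the scale-to-255 quantization are assumptions made for the example, not the library's documented scheme.

```python
import struct


def histogram_counts(values, num_bins=8):
    """Count values into equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant input
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def to_uint8(counts):
    """Scale counts so the tallest bin maps to 255 (lossy)."""
    peak = max(counts) or 1
    return [round(c * 255 / peak) for c in counts]


values = [i / 1000 for i in range(1000)]
counts = histogram_counts(values)

full = struct.pack(f"<{len(counts)}i", *counts)               # 4 bytes per bin
compact = struct.pack(f"<{len(counts)}B", *to_uint8(counts))  # 1 byte per bin

saving = 1 - len(compact) / len(full)
print(f"{saving:.0%}")  # 75%
```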

Deployment

Local Mode (Recommended)

# Install
pip install -e .

# Train
python train.py

# View results
kobo open ./kohakuboard --browser

Remote Mode (WIP)

# Run the authenticated server (still stabilizing)
kobo-serve --data-dir /var/kohakuboard --db sqlite:///kohakuboard.db

# Share boards by copying folders into /var/kohakuboard/<project>/
# Restart/reload the server to pick up new runs

See docs/kohakuboard/ for complete deployment guides.

Comparison with Alternatives

Feature	WandB	TensorBoard	MLflow	KohakuBoard
Latency	~10ms	~1ms	~5ms	<0.1ms
Throughput	~1K/sec	~10K/sec	~5K/sec	20K+/sec
Offline	❌ No	✅ Yes	✅ Yes	✅ Yes
File-Based	❌ No	✅ Yes	❌ No	✅ Yes
Non-Blocking	❌ No	❌ No	❌ No	✅ Yes
Columnar Reads	❌ No	❌ No	✅ Yes	✅ Yes (KohakuVault ColumnVault)
WebGL Charts	❌ No	❌ No	❌ No	✅ Yes
100K+ Points	Slow	Slow	Slow	Fast
Self-Hosted	Limited	✅ Yes	✅ Yes	✅ Yes
Setup	Cloud	Local	Server	None

Documentation

Getting Started - Installation and basic usage
API Reference - Complete API documentation
Architecture - System design and internals
CLI Guide - Command-line tool usage
Usage Manual - Day-to-day operations checklist
Examples - Real-world usage patterns

Examples

See examples/ directory:

kohakuboard_basic.py - Simple scalar logging
kohakuboard_all_media_types.py - Images, videos, tables
kohakuboard_cifar_training.py - Complete CIFAR-10 training example
kohakuboard_media_in_tables.py - Tables with embedded images
kohakuboard_histogram_logging.py - Gradient distribution tracking

Roadmap

✅ Complete

Client Library:

Non-blocking logging architecture
Rich data types (scalars, media, tables, histograms)
Three-tier SQLite architecture (KohakuVault KVault + ColumnVault + Standard SQLite)
Graceful shutdown with queue draining
Content-addressed media storage

Backend & UI:

FastAPI REST API
Vue 3 interface with dark/light mode
WebGL charts (100K+ points)
Histogram navigator
Media/table viewers
CLI tool (kobo)

🚧 In Progress

Remote server mode with authentication
Sync protocol for uploading local boards
Project management (group related boards)
Run comparison UI (side-by-side metrics)
Real-time streaming (live updates while training)

📋 Planned

Client Features:

PyTorch Lightning integration
Keras callback
Hugging Face Trainer integration
Custom callback system

Backend Features:

Multi-board comparison API
Advanced filtering (tags, date range)
Export to CSV/JSON
Aggregation queries

UI Features:

Diff viewer (compare runs)
Scatter plots (metric vs metric)
Custom dashboards
Annotations
Search and filter

Infrastructure:

Docker/Kubernetes deployment
Cloud storage backends (S3, GCS)
Multi-user authentication

License

KohakuBoard is a multi-component project with different licenses:

Client Library (kohakuboard): Apache License 2.0
- Free for commercial and non-commercial use
- Permissive license with minimal requirements
Web UI (kohaku-board-ui): AGPL-3.0
- Free to use and modify
- Source code disclosure required for network services
Server (kohakuboard_server): Kohaku Software License 1.0
- Free for non-commercial use
- Free for commercial use under revenue/duration limits
- Commercial licenses available for larger deployments

Commercial Licensing: For commercial licenses or exemptions, contact kohaku@kblueleaf.net

See LICENSE for complete details.

Contributing

KohakuBoard is part of the KohakuHub ecosystem. We welcome contributions!

Before contributing:

Read CONTRIBUTING.md for code style and guidelines
Join our Discord for discussions
Check open issues tagged with kohakuboard

Areas we need help:

🎨 Frontend (chart improvements, UI/UX)
🔧 Backend (storage backends, performance)
📊 Client library (framework integrations)
📚 Documentation (tutorials, guides)
🧪 Testing (unit tests, benchmarks)

Support

Discord: https://discord.gg/xWYrkyvJ2s (Use #kohakuboard channel)
Issues: https://github.com/KohakuBlueleaf/KohakuHub/issues (Tag with kohakuboard)
Email: kohaku@kblueleaf.net

Acknowledgments

KohakuVault - High-performance storage library with dual SQLite interfaces (KVault for blobs, ColumnVault for sequences)
Plotly.js - WebGL charts
Vue 3 - Modern UI framework
FastAPI - Backend framework

Production Ready! Core features are stable and performant. Use in real training workflows and help us improve.

Project details

Version: 0.0.1
Released: Jan 9, 2026

Built distribution: kohakuboard-0.0.1-py3-none-any.whl (2.8 MB, Python 3), uploaded Jan 9, 2026 via twine/6.2.0 CPython/3.13.9, without Trusted Publishing. No source distribution files are available for this release.

File hashes for kohakuboard-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f25d8cf8c02e0e5552fac5efa1677ab52e3100868d4f0be0a62f10fb365d39a1`
MD5	`17a0db4588cb60437685a2445add7112`
BLAKE2b-256	`756375f255f29ebeec05764561924b216c8c45073c776008fc16b91236ce4987`
