
Terminal-first live training monitor for Python ML workloads across frameworks.


vertical

vertical is a training-side metrics transport layer built around localhost-only services and SSH reverse tunneling.

Quick Start (uv)

# Create and sync an environment from pyproject.toml
uv sync

# Run the demo monitor
uv run vertical-demo

# Or point the standalone terminal viewer at any running endpoint
uv run vertical-tui --endpoint http://127.0.0.1:9100

Install as a library

pip install vertical
uv pip install vertical

# local editable install
uv pip install -e .

Framework extras:

pip install "vertical[pytorch]"
pip install "vertical[jax]"
pip install "vertical[flax]"
# install all framework extras
pip install "vertical[all]"

# uv equivalents
uv pip install "vertical[pytorch]"
uv pip install "vertical[jax]"
uv pip install "vertical[flax]"
uv pip install "vertical[all]"

Minimal usage

from vertical import TrainingMonitor

with TrainingMonitor(title="My Training Run") as monitor:
    for step in range(1, 101):
        monitor.log(
            step=step,
            epoch=((step - 1) // 20) + 1,
            loss=1 / step,
            learning_rate=1e-3,
            metrics={"accuracy": step / 100},
        )

Framework-first API (JAX + Flax + PyTorch)

Use vertical.init(...) to define run defaults once (for example learning_rate and epoch) and then track any per-step numeric signals such as perplexity, gradient norm, or accuracy.

import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(
    framework="pytorch",
    logger=logger,
    learning_rate=3e-4,
    epoch=1,
    device="cuda",  # falls back to cpu when CUDA is unavailable
)

for _ in range(100):
    # one JSON metric event per forward pass
    run.forward(
        loss=1.0,
        perplexity=20.0,
        grad_norm=0.12,
        training_info={"framework": "pytorch", "phase": "train"},
    )

Framework adapters are loaded lazily. If you set framework="jax", only JAX-specific setup code runs.

Framework integrations are split under vertical.frameworks and exposed via framework-specific wrappers.

PyTorch users can use the dedicated wrapper and module-aware helper:

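import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="pytorch", logger=logger, device="cuda")

# loader, model, optimizer, train_step, acc, and grad_norm come from your own training code
for step, batch in enumerate(loader, start=1):
    loss = train_step(batch)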
    run.pytorch.module_step(
        module=model,
        optimizer=optimizer,
        step=step,
        loss=loss,
        metrics={"accuracy": acc},
        grad_norm=grad_norm,
        training_info={"phase": "train"},
    )

JAX users can use the dedicated wrapper for forward-pass logging:

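import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="jax", logger=logger, backend="cpu")
# loss_value, perplexity_value, and grad_norm_value are scalars from your own train step
run.jax.forward(loss=loss_value, perplexity=perplexity_value, grad_norm=grad_norm_value)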

vertical.init(...) can also bootstrap the reverse tunnel directly, which is useful for Colab and hosted training providers:

import vertical

with vertical.init(
    framework="jax",
    backend="cpu",
    remote={
        "ssh_host": "your-laptop-host",
        "ssh_user": "your-user",
        "run_id": "exp-001",
    },
) as run:
    print("endpoint:", run.remote_url)
    print("token:", run.auth_token)
    print("public key:", run.remote_session.public_key_path)
    run.jax.forward(loss=1.0, perplexity=20.0)

You can also configure the remote setup through environment variables, so that scripts only need a plain vertical.init(...) call:

export VERTICAL_SSH_HOST=your-laptop-host
export VERTICAL_SSH_USER=your-user
export VERTICAL_RUN_ID=exp-001
# optional:
export VERTICAL_AUTH_TOKEN=your-static-token

Flax users can integrate with TrainState directly:

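import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="flax", logger=logger, backend="gpu")

# inside your train step loop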
run.flax.train_state_step(
    state=train_state,
    loss=loss_value,
    metrics={"perplexity": ppl_value},
    grad_norm=grad_norm_value,
)

Reverse SSH Architecture

Training machine:

  • Runs a metrics server bound to 127.0.0.1 only.
  • The training loop continuously updates the current run state.
  • Starts an SSH reverse tunnel to the laptop.

Laptop:

  • Reads only the locally forwarded endpoint at http://127.0.0.1:PORT/metrics.
  • Never connects directly to the training machine.
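
The training-machine half of this architecture can be illustrated with the standard library alone. The sketch below is not vertical's implementation; current_state, MetricsHandler, start_local_metrics_server, and fetch_metrics are hypothetical names chosen for illustration. It only shows what "bound to 127.0.0.1 only" means in practice: the listening socket is never exposed on an external interface, so the only way in from another host is a tunnel.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical shared state that a training loop would update in place.
current_state = {"step": 0, "loss": None}


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current run state as JSON at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = json.dumps(current_state).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep training logs free of per-request noise


def start_local_metrics_server(port=0):
    """Start a background server on the loopback interface; port=0 picks a free port."""
    # Binding to 127.0.0.1 (not 0.0.0.0) keeps the server unreachable from other hosts.
    server = ThreadingHTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_metrics(port):
    """Read the state back over loopback, as a tunnel consumer would."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/metrics")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

In this sketch the training loop would call current_state.update(...) after each step, and the laptop would reach the same handler through the forwarded port.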

Tunnel command shape:

ssh -N -R 127.0.0.1:PORT:127.0.0.1:METRICS_PORT you@your-laptop
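
As a rough sketch, the same command shape can be assembled as an argument list for subprocess. build_reverse_tunnel_command and its parameters are hypothetical, not part of vertical's API. BatchMode and PasswordAuthentication are the key-based auth options this README names; the keepalive values (30 seconds, 3 probes) and ExitOnForwardFailure are assumptions picked for illustration, although all of them are standard OpenSSH client options.

```python
def build_reverse_tunnel_command(
    ssh_host,
    remote_port,
    metrics_port,
    ssh_user=None,
    ssh_port=22,
    identity_file=None,
):
    """Return an argv list for an SSH reverse tunnel bound to loopback on both ends."""
    target = f"{ssh_user}@{ssh_host}" if ssh_user else ssh_host
    cmd = [
        "ssh",
        "-N",  # no remote command, forwarding only
        "-R", f"127.0.0.1:{remote_port}:127.0.0.1:{metrics_port}",
        "-p", str(ssh_port),
        "-o", "BatchMode=yes",  # never prompt for input
        "-o", "PasswordAuthentication=no",  # key-based auth only
        "-o", "ExitOnForwardFailure=yes",  # assumption: fail fast if the port is taken
        "-o", "ServerAliveInterval=30",  # assumption: keepalive probe every 30 s
        "-o", "ServerAliveCountMax=3",  # assumption: give up after 3 missed probes
    ]
    if identity_file:
        cmd += ["-i", identity_file]
    cmd.append(target)
    return cmd
```

Passing the result to subprocess.Popen would start the tunnel, and shlex.join(cmd) prints it in copy-pasteable form.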

vertical enforces this model with:

  • Local-only binding (127.0.0.1) for metrics server and reverse bind host.
  • SSH keepalive options.
  • Key-based auth defaults (BatchMode=yes, PasswordAuthentication=no).
  • Auto-reconnect supervisor if tunnel drops.
  • Deterministic run_id -> remote_port mapping when remote_port is omitted.
  • Optional endpoint auth token (Authorization: Bearer ...) for tunnel consumers.
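
The README does not say how the run_id -> remote_port mapping is computed. One plausible scheme, shown here purely as an assumption, hashes the run ID and folds the digest into a fixed port range, so the training machine and the laptop can derive the same port without exchanging it. port_for_run_id and the 20000-60000 range are hypothetical.

```python
import hashlib


def port_for_run_id(run_id, low=20000, high=60000):
    """Map a run ID to a stable port in [low, high); the same input always gives the same port."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer and fold it into the port range.
    return low + int.from_bytes(digest[:8], "big") % (high - low)
```

Two different run IDs can still collide on the same port under any such scheme, so an explicit remote_port remains the safer choice when several runs share one laptop.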

SSH Key Setup (Required)

By default, vertical.init(..., remote=...) generates a local keypair if one does not already exist:

  • private key: ~/.ssh/vertical_ed25519
  • public key: ~/.ssh/vertical_ed25519.pub

This removes the manual mkdir/chmod/ssh-keygen step from training scripts.

What still must happen once:

  • Add the generated public key to ~/.ssh/authorized_keys on your laptop/terminal host.

For Colab or any third-party training machine, the full setup, including manual key generation if you prefer not to rely on the automatic step, looks like this:

  1. Generate a dedicated keypair on the training machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/vertical_ed25519 -N ""
cat ~/.ssh/vertical_ed25519.pub
  2. On your laptop/terminal host, append that public key to ~/.ssh/authorized_keys:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "<PASTE_PUBLIC_KEY_FROM_TRAINING_MACHINE>" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
  3. Confirm that the SSH server on your laptop is enabled and reachable from the training machine.

  4. In the training environment, provide:

  • ssh_host (required)
  • ssh_user (optional)
  • identity_file (path to private key, for example ~/.ssh/vertical_ed25519)
  • ssh_port (optional, only needed if your laptop SSH server is not on port 22)

Example env configuration:

export VERTICAL_SSH_HOST=<your-laptop-host-or-ip>
export VERTICAL_SSH_USER=<your-laptop-user>
export VERTICAL_SSH_IDENTITY_FILE=~/.ssh/vertical_ed25519
export VERTICAL_SSH_PORT=22
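
To make the mapping between these variables and the remote={...} keys concrete, here is a minimal sketch of how such a configuration could be read. remote_config_from_env is a hypothetical helper, not vertical's loader; the variable names and dictionary keys are the ones used in this README, while the fallback to port 22 and the error on a missing host are assumptions.

```python
import os


def remote_config_from_env(env=None):
    """Build a remote-settings dict from VERTICAL_SSH_* variables."""
    env = os.environ if env is None else env
    host = env.get("VERTICAL_SSH_HOST")
    if not host:
        raise ValueError("VERTICAL_SSH_HOST is required")
    remote = {"ssh_host": host}
    user = env.get("VERTICAL_SSH_USER")
    if user:
        remote["ssh_user"] = user
    identity = env.get("VERTICAL_SSH_IDENTITY_FILE")
    if identity:
        # Expand a leading ~ in case the shell passed it through unexpanded.
        remote["identity_file"] = os.path.expanduser(identity)
    # Assumption: default to the standard SSH port when unset.
    remote["ssh_port"] = int(env.get("VERTICAL_SSH_PORT", "22"))
    return remote
```

Accepting env as a parameter keeps the helper easy to test without touching the real process environment.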

Auth token behavior:

  • If you do not set VERTICAL_AUTH_TOKEN, vertical generates a secure token automatically.
  • Use that same token when querying metrics (curl or vertical-tui --token ...).

Key generation behavior:

  • You can disable automatic local key generation with VERTICAL_AUTO_SSH_KEYGEN=false.
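
For the endpoint auth token, the following sketch shows one way a token could be generated and an Authorization: Bearer header checked. It is an illustration under assumptions, not vertical's code: generate_token and is_authorized are hypothetical names, and the 32-byte token length is arbitrary. The constant-time comparison avoids leaking how many leading characters of a guess were correct.

```python
import hmac
import secrets


def generate_token():
    """Return a random URL-safe token (32 random bytes, base64url encoded)."""
    return secrets.token_urlsafe(32)


def is_authorized(header_value, expected_token):
    """Check an 'Authorization: Bearer <token>' header value against the expected token."""
    if header_value is None:
        return False
    scheme, _, presented = header_value.partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest runs in time independent of where the strings differ.
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
```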

Reverse Tunnel Usage

import vertical

with vertical.init(
    framework="pytorch",
    remote={"ssh_host": "your-terminal-host", "ssh_user": "your-user", "run_id": "exp-001"},
) as run:
    run.forward(loss=0.5, accuracy=0.8)

This creates:

  • A training-side local metrics server on 127.0.0.1:METRICS_PORT.
  • A reverse tunnel exposing that service on the laptop at 127.0.0.1:PORT.

From your terminal host, read metrics at:

curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" http://127.0.0.1:PORT/metrics
curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" "http://127.0.0.1:PORT/metrics/history?limit=20"
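
The same two requests can be built from Python with the standard library. This is a sketch only: build_metrics_request is a hypothetical helper, and the response format is not described in this README, so the example stops at constructing the request. The paths and the limit query parameter are the ones shown in the curl commands.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_request(port, token, history_limit=None):
    """Build a GET request for /metrics, or /metrics/history when a limit is given."""
    base = f"http://127.0.0.1:{port}/metrics"
    if history_limit is None:
        url = base
    else:
        url = f"{base}/history?" + urlencode({"limit": history_limit})
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned object to urllib.request.urlopen on the terminal host would perform the actual fetch through the forwarded port.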

Notebook example:

  • examples/vertical_remote_tunnel_colab.ipynb

Framework compatibility scripts

Small training scripts for PyTorch, TensorFlow, and JAX live in:

  • tests/framework_scripts/train_pytorch_linear.py
  • tests/framework_scripts/train_pytorch_classifier.py
  • tests/framework_scripts/train_tensorflow_linear.py
  • tests/framework_scripts/train_jax_linear.py

Development

uv run pytest
uv run ruff check .
