Terminal-first live training monitor for Python ML workloads across frameworks.
vertical
vertical is a training-side metrics transport layer built around localhost-only services and SSH reverse tunneling.
Quick Start (uv)
# Create and sync an environment from pyproject.toml
uv sync
# Run the demo monitor
uv run vertical-demo
# Or point the standalone terminal viewer at any running endpoint
uv run vertical-tui --endpoint http://127.0.0.1:9100
Install as a library
pip install vertical
uv pip install vertical
# local editable install
uv pip install -e .
Framework extras:
pip install "vertical[pytorch]"
pip install "vertical[jax]"
pip install "vertical[flax]"
# install all framework extras
pip install "vertical[all]"
# uv equivalents
uv pip install "vertical[pytorch]"
uv pip install "vertical[jax]"
uv pip install "vertical[flax]"
uv pip install "vertical[all]"
Minimal usage
from vertical import TrainingMonitor

with TrainingMonitor(title="My Training Run") as monitor:
    for step in range(1, 101):
        monitor.log(
            step=step,
            epoch=((step - 1) // 20) + 1,
            loss=1 / step,
            learning_rate=1e-3,
            metrics={"accuracy": step / 100},
        )
Framework-first API (JAX + Flax + PyTorch)
Use vertical.init(...) to define run defaults once (for example learning_rate and epoch) and then track any per-step numeric signals such as perplexity, gradient norm, or accuracy.
import vertical
from vertical import HTTPMetricLogger
logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(
    framework="pytorch",
    logger=logger,
    learning_rate=3e-4,
    epoch=1,
    device="cuda",  # falls back to cpu when CUDA is unavailable
)

for _ in range(100):
    # one JSON metric event per forward pass
    run.forward(
        loss=1.0,
        perplexity=20.0,
        grad_norm=0.12,
        training_info={"framework": "pytorch", "phase": "train"},
    )
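To illustrate the idea of declaring run defaults once and then logging only per-step signals, here is a minimal sketch. The `MetricRun` class, its automatic step counter, and the JSON layout are hypothetical stand-ins, not vertical's actual implementation; they only show how defaults and per-step values could be merged into one event.

```python
import json


class MetricRun:
    """Hypothetical stand-in for the object returned by vertical.init(...)."""

    def __init__(self, **defaults):
        # Values declared once for the whole run, e.g. learning_rate and epoch.
        self.defaults = defaults
        self.step = 0

    def forward(self, **signals):
        """Merge run defaults with per-step signals into one JSON event."""
        self.step += 1
        # Per-step signals override defaults that share the same name.
        event = {**self.defaults, **signals, "step": self.step}
        return json.dumps(event, sort_keys=True)


run = MetricRun(learning_rate=3e-4, epoch=1)
event = run.forward(loss=1.0, perplexity=20.0)
print(event)
```

In this sketch a per-step value with the same name as a default wins, which is one reasonable choice; the real library may resolve such conflicts differently.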
Framework adapters are loaded lazily. If you set framework="jax", only JAX-specific setup code runs.
Framework integrations are split under vertical.frameworks and exposed via framework-specific wrappers.
PyTorch users can use the dedicated wrapper and module-aware helper:
import vertical

# Assumes logger, loader, model, optimizer, and train_step come from your own training code;
# acc and grad_norm are values you compute for each step.
run = vertical.init(framework="pytorch", logger=logger, device="cuda")

for step, batch in enumerate(loader, start=1):
    loss = train_step(batch)
    run.pytorch.module_step(
        module=model,
        optimizer=optimizer,
        step=step,
        loss=loss,
        metrics={"accuracy": acc},
        grad_norm=grad_norm,
        training_info={"phase": "train"},
    )
JAX users can use the dedicated wrapper for forward-pass logging:
import vertical
run = vertical.init(framework="jax", logger=logger, backend="cpu")
run.jax.forward(loss=loss_value, perplexity=perplexity_value, grad_norm=grad_norm_value)
vertical.init(...) can also bootstrap the reverse tunnel directly, which is useful for Colab and hosted training providers:
import vertical

with vertical.init(
    framework="jax",
    backend="cpu",
    remote={
        "ssh_host": "your-laptop-host",
        "ssh_user": "your-user",
        "run_id": "exp-001",
    },
) as run:
    print("endpoint:", run.remote_url)
    print("token:", run.auth_token)
    print("public key:", run.remote_session.public_key_path)
    run.jax.forward(loss=1.0, perplexity=20.0)
You can also configure the remote setup through environment variables, so that scripts only need to call vertical.init(...):
export VERTICAL_SSH_HOST=your-laptop-host
export VERTICAL_SSH_USER=your-user
export VERTICAL_RUN_ID=exp-001
# optional:
export VERTICAL_AUTH_TOKEN=your-static-token
Flax users can integrate with TrainState directly:
run = vertical.init(framework="flax", logger=logger, backend="gpu")

# inside your train step loop
run.flax.train_state_step(
    state=train_state,
    loss=loss_value,
    metrics={"perplexity": ppl_value},
    grad_norm=grad_norm_value,
)
Reverse SSH Architecture
Training machine:
- Runs a metrics server bound to 127.0.0.1 only.
- Training loop continuously updates the current run state.
- Starts an SSH reverse tunnel to laptop.
Laptop:
- Reads only local forwarded endpoint at http://127.0.0.1:PORT/metrics.
- Never connects directly to the training machine.
Tunnel command shape:
ssh -N -R 127.0.0.1:PORT:127.0.0.1:METRICS_PORT you@your-laptop
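The command shape above can be assembled programmatically. The following is a minimal sketch: `build_tunnel_command` and its parameter names are hypothetical helpers for illustration, not part of vertical's API, and the port numbers in the example are arbitrary.

```python
import shlex


def build_tunnel_command(remote_port, metrics_port, ssh_host, ssh_user=None):
    """Return the argv list for a localhost-only SSH reverse tunnel."""
    target = f"{ssh_user}@{ssh_host}" if ssh_user else ssh_host
    # Bind 127.0.0.1 on the laptop side and forward to 127.0.0.1 on the
    # training machine, so neither end listens on a public interface.
    forward = f"127.0.0.1:{remote_port}:127.0.0.1:{metrics_port}"
    return ["ssh", "-N", "-R", forward, target]


cmd = build_tunnel_command(19100, 9100, "your-laptop", ssh_user="you")
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed to `subprocess.Popen` without shell quoting concerns.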
vertical enforces this model with:
- Local-only binding (127.0.0.1) for metrics server and reverse bind host.
- SSH keepalive options.
- Key-based auth defaults (BatchMode=yes, PasswordAuthentication=no).
- Auto-reconnect supervisor if tunnel drops.
- Deterministic run_id -> remote_port mapping when remote_port is omitted.
- Optional endpoint auth token (Authorization: Bearer ...) for tunnel consumers.
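One way a deterministic run_id -> remote_port mapping could work is to hash the run ID into a fixed port range. This is only a sketch: the hash function, the port range, and the `port_for_run` name are assumptions made for illustration, and vertical's real mapping may differ.

```python
import hashlib

# Assumed port range for illustration; the range vertical uses is not documented here.
PORT_RANGE_START = 20000
PORT_RANGE_SIZE = 20000


def port_for_run(run_id: str) -> int:
    """Map a run ID to a stable port inside the assumed range."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer offset.
    offset = int.from_bytes(digest[:4], "big") % PORT_RANGE_SIZE
    return PORT_RANGE_START + offset


print(port_for_run("exp-001"))
print(port_for_run("exp-001") == port_for_run("exp-001"))
```

Because the port depends only on the run ID, the laptop side can compute the same port without any coordination. Two different run IDs can still collide, so an explicit remote_port remains useful when several runs share one laptop.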
SSH Key Setup (Required)
vertical.init(..., remote=...) now auto-generates a local keypair by default if missing:
- private key: ~/.ssh/vertical_ed25519
- public key: ~/.ssh/vertical_ed25519.pub
This removes the manual mkdir/chmod/ssh-keygen step from training scripts.
What still must happen once:
- Add the generated public key to your laptop/terminal host ~/.ssh/authorized_keys.
For Colab or any third-party training machine, you can do that setup like this:
- Generate a dedicated keypair on the training machine:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/vertical_ed25519 -N ""
cat ~/.ssh/vertical_ed25519.pub
- On your laptop/terminal host, append that public key to ~/.ssh/authorized_keys:
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "<PASTE_PUBLIC_KEY_FROM_TRAINING_MACHINE>" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
- Confirm your laptop has SSH server enabled and reachable from the training machine.
- In the training environment, provide:
  - ssh_host (required)
  - ssh_user (optional)
  - identity_file (path to private key, for example ~/.ssh/vertical_ed25519)
  - optional ssh_port if your laptop SSH server is not on 22
Example env configuration:
export VERTICAL_SSH_HOST=<your-laptop-host-or-ip>
export VERTICAL_SSH_USER=<your-laptop-user>
export VERTICAL_SSH_IDENTITY_FILE=~/.ssh/vertical_ed25519
export VERTICAL_SSH_PORT=22
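For orientation, the environment variables above could be collected into a `remote` dictionary along these lines. The `remote_config_from_env` helper is hypothetical and the precedence rules are assumptions; vertical reads these variables itself, so this sketch only shows the mapping from variable names to the remote options named in this document.

```python
import os


def remote_config_from_env(environ=None):
    """Build a remote config dict from VERTICAL_* variables, or None if no host is set."""
    env = os.environ if environ is None else environ
    host = env.get("VERTICAL_SSH_HOST")
    if not host:
        return None  # ssh_host is the only required setting
    config = {
        "ssh_host": host,
        # Fall back to the default SSH port when the variable is unset or empty.
        "ssh_port": int(env.get("VERTICAL_SSH_PORT") or "22"),
    }
    if env.get("VERTICAL_SSH_USER"):
        config["ssh_user"] = env["VERTICAL_SSH_USER"]
    if env.get("VERTICAL_SSH_IDENTITY_FILE"):
        # Expand a leading ~ so the path is usable outside a shell.
        config["identity_file"] = os.path.expanduser(env["VERTICAL_SSH_IDENTITY_FILE"])
    if env.get("VERTICAL_RUN_ID"):
        config["run_id"] = env["VERTICAL_RUN_ID"]
    return config


example = remote_config_from_env(
    {"VERTICAL_SSH_HOST": "your-laptop-host", "VERTICAL_SSH_USER": "your-user"}
)
print(example)
```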
Auth token behavior:
- If you do not set VERTICAL_AUTH_TOKEN, vertical generates a secure token automatically.
- Use that same token when querying metrics (curl or vertical-tui --token ...).
- You can disable automatic local key generation with VERTICAL_AUTO_SSH_KEYGEN=false.
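The token behavior described above might look roughly like this. It is a sketch assuming a simple "use the env var, otherwise generate" rule; `resolve_auth_token`, `is_authorized`, and the token length are hypothetical and not taken from vertical's source.

```python
import hmac
import os
import secrets


def resolve_auth_token(environ=None):
    """Return the configured token, or generate a random one when none is set."""
    env = os.environ if environ is None else environ
    token = env.get("VERTICAL_AUTH_TOKEN")
    if token:
        return token
    # 32 random bytes, URL-safe encoded; the length is an assumption.
    return secrets.token_urlsafe(32)


def is_authorized(header_value, token):
    """Check an Authorization header value against the expected bearer token."""
    expected = f"Bearer {token}".encode("utf-8")
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest((header_value or "").encode("utf-8"), expected)


token = resolve_auth_token({"VERTICAL_AUTH_TOKEN": "your-static-token"})
print(is_authorized(f"Bearer {token}", token))
print(is_authorized("Bearer wrong", token))
```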
Reverse Tunnel Usage
import vertical

with vertical.init(
    framework="pytorch",
    remote={"ssh_host": "your-terminal-host", "ssh_user": "your-user", "run_id": "exp-001"},
) as run:
    run.forward(loss=0.5, accuracy=0.8)
This creates:
- Training-side local metrics server on 127.0.0.1:METRICS_PORT.
- Reverse tunnel exposing that service on laptop 127.0.0.1:PORT.
From your terminal host, read metrics at:
curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" "http://127.0.0.1:PORT/metrics"
curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" "http://127.0.0.1:PORT/metrics/history?limit=20"
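The same requests can be made from Python with only the standard library. This sketch assumes the endpoint returns a JSON body, which this document does not state explicitly; `build_metrics_request` and `fetch_metrics` are hypothetical helper names, and the port is a placeholder.

```python
import json
import urllib.request


def build_metrics_request(port, token, path="/metrics"):
    """Create a request for the forwarded endpoint with the bearer token attached."""
    return urllib.request.Request(
        f"http://127.0.0.1:{port}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_metrics(port, token, path="/metrics", timeout=5.0):
    """Fetch and decode the response, assuming the server replies with JSON."""
    request = build_metrics_request(port, token, path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_metrics_request(19100, "your-static-token")
print(request.full_url)
print(request.get_header("Authorization"))
```

With a tunnel running, `fetch_metrics(PORT, token)` would read the current state and `fetch_metrics(PORT, token, "/metrics/history?limit=20")` the recent history, where `PORT` is the forwarded port on your terminal host.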
Notebook example:
examples/vertical_remote_tunnel_colab.ipynb
Framework compatibility scripts
Small training scripts for PyTorch, TensorFlow, and JAX live in:
- tests/framework_scripts/train_pytorch_linear.py
- tests/framework_scripts/train_pytorch_classifier.py
- tests/framework_scripts/train_tensorflow_linear.py
- tests/framework_scripts/train_jax_linear.py
Development
uv run pytest
uv run ruff check .
File details
Details for the file vertical-0.1.0a2026021701.tar.gz.
File metadata
- Download URL: vertical-0.1.0a2026021701.tar.gz
- Upload date:
- Size: 32.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1235fae4b5a7ac9a489b9eec1833956976ae5e3af050ee0106c7b7b2fa2b6b23 |
| MD5 | c7348066684f1e0f902bc1e620e28f1f |
| BLAKE2b-256 | ab1c70b6d27ccda8427b61c113ca8f06cbe21f6223e8f5a47e9f8de2f369e81b |
File details
Details for the file vertical-0.1.0a2026021701-py3-none-any.whl.
File metadata
- Download URL: vertical-0.1.0a2026021701-py3-none-any.whl
- Upload date:
- Size: 26.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1329cd6cd756fa0a24ed8ea01cbef220534f1b257c5e5d1cebaec8c46780ccaa |
| MD5 | a7a26fefe720e47f49d7a89a1233e531 |
| BLAKE2b-256 | 6c4d25f0ad6c582f86320952c0dfe46b4d07f95e81021376edd74c8b3bb39fb2 |