Record training runs locally and push to Aquin for post-hoc inspection — loss curves, SAE diffs, model diffs, and more. Browse and download public SAEs via aquin list saes / aquin pull sae.

Aquin SDK

Record your training runs locally and push them to Aquin for post-hoc inspection — loss curves, learning rate, grad norm, epoch summaries, SAE feature diffs, model behaviour diffs, and more.

Install

pip install aquin

Quickstart

import torch
import aquin

# model, dataloader, scheduler, and train_step are your own training objects.

run = aquin.init(
    base_model="meta-llama/Llama-3.2-1B-Instruct",
    run_name="my-lora-run",
    config={
        "lr": 2e-4, "epochs": 3, "rank": 16, "lora_alpha": 32,
        "method": "qlora", "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8, "dataset": "data.jsonl",
    },
)

global_step = 0  # run.log expects a global step, not a per-epoch index
for epoch in range(3):
    for batch in dataloader:
        loss = train_step(batch)
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0).item()
        global_step += 1
        run.log(
            global_step,
            loss=loss.item(),
            learning_rate=scheduler.get_last_lr()[0],
            grad_norm=grad_norm,
            epoch=epoch,
        )

# Save the final weights once, after the loop.
run.checkpoint(model, step=global_step)
run.finish()

Then push:

aquin package
aquin push

Your run appears in the Aquin dashboard under CLI runs with the full inspection suite.

API

aquin.init(base_model, run_name, config)

Starts a new run. Creates aquin_run/ in the current directory.

Param        Description
base_model   Hugging Face model ID, e.g. "meta-llama/Llama-3.2-1B-Instruct"
run_name     Display name for the run
config       Dict of training hyperparameters (optional; can be passed to finish() instead)

run.log(step, *, loss, ...)

Record metrics for one training step. Call every step inside your loop.

Param           Description
step            Global training step (required)
loss            Scalar training loss (required)
learning_rate   Current LR; enables the LR chart
grad_norm       Gradient norm; enables the grad norm chart
epoch           Current epoch; enables the epoch summary table
momentum_norm   Optimizer momentum norm; enables the momentum chart
step_ms         Wall-clock time for this step, in milliseconds
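
The table only says that step_ms is the wall-clock time of a step in milliseconds. A minimal sketch of one way to measure it in a custom loop follows. timed_ms is a hypothetical helper, not part of the SDK, and train_step stands for your own training function.

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock time in whole milliseconds)."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    result = fn(*args, **kwargs)
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


# In a training loop this might look like (names are placeholders):
#   loss, step_ms = timed_ms(train_step, batch)
#   run.log(global_step, loss=loss.item(), step_ms=step_ms)
```

On a GPU, kernels may still be queued when the function returns, so the measured time can under-report unless something inside the step forces a synchronization.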

run.checkpoint(model, step)

Saves the model checkpoint locally. Each run keeps a single checkpoint, so every call replaces the previous save. Call it once at the end of training. The checkpoint is included in the push and is used for SAE diff and model diff analysis.

run.finish(config)

Flushes all metrics to disk. Pass config here if you didn't pass it to aquin.init().

CLI

aquin login       # save your API key
aquin package     # bundle aquin_run/ into aquin_run.tar.gz
aquin push        # push to Aquin
aquin whoami      # check which account you're logged in as
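
If you want to package and push straight from a training script, the two commands above can be chained from Python. This is a sketch only: package_and_push is a hypothetical helper, it uses no flags beyond what the list above shows, and it assumes the aquin CLI exits with a non-zero status on failure.

```python
import subprocess

# The two commands come from the CLI list above; no extra flags are assumed.
PUSH_COMMANDS = (("aquin", "package"), ("aquin", "push"))


def package_and_push(runner=subprocess.run):
    """Run `aquin package`, then `aquin push`, stopping at the first failure."""
    for cmd in PUSH_COMMANDS:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so a failed package step never leads to a push.
        runner(list(cmd), check=True)
```

The runner parameter exists only so the helper can be exercised without the CLI installed. In normal use, call package_and_push() with no arguments after run.finish().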

Using with HuggingFace Trainer / TRL

Use a TrainerCallback to hook into the training loop:

import time
from transformers import TrainerCallback

class AquinCallback(TrainerCallback):
    def __init__(self, run):
        self.run = run
        self._step_start = 0.0

    def on_step_begin(self, args, state, control, **kwargs):
        self._step_start = time.time()

    def on_log(self, args, state, control, logs=None, **kwargs):
        if not logs or "loss" not in logs:
            return
        self.run.log(
            step=state.global_step,
            loss=float(logs["loss"]),
            learning_rate=float(logs["learning_rate"]) if "learning_rate" in logs else None,
            grad_norm=float(logs["grad_norm"]) if "grad_norm" in logs else None,
            epoch=int(state.epoch) if state.epoch is not None else None,
            step_ms=round((time.time() - self._step_start) * 1000),
        )

    def on_train_end(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        if model is not None:
            self.run.checkpoint(model, step=state.global_step)
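
The callback above logs metrics and saves the checkpoint, but it never calls run.finish(), which is what flushes metrics to disk. A minimal sketch of wiring it up follows. train_and_finish is a hypothetical helper, and calling finish() even when training raises is an assumption about what you want, not documented SDK behaviour.

```python
def train_and_finish(trainer, run, callback):
    """Attach the callback, train, and flush the run's metrics afterwards."""
    # Trainer.add_callback accepts a TrainerCallback instance.
    trainer.add_callback(callback)
    try:
        return trainer.train()
    finally:
        # Runs on success and on error, so logged metrics still reach disk.
        run.finish()


# Usage sketch (trainer construction elided; names are placeholders):
#   run = aquin.init(base_model="meta-llama/Llama-3.2-1B-Instruct",
#                    run_name="my-lora-run", config={"lr": 2e-4})
#   train_and_finish(trainer, run, AquinCallback(run))
```

Passing callbacks=[AquinCallback(run)] when constructing the Trainer should work equally well. The helper only makes the finish() call hard to forget.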

Download files

Source Distribution

aquin-0.0.5.tar.gz (9.9 kB view details)

Uploaded Source

Built Distribution

aquin-0.0.5-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file aquin-0.0.5.tar.gz.

File metadata

  • Download URL: aquin-0.0.5.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for aquin-0.0.5.tar.gz
Algorithm Hash digest
SHA256 e876857c77a08289d1a6f964a6fe4ef3c144cb5fe0d209b8b686dd38ccb53f51
MD5 6b659e3ab22d77cf13956af2e9887cb9
BLAKE2b-256 4808c98c97cc514d84446aa9299bd0ef5bbae8041344f9e52a1d23c30c058cf5

File details

Details for the file aquin-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: aquin-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for aquin-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 d61fae468c6b8e0d2f5e4c52176219aa11fdf857cde0544a0e8e256f43574b6c
MD5 f1c710a3aed949e6c8d25c352e353576
BLAKE2b-256 9526c2ac30b37f6e753fe1acb3e86a0e7bc20d04bdd25d993b2558b65b2ccc61
