Record training runs locally and push to Aquin for post-hoc inspection — loss curves, SAE diffs, model diffs, and more.

Aquin SDK

Record your training runs locally and push them to Aquin for post-hoc inspection — loss curves, learning rate, grad norm, epoch summaries, SAE feature diffs, model behaviour diffs, and more.

Install

pip install aquin

Quickstart

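import aquin
import torch

# dataloader, model, scheduler, and train_step() below are placeholders for your own training setup.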

run = aquin.init(
    base_model="meta-llama/Llama-3.2-1B-Instruct",
    run_name="my-lora-run",
    config={
        "lr": 2e-4, "epochs": 3, "rank": 16, "lora_alpha": 32,
        "method": "qlora", "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8, "dataset": "data.jsonl",
    },
)

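global_step = 0  # run.log() expects a global step, not an index that restarts every epoch
for epoch in range(3):
    for batch in dataloader:
        global_step += 1
        # train_step() is your own function; it is assumed to return the loss tensor.
        loss = train_step(batch)
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0).item()
        run.log(
            global_step,
            loss=loss.item(),
            learning_rate=scheduler.get_last_lr()[0],
            grad_norm=grad_norm,
            epoch=epoch,
        )

# Save the final weights once, then flush metrics to disk.
run.checkpoint(model, step=global_step)
run.finish()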

Then package and push the run (run aquin login first if you have not saved your API key yet):

aquin package
aquin push

Your run appears in the Aquin dashboard under CLI runs with the full inspection suite.

API

aquin.init(base_model, run_name, config)

Starts a new run. Creates aquin_run/ in the current directory.

Param       Description
base_model  HuggingFace model ID, e.g. "meta-llama/Llama-3.2-1B-Instruct"
run_name    Display name for the run
config      Dict of training hyperparameters (optional; can also be passed to run.finish())

run.log(step, *, loss, ...)

Record metrics for one training step. Call every step inside your loop.

Param          Description
step           Global training step (required)
loss           Scalar training loss (required)
learning_rate  Current learning rate; enables the LR chart
grad_norm      Gradient norm; enables the grad norm chart
epoch          Current epoch; enables the epoch summary table
momentum_norm  Optimizer momentum norm; enables the momentum chart
step_ms        Wall-clock time for this step, in milliseconds
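
The two optional metrics that need some computation, step_ms and momentum_norm, can be produced with small helpers. The sketch below is a minimal illustration, not part of the Aquin SDK: timed_step and global_l2_norm are hypothetical names, and treating momentum_norm as the global L2 norm over all momentum buffers is an assumption, since the exact definition is not stated here. With PyTorch optimizers the buffers would typically come from optimizer.state (for example the "momentum_buffer" entries for SGD or "exp_avg" for Adam), flattened to plain floats.

```python
import math
import time
from contextlib import contextmanager


@contextmanager
def timed_step():
    """Yield a dict whose "step_ms" key is filled in when the block exits."""
    timing = {}
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    try:
        yield timing
    finally:
        timing["step_ms"] = round((time.perf_counter() - start) * 1000)


def global_l2_norm(buffers):
    """L2 norm over every value in an iterable of flat float sequences.

    Assumption: this is one plausible reading of "momentum norm".
    """
    return math.sqrt(sum(x * x for buf in buffers for x in buf))


with timed_step() as timing:
    time.sleep(0.01)  # stand-in for one training step

print(timing["step_ms"], global_l2_norm([[3.0], [4.0]]))
```

Both values can then be passed as keyword arguments, e.g. run.log(step, loss=..., step_ms=timing["step_ms"], momentum_norm=...).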

run.checkpoint(model, step)

Saves the model checkpoint locally. Only one checkpoint is kept per run: each call replaces the previous save, so call it once at the end of training. The checkpoint is included in the push and is used for SAE diff and model diff analysis.

run.finish(config)

Flushes all metrics to disk. Pass config here if you didn't pass it to aquin.init().

CLI

aquin login       # save your API key
aquin package     # bundle aquin_run/ into aquin_run.tar.gz
aquin push        # push to Aquin
aquin whoami      # check which account you're logged in as
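
Because the checkpoint is bundled into aquin_run.tar.gz, it can be worth checking what the archive contains before running aquin push. The sketch below uses only Python's standard tarfile module; list_bundle is a hypothetical helper, and nothing is assumed about the file layout inside the archive.

```python
import tarfile
from pathlib import Path


def list_bundle(archive="aquin_run.tar.gz"):
    """Return (name, size_in_bytes) for each regular file in the archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


# Only print a listing if `aquin package` has already produced the bundle.
if Path("aquin_run.tar.gz").exists():
    for name, size in list_bundle():
        print(f"{size:>12,}  {name}")
```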

Using with HuggingFace Trainer / TRL

Use a TrainerCallback to hook into the training loop:

import time
from transformers import TrainerCallback

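class AquinCallback(TrainerCallback):
    """Forward Trainer logs to an Aquin run and save a checkpoint when training ends."""

    def __init__(self, run):
        self.run = run
        self._step_start = None

    def on_step_begin(self, args, state, control, **kwargs):
        # perf_counter() is monotonic, so it is safer than time.time() for durations.
        self._step_start = time.perf_counter()

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Evaluation logs carry "eval_loss" rather than "loss"; skip them.
        if not logs or "loss" not in logs:
            return
        step_ms = None
        if self._step_start is not None:
            step_ms = round((time.perf_counter() - self._step_start) * 1000)
        self.run.log(
            step=state.global_step,
            loss=float(logs["loss"]),
            learning_rate=float(logs["learning_rate"]) if "learning_rate" in logs else None,
            grad_norm=float(logs["grad_norm"]) if "grad_norm" in logs else None,
            epoch=int(state.epoch) if state.epoch is not None else None,
            step_ms=step_ms,
        )

    def on_train_end(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        # Compare with None explicitly: container modules define __len__, so truthiness is unreliable.
        if model is not None:
            self.run.checkpoint(model, step=state.global_step)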
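
To try the callback without writing to aquin_run/ or pushing anything, one option is a small stand-in object that mirrors the three documented run methods and just records what it receives. RecordingRun below is hypothetical and not part of the Aquin SDK; dropping metrics passed as None is a choice made for this sketch, not documented SDK behaviour.

```python
class RecordingRun:
    """Hypothetical stand-in for an Aquin run; records calls in memory only."""

    def __init__(self):
        self.rows = []
        self.checkpoint_step = None
        self.finished = False

    def log(self, step, *, loss, **metrics):
        # The callback passes None for metrics it cannot find; skip those here.
        row = {"step": step, "loss": loss}
        row.update({k: v for k, v in metrics.items() if v is not None})
        self.rows.append(row)

    def checkpoint(self, model, step):
        # Mirrors the "one checkpoint per run" rule: only the latest call is kept.
        self.checkpoint_step = step

    def finish(self, config=None):
        self.finished = True


run = RecordingRun()
run.log(step=10, loss=1.25, learning_rate=2e-4, grad_norm=None)
run.checkpoint(model=object(), step=10)
run.finish()
print(run.rows, run.checkpoint_step, run.finished)
```

With a real run, the callback is registered through the Trainer's callbacks argument, e.g. callbacks=[AquinCallback(run)]. The callback as written never calls run.finish(), so call it yourself after trainer.train() returns.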
