Record training runs locally and push to Aquin for post-hoc inspection — loss curves, SAE diffs, model diffs, and more.

Aquin SDK

Record your training runs locally and push them to Aquin for post-hoc inspection — loss curves, learning rate, grad norm, epoch summaries, SAE feature diffs, model behaviour diffs, and more.

Install

pip install aquin

Quickstart

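import aquin
import torch

# model, dataloader, scheduler, and train_step come from your own training setup.
run = aquin.init(
    base_model="meta-llama/Llama-3.2-1B-Instruct",
    run_name="my-lora-run",
    config={
        "lr": 2e-4, "epochs": 3, "rank": 16, "lora_alpha": 32,
        "method": "qlora", "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8, "dataset": "data.jsonl",
    },
)

global_step = 0  # keeps counting across epochs; run.log() expects the global step
for epoch in range(3):
    for batch in dataloader:
        # train_step is assumed to run forward and backward and return the loss tensor
        loss = train_step(batch)
        # clip_grad_norm_ returns the total gradient norm measured before clipping
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0).item()
        global_step += 1
        run.log(
            global_step,
            loss=loss.item(),
            learning_rate=scheduler.get_last_lr()[0],
            grad_norm=grad_norm,
            epoch=epoch,
        )

# Save the single end-of-training checkpoint, then flush metrics to disk.
run.checkpoint(model, step=global_step)
run.finish()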

Then package the run and push it:

aquin package
aquin push

Your run appears in the Aquin dashboard under CLI runs with the full inspection suite.

API

aquin.init(base_model, run_name, config)

Starts a new run. Creates aquin_run/ in the current directory.

Param        Description
base_model   HuggingFace model ID, e.g. "meta-llama/Llama-3.2-1B-Instruct"
run_name     Display name for the run
config       Dict of training hyperparameters (optional; can be passed to finish() instead)

run.log(step, *, loss, ...)

Record metrics for one training step. Call every step inside your loop.

Param           Description
step            Global training step (required)
loss            Scalar training loss (required)
learning_rate   Current learning rate; enables the LR chart
grad_norm       Gradient norm; enables the grad norm chart
epoch           Current epoch; enables the epoch summary table
momentum_norm   Optimizer momentum norm; enables the momentum chart
step_ms         Wall-clock time for this step, in milliseconds
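
As a minimal sketch of how step_ms might be filled in, the helper below times one training step and logs it. The name log_timed_step and the step_fn argument are hypothetical and not part of the SDK; only run.log() and its parameters come from the table above. step_fn is assumed to be a zero-argument callable that performs the step and returns the loss as a plain float.

```python
import time


def log_timed_step(run, step, step_fn, **metrics):
    """Run one training step, time it, and log it with step_ms.

    `step_fn` is a hypothetical zero-argument callable that performs the
    step and returns the scalar loss as a float.
    """
    start = time.perf_counter()
    loss = step_fn()
    step_ms = round((time.perf_counter() - start) * 1000)
    # step and loss are the required fields; step_ms and anything passed
    # in `metrics` (learning_rate, grad_norm, epoch, ...) are optional.
    run.log(step, loss=loss, step_ms=step_ms, **metrics)
    return loss
```

Inside a loop this could be called as log_timed_step(run, global_step, lambda: train_step(batch).item(), epoch=epoch). Returning a float from step_fn forces a GPU synchronisation in PyTorch, so the measured time covers the whole step rather than only the kernel launches.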

run.checkpoint(model, step)

Saves the model checkpoint locally. Each run keeps a single checkpoint, and every call replaces the previous save, so call it once at the end of training. The checkpoint is included in the push and used for SAE diff and model diff analysis.

run.finish(config)

Flushes all metrics to disk. Pass config here if you didn't pass it to aquin.init().
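
Because finish() is what flushes metrics to disk, it may be worth making sure it still runs when training raises. The context manager below is one possible sketch; the name finishing is hypothetical and not part of the SDK, and it assumes that calling finish() after a failed run is acceptable.

```python
from contextlib import contextmanager


@contextmanager
def finishing(run, config=None):
    """Yield `run` and call run.finish() on exit, even if training raises."""
    try:
        yield run
    finally:
        # Pass config only when one was supplied, so a config already
        # given to aquin.init() is left alone.
        if config is None:
            run.finish()
        else:
            run.finish(config)
```

With this in place, the training loop can sit inside a `with finishing(run):` block, and any exception still propagates after the flush.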

CLI

aquin login       # save your API key
aquin package     # bundle aquin_run/ into aquin_run.tar.gz
aquin push        # push to Aquin
aquin whoami      # check which account you're logged in as
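
To package and push from a script rather than a shell, the two commands can be chained with subprocess. This is a sketch under assumptions: the helper name package_and_push is hypothetical, the CLI is assumed to look for aquin_run/ in its working directory and to exit non-zero on failure, and an API key is assumed to have been saved with aquin login beforehand.

```python
import subprocess
from pathlib import Path


def package_and_push(run_dir="aquin_run", runner=subprocess.run):
    """Run `aquin package` then `aquin push` next to the recorded run."""
    path = Path(run_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"{path} not found; has a run been recorded here?")
    for command in (["aquin", "package"], ["aquin", "push"]):
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed package step stops before the push.
        runner(command, check=True, cwd=path.parent)
```

The runner parameter exists only so the subprocess call can be swapped out in tests; in normal use it can be left at its default.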

Using with HuggingFace Trainer / TRL

Use a TrainerCallback to hook into the training loop:

import time
from transformers import TrainerCallback

class AquinCallback(TrainerCallback):
    """Forwards Trainer logs to an Aquin run and checkpoints at the end."""

    def __init__(self, run):
        self.run = run
        self._step_start = 0.0

    def on_step_begin(self, args, state, control, **kwargs):
        # perf_counter is monotonic, which suits duration measurements
        self._step_start = time.perf_counter()

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Skip log events that carry no training loss (e.g. evaluation logs)
        if not logs or "loss" not in logs:
            return
        self.run.log(
            step=state.global_step,
            loss=float(logs["loss"]),
            learning_rate=float(logs["learning_rate"]) if "learning_rate" in logs else None,
            grad_norm=float(logs["grad_norm"]) if "grad_norm" in logs else None,
            epoch=int(state.epoch) if state.epoch is not None else None,
            step_ms=round((time.perf_counter() - self._step_start) * 1000),
        )

    def on_train_end(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        # Compare against None rather than relying on the module's truthiness
        if model is not None:
            self.run.checkpoint(model, step=state.global_step)
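
The callback passes None for metrics that are missing from the logs. Whether run.log() accepts None is not stated here, so a more conservative variant is to omit absent metrics altogether. The helper below is a sketch of that idea; metrics_from_logs is a hypothetical name, not part of the SDK or of transformers.

```python
def metrics_from_logs(logs, epoch=None):
    """Build run.log() keyword arguments from a Trainer `logs` dict.

    Returns None when the dict has no training loss (for example
    evaluation logs), otherwise a dict holding only the metrics present.
    """
    if not logs or "loss" not in logs:
        return None
    metrics = {"loss": float(logs["loss"])}
    for key in ("learning_rate", "grad_norm"):
        if logs.get(key) is not None:
            metrics[key] = float(logs[key])
    if epoch is not None:
        metrics["epoch"] = int(epoch)  # Trainer reports a fractional epoch
    return metrics
```

In on_log this would be used as metrics = metrics_from_logs(logs, state.epoch), followed by self.run.log(state.global_step, **metrics) when the result is not None. The callback itself is attached in the usual way, by passing callbacks=[AquinCallback(run)] when constructing the Trainer.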
