Record training runs locally and push to Aquin for post-hoc inspection — loss curves, SAE diffs, model diffs, and more.
Aquin SDK
Record your training runs locally and push them to Aquin for post-hoc inspection — loss curves, learning rate, grad norm, epoch summaries, SAE feature diffs, model behaviour diffs, and more.
Install
pip install aquin
Quickstart
import aquin
import torch

run = aquin.init(
    base_model="meta-llama/Llama-3.2-1B-Instruct",
    run_name="my-lora-run",
    config={
        "lr": 2e-4, "epochs": 3, "rank": 16, "lora_alpha": 32,
        "method": "qlora", "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8, "dataset": "data.jsonl",
    },
)

# dataloader, model, scheduler, and train_step come from your own training code
global_step = 0  # run.log expects a global step, so it must not reset each epoch
for epoch in range(3):
    for batch in dataloader:
        loss = train_step(batch)
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0).item()
        global_step += 1
        run.log(
            global_step,
            loss=loss.item(),
            learning_rate=scheduler.get_last_lr()[0],
            grad_norm=grad_norm,
            epoch=epoch,
        )

# One checkpoint per run: save once, after training ends
run.checkpoint(model, step=global_step)
run.finish()
Then push:
aquin package
aquin push
Your run appears in the Aquin dashboard under CLI runs with the full inspection suite.
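The API section describes `step` as a global training step, so it should keep counting across epochs rather than restart at each one. A minimal sketch of one way to derive it, using a hypothetical helper `iter_global_steps` (not part of the Aquin SDK) and a plain list standing in for a real dataloader:

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def iter_global_steps(num_epochs: int, batches: Sequence[T]) -> Iterator[Tuple[int, int, T]]:
    """Yield (epoch, global_step, batch); global_step never resets between epochs.

    Counting starts at 1 here (number of completed steps); starting at 0
    would work just as well as long as the counter keeps increasing.
    """
    global_step = 0
    for epoch in range(num_epochs):
        for batch in batches:  # the sequence is re-iterated once per epoch
            global_step += 1
            yield epoch, global_step, batch


# Stand-in for a training loop: two epochs over three batches.
seen = [(epoch, step) for epoch, step, _ in iter_global_steps(2, ["a", "b", "c"])]
print(seen)  # [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (1, 6)]
```

In a real loop, the yielded `global_step` would be the value passed as the first argument to `run.log`.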
API
aquin.init(base_model, run_name, config)
Starts a new run. Creates aquin_run/ in the current directory.
| Param | Description |
|---|---|
| base_model | HuggingFace model ID, e.g. "meta-llama/Llama-3.2-1B-Instruct" |
| run_name | Display name for the run |
| config | Dict of training hyperparameters (optional; can also be passed to finish()) |
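Because `config` is a plain dict, it can be generated from whatever object already drives training. A minimal sketch, assuming a hypothetical `TrainConfig` dataclass whose field names mirror the quickstart (none of this is part of the Aquin SDK):

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:
    """Hypothetical hyperparameter container; field names mirror the quickstart."""
    lr: float = 2e-4
    epochs: int = 3
    rank: int = 16
    lora_alpha: int = 32
    method: str = "qlora"
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 8
    dataset: str = "data.jsonl"


cfg = TrainConfig()
config = asdict(cfg)  # plain dict, suitable for the `config` argument

# Samples contributing to each optimizer update on a single device
effective_batch = cfg.per_device_train_batch_size * cfg.gradient_accumulation_steps
print(effective_batch)  # 16
```

Keeping the hyperparameters in one object means the values recorded with the run are the same ones the training code actually used.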
run.log(step, *, loss, ...)
Record metrics for one training step. Call every step inside your loop.
| Param | Description |
|---|---|
| step | Global training step (required) |
| loss | Scalar training loss (required) |
| learning_rate | Current LR; enables the LR chart |
| grad_norm | Gradient norm; enables the grad norm chart |
| epoch | Current epoch; enables the epoch summary table |
| momentum_norm | Optimizer momentum norm; enables the momentum chart |
| step_ms | Wall-clock time for this step in ms |
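`grad_norm` and `momentum_norm` are single scalars summarising many tensors. PyTorch's `clip_grad_norm_`, used in the quickstart, returns the norm of all gradients taken together as one vector (L2 by default). A minimal pure-Python sketch of that reduction follows. `global_l2_norm` is a hypothetical helper, and applying the same reduction to an optimizer's momentum buffers to obtain `momentum_norm` is an assumption, since the source does not define how that value should be computed.

```python
import math
from typing import Iterable, Sequence


def global_l2_norm(tensors: Iterable[Sequence[float]]) -> float:
    """L2 norm of all values, as if every tensor were concatenated into one vector."""
    return math.sqrt(sum(v * v for t in tensors for v in t))


# Two flattened "gradient tensors": sqrt(3^2 + 0^2 + 4^2) = 5
grads = [[3.0, 0.0], [4.0]]
print(global_l2_norm(grads))  # 5.0
```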
run.checkpoint(model, step)
Saves the model checkpoint locally. Only one checkpoint is kept per run: each call replaces the previous save, so call it once at the end of training. The checkpoint is included in the push and used for SAE diff and model diff analysis.
run.finish(config)
Flushes all metrics to disk. Pass config here if you didn't pass it to aquin.init().
CLI
aquin login # save your API key
aquin package # bundle aquin_run/ into aquin_run.tar.gz
aquin push # push to Aquin
aquin whoami # check which account you're logged in as
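Before pushing, it can be useful to check what ended up in the bundle. A minimal sketch using Python's standard `tarfile` module follows. `list_bundle` is a hypothetical helper, not an Aquin command, and the only detail taken from the CLI description is the archive name `aquin_run.tar.gz`.

```python
import tarfile
from typing import List, Tuple


def list_bundle(path: str = "aquin_run.tar.gz") -> List[Tuple[str, int]]:
    """Return (member name, size in bytes) for every regular file in the archive."""
    with tarfile.open(path, "r:gz") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


# Example, run from the directory that contains the archive:
# for name, size in list_bundle():
#     print(f"{size:>12,d}  {name}")
```

Listing the members this way reads only the archive index and does not extract anything to disk.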
Using with HuggingFace Trainer / TRL
Use a TrainerCallback to hook into the training loop:
import time
from transformers import TrainerCallback

class AquinCallback(TrainerCallback):
    def __init__(self, run):
        self.run = run
        self._step_start = 0.0

    def on_step_begin(self, args, state, control, **kwargs):
        self._step_start = time.time()

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Skip log events that carry no training loss (e.g. eval or final summary)
        if not logs or "loss" not in logs:
            return
        self.run.log(
            step=state.global_step,
            loss=float(logs["loss"]),
            learning_rate=float(logs["learning_rate"]) if "learning_rate" in logs else None,
            grad_norm=float(logs["grad_norm"]) if "grad_norm" in logs else None,
            epoch=int(state.epoch) if state.epoch is not None else None,
            step_ms=round((time.time() - self._step_start) * 1000),
        )

    def on_train_end(self, args, state, control, **kwargs):
        # Save the single end-of-training checkpoint
        model = kwargs.get("model")
        if model is not None:
            self.run.checkpoint(model, step=state.global_step)
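`on_log` only fires every `logging_steps` steps, so the callback above reports the duration of the most recent step rather than of every step in the logging window. If an average is preferred, the durations can be accumulated between log events. A minimal sketch, assuming a hypothetical `StepTimer` helper (not part of Aquin or transformers) whose `begin`, `end`, and `flush_mean_ms` methods would be called from `on_step_begin`, `on_step_end`, and `on_log` respectively:

```python
import time
from typing import Callable, List, Optional


class StepTimer:
    """Accumulates per-step durations and reports their mean in milliseconds."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._start: Optional[float] = None
        self._durations: List[float] = []

    def begin(self) -> None:
        """Mark the start of a step."""
        self._start = self._clock()

    def end(self) -> None:
        """Record the duration of the step started by the last begin()."""
        if self._start is not None:
            self._durations.append(self._clock() - self._start)
            self._start = None

    def flush_mean_ms(self) -> Optional[int]:
        """Mean step time since the last flush, or None if no step finished."""
        if not self._durations:
            return None
        mean_s = sum(self._durations) / len(self._durations)
        self._durations.clear()
        return round(mean_s * 1000)


# Demonstration with a fake clock so the result is deterministic.
ticks = iter([0.0, 0.125, 1.0, 1.375])
timer = StepTimer(clock=lambda: next(ticks))
timer.begin()
timer.end()  # first step: 0.125 s
timer.begin()
timer.end()  # second step: 0.375 s
mean_ms = timer.flush_mean_ms()
print(mean_ms)  # 250
```

The injectable `clock` exists only to make the example reproducible; in a callback the default `time.perf_counter` would be used.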