Aquin CLI. Run GPU inspection, steering, simulation, and evals locally with aquin connect, aquin load, aquin chat, and aquin inspect.

Aquin SDK

Record your training runs locally and push them to Aquin for post-hoc inspection — loss curves, learning rate, grad norm, epoch summaries, SAE feature diffs, model behaviour diffs, and more.

Install

pip install aquin

Quickstart

import aquin
import torch

run = aquin.init(
    base_model="meta-llama/Llama-3.2-1B-Instruct",
    run_name="my-lora-run",
    config={
        "lr": 2e-4, "epochs": 3, "rank": 16, "lora_alpha": 32,
        "method": "qlora", "per_device_train_batch_size": 2,
        "gradient_accumulation_steps": 8, "dataset": "data.jsonl",
    },
)

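# model, dataloader, scheduler, and train_step() come from your own training code.
global_step = 0
for epoch in range(3):
    for batch in dataloader:
        loss = train_step(batch)  # assumed to run the forward and backward pass
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0).item()
        global_step += 1  # run.log expects a global step, not a per-epoch index
        run.log(
            global_step,
            loss=loss.item(),
            learning_rate=scheduler.get_last_lr()[0],
            grad_norm=grad_norm,
            epoch=epoch,
        )

# Save the single end-of-training checkpoint.
run.checkpoint(model, step=global_step)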
run.finish()

Then package and push the run (run aquin login once beforehand to save your API key):

aquin package
aquin push

Your run appears in the Aquin dashboard under CLI runs with the full inspection suite.

API

aquin.init(base_model, run_name, config)

Starts a new run. Creates aquin_run/ in the current directory.

Param        Description
base_model   HuggingFace model ID, e.g. "meta-llama/Llama-3.2-1B-Instruct"
run_name     Display name for the run
config       Dict of training hyperparameters (optional, can also pass to finish())

run.log(step, *, loss, ...)

Record metrics for one training step. Call every step inside your loop.

Param           Description
step            Global training step (required)
loss            Scalar training loss (required)
learning_rate   Current LR; enables LR chart
grad_norm       Gradient norm; enables grad norm chart
epoch           Current epoch; enables epoch summary table
momentum_norm   Optimizer momentum norm; enables momentum chart
step_ms         Wall-clock time for this step in ms
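
The optional step_ms field is the wall-clock time of one step in milliseconds. One way to measure it in a custom loop is sketched below. StepTimer and timed_step are hypothetical helpers written for this example, not part of the aquin package, and the run.log call in the trailing comment only mirrors the signature documented above.

```python
import time


class StepTimer:
    """Hypothetical helper (not part of aquin): times one block of code in ms."""

    def __enter__(self):
        self.ms = None
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # perf_counter() is monotonic, so the elapsed time is never negative.
        self.ms = round((time.perf_counter() - self._start) * 1000)
        return False  # do not swallow exceptions raised by the training step


def timed_step(step_fn, *args):
    """Run step_fn(*args) and return (result, elapsed_ms)."""
    with StepTimer() as timer:
        result = step_fn(*args)
    return result, timer.ms


# In a real loop (train_step and run come from your own code):
#   loss, step_ms = timed_step(train_step, batch)
#   run.log(global_step, loss=loss.item(), step_ms=step_ms)
```

Using time.perf_counter rather than time.time keeps the measurement unaffected by system clock adjustments.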

run.checkpoint(model, step)

Saves the model checkpoint locally. One checkpoint per run — always replaces the previous save. Call once at the end of training. The checkpoint is included in the push and used for SAE diff and model diff analysis.

run.finish(config)

Flushes all metrics to disk. Pass config here if you didn't pass it to aquin.init().

CLI

aquin login       # save your API key
aquin package     # bundle aquin_run/ into aquin_run.tar.gz
aquin push        # push to Aquin
aquin whoami      # check which account you're logged in as

Using with HuggingFace Trainer / TRL

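Use a TrainerCallback to hook into the training loop. Register it with callbacks=[AquinCallback(run)] when constructing the Trainer, and call run.finish() once trainer.train() returns: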

import time
from transformers import TrainerCallback

class AquinCallback(TrainerCallback):
    def __init__(self, run):
        self.run = run
        self._step_start = 0.0

    def on_step_begin(self, args, state, control, **kwargs):
        self._step_start = time.time()

    def on_log(self, args, state, control, logs=None, **kwargs):
        if not logs or "loss" not in logs:
            return
        self.run.log(
            step=state.global_step,
            loss=float(logs["loss"]),
            learning_rate=float(logs["learning_rate"]) if "learning_rate" in logs else None,
            grad_norm=float(logs["grad_norm"]) if "grad_norm" in logs else None,
            epoch=int(state.epoch) if state.epoch is not None else None,
            step_ms=round((time.time() - self._step_start) * 1000),
        )

    def on_train_end(self, args, state, control, **kwargs):
        model = kwargs.get("model")
        if model is not None:
            self.run.checkpoint(model, step=state.global_step)

Building and publishing a new release

Prerequisites: Python 3.13, Nuitka, MSVC (Visual Studio Build Tools with Desktop C++ workload).

1. Compile to native extensions

cd cli
python scripts/build_nuitka.py
# Compiles engine/ + compute/ to .pyd, removes .py source, audits on finish

2. Build the wheel

python -m build --wheel
# Output: dist/aquin-<version>-py3-none-any.whl

3. Audit — confirm no source leaked

python scripts/build_nuitka.py --check
# Must print: Audit passed

4. Bump version before releasing

Edit version in pyproject.toml, then repeat steps 1–3.

5. Distribute

Send the wheel directly to users (pip install aquin-*.whl) or upload to R2 and share a signed link.

Notes:

  • .pyd files and dist/ are gitignored — never commit compiled artifacts
  • After building, engine/ and compute/ have no .py source locally either — keep a clean git working tree by running builds in a separate branch or restoring source from git after building
  • To rebuild from scratch: git restore cli/aquin/engine cli/aquin/compute then repeat from step 1
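
The audit in step 3 can also be approximated independently of the build script by inspecting the built wheel, which is a zip archive. The sketch below is not what scripts/build_nuitka.py --check does internally. find_leaked_sources is a hypothetical helper, and the aquin/engine/ and aquin/compute/ prefixes are an assumption derived from the cli/aquin/engine and cli/aquin/compute paths in the notes above.

```python
import zipfile

# Assumed package layout inside the wheel (see the lead-in); adjust as needed.
COMPILED_PACKAGES = ("aquin/engine/", "aquin/compute/")


def find_leaked_sources(wheel, prefixes=COMPILED_PACKAGES):
    """Return the sorted .py paths found under the compiled packages of a wheel.

    `wheel` is a filesystem path or a binary file object. An empty list means
    no .py file was found under the given prefixes.
    """
    with zipfile.ZipFile(wheel) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.endswith(".py") and name.startswith(prefixes)
        )


# Example (the file name depends on the version you built):
#   leaked = find_leaked_sources("dist/aquin-<version>-py3-none-any.whl")
#   if leaked:
#       raise SystemExit("source leaked: " + ", ".join(leaked))
```

This check flags every .py file under those prefixes, including any __init__.py, so relax the filter if the build intentionally keeps package initializers as source.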

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

aquin-0.1.2-py3-none-any.whl (6.9 MB)

Uploaded Python 3

File details

Details for the file aquin-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: aquin-0.1.2-py3-none-any.whl
  • Size: 6.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for aquin-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        970d0fa3eff8e81287c235f9045cfe7f8d2dcfc873e699fb95b20d5fac0be56b
MD5           5e05bd4748ddc989304e2252a7f318f8
BLAKE2b-256   273d6476cab49151bbfdb55e91e5456f84cc19020630dd25ec2c42aec0b0e79b
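
When a wheel is sent directly to users or shared through a link, the recipient can compare its SHA256 digest against the published value before installing. A minimal sketch using only the standard library follows. sha256_of and matches_published_hash are hypothetical helper names chosen for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA-256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example, using the SHA256 digest listed above:
#   ok = matches_published_hash(
#       "aquin-0.1.2-py3-none-any.whl",
#       "970d0fa3eff8e81287c235f9045cfe7f8d2dcfc873e699fb95b20d5fac0be56b",
#   )
```

pip can perform an equivalent check itself when a requirements file pins the digest with --hash and installation runs with --require-hashes.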
