Differential testing for fine-tuned causal LMs: did LoRA/QLoRA training actually change behavior, or is the model defaulting to the pretrained base?

sway

Differential testing for fine-tuned causal language models.

Alpha — v0.1.0 on PyPI. API is not stable; semantic versioning applies only from v1.0 onward. Feedback + issues welcome.

One question: did LoRA/QLoRA training actually change model behavior in a meaningful way, or is the model just defaulting to the pretrained base?

sway gives you a trustworthy, reproducible answer with thirteen purpose-built primitives: twelve probes plus the null-adapter baseline that their numeric metrics are z-scored against. No LLM judges. No external APIs. Deterministic on CPU where possible.

Naming convention. The source repo and CLI entry point are both sway. The PyPI distribution is dlm-sway because the short sway name is taken on PyPI by an unrelated project. The CLI installed by pip install dlm-sway is still sway; a distribution name that differs from the command or import name is common on PyPI (compare pyyaml, which is imported as yaml).

Install

# HF + PEFT backend — required for real models
pip install "dlm-sway[hf]"

# Extras composable as usual
pip install "dlm-sway[hf,style,semsim]"
pip install "dlm-sway[all]"

# .dlm auto-suite generation (requires the DLM sibling project)
pip install "dlm-sway[dlm]"

Available extras (the [pytest] extra is covered under Pytest integration):

[hf] — HuggingFace + PEFT backend (required for real models)
[mlx] — Apple Silicon MLX backend (darwin-arm64 only)
[style] — stylistic fingerprint extensions (spaCy + textstat + nlpaug)
[semsim] — sentence-transformers for the revert probe
[dlm] — auto-generate suites from .dlm documents
[viz] — matplotlib plots
[all] — everything

Verify the install:

sway --version
sway doctor

Install from source

For the development HEAD (unreleased changes, contributor workflow):

git clone https://github.com/tenseleyFlow/sway.git
cd sway

uv venv --python 3.11 .venv      # or: python -m venv .venv
source .venv/bin/activate
uv pip install -e ".[hf]" --group dev

90-second smoke test

sway check path/to/adapter --base HuggingFaceTB/SmolLM2-135M-Instruct

Outputs a verdict in under a minute on CPU for small models: your adapter is 4.2σ above noise ✅ or indistinguishable from a null adapter ❌.

Full suite

# sway.yaml
version: 1
models:
  base: {kind: hf, base: "HuggingFaceTB/SmolLM2-135M-Instruct"}
  ft:   {kind: hf, base: "HuggingFaceTB/SmolLM2-135M-Instruct",
         adapter: "./runs/adapter/v0003"}
suite:
  - {name: null_baseline,       kind: null_adapter, runs: 3}
  - {name: doc_divergence,      kind: delta_kl,
     prompts: ["The key insight is", "An important rule"]}
  - {name: section_attribution, kind: section_internalization}
  - {name: no_leakage,          kind: leakage}
  - {name: ablation_shape,      kind: adapter_ablation,
     prompts: ["Tell me more about"]}

sway run sway.yaml              # full report to terminal + JSON
sway gate sway.yaml --junit     # CI-friendly; non-zero on fail

# Override the composite weights on the command line (partial overrides
# are fine — unspecified categories keep their defaults):
sway run sway.yaml --weights "attribution=0.5,adherence=0.2"
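The partial-override behavior can be pictured as a merge over a default weight table. The sketch below is illustrative only: DEFAULT_WEIGHTS, its equal values, and parse_weights are assumptions made for this example, not sway's actual defaults or parser, and it does not guess whether sway renormalizes the weights afterwards.

```python
# Hypothetical defaults: equal weight per category (an assumption for illustration).
DEFAULT_WEIGHTS = {
    "adherence": 0.25,
    "attribution": 0.25,
    "calibration": 0.25,
    "ablation": 0.25,
}


def parse_weights(arg: str, defaults: dict[str, float] = DEFAULT_WEIGHTS) -> dict[str, float]:
    """Merge a 'name=value,name=value' string over a copy of the defaults."""
    merged = dict(defaults)
    for item in filter(None, (part.strip() for part in arg.split(","))):
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or name not in defaults:
            raise ValueError(f"bad weight override: {item!r}")
        merged[name] = float(value)
    return merged


# Only two categories are overridden; the other two keep their defaults.
weights = parse_weights("attribution=0.5,adherence=0.2")
```

Rejecting unknown category names early is a design choice of this sketch; it keeps a typo in a CI flag from silently leaving the defaults in place.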

Inside sway.yaml, tuning knobs in defaults include:

seed — passed to seed_everything before any probe runs.
differential (default true) — toggle between the single-load PEFT path and a two-model load (doubled memory, rarely needed; for custom backends that can't do in-place adapter toggling).
score_weights — per-category weight overrides baked into the spec so CI runs reproduce the same score without a CLI flag.

Why it exists

Standard benchmarks (MMLU, HellaSwag) ask "how good is this model?" That's the wrong question after a targeted LoRA fine-tune on a small user-authored document. The right question is "did the adapter actually move the model toward what I wrote?" — and existing tools answer this poorly.

sway answers it directly via twelve primitives across four categories, plus a baseline-calibration primitive (thirteen in total):

Category	Primitives
Adherence	`delta_kl`, `adapter_revert`, `prompt_collapse`, `cluster_kl`
Attribution	`section_internalization`, `paraphrase_invariance`, `preference_flip`
Calibration	`style_fingerprint`, `calibration_drift`, `leakage`, `external_perplexity`
Ablation	`adapter_ablation` ← the signature primitive
Baseline	`null_adapter` (powers every z-score in the report)

The signature primitive. adapter_ablation scales the LoRA additive term by λ ∈ {0, 0.25, 0.5, 0.75, 1.0, 1.25} and measures the divergence curve. A healthy fine-tune shows a smooth, monotonic, non-saturated response. A degenerate one shows a step function or an overshoot-then-crash. Nobody else does this because nobody else gets this close to the adapter math.
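To make the curve shapes concrete, here is a minimal sketch of how a divergence-versus-λ curve could be labeled. The function name classify_curve, the labels, and the step_share threshold are assumptions for illustration; sway's own shape test may differ.

```python
LAMBDAS = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]


def classify_curve(divergences: list[float], step_share: float = 0.8) -> str:
    """Label a divergence-vs-lambda curve; thresholds are illustrative assumptions."""
    if len(divergences) != len(LAMBDAS):
        raise ValueError("need one divergence per lambda")
    # Change in divergence between consecutive lambda values.
    deltas = [b - a for a, b in zip(divergences, divergences[1:])]
    total = divergences[-1] - divergences[0]
    if any(d < 0 for d in deltas):
        return "non-monotonic"   # e.g. overshoot-then-crash
    if total <= 0:
        return "flat"            # the adapter term changes nothing
    if max(deltas) / total >= step_share:
        return "step"            # one interval carries almost all the change
    if deltas[-1] == 0:
        return "saturated"       # no further change past lambda = 1.0
    return "healthy"


healthy = classify_curve([0.0, 0.1, 0.22, 0.35, 0.5, 0.62])
step = classify_curve([0.0, 0.0, 0.0, 0.9, 0.95, 1.0])
crash = classify_curve([0.0, 0.2, 0.5, 0.9, 0.4, 0.1])
```

The input values are made up; in practice each entry would be the divergence measured with the adapter term scaled by the matching λ.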

The calibration. Every numeric probe z-scores its raw metric against a null-adapter baseline — a same-structure LoRA with random-init weights. "Your adapter's KL is 4.2σ above noise" is a far stronger claim than a fixed threshold. The null-adapter calibration requires a backend that implements NullCalibratedBackend (the HF backend does); probes that can't be calibrated (e.g., adapter_revert needs an embedder, the null proxy doesn't have one) surface (no calibration) in the report and fall back to fixed thresholds. Calibration stats are cached on disk under ~/.dlm-sway/null-stats/ keyed by backend identity.
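The z-score itself is an ordinary standard score of the adapter's raw metric against the null-adapter runs. A minimal sketch, assuming the sample standard deviation is used (sway's exact estimator is not stated here) and with made-up KL values:

```python
from statistics import mean, stdev


def z_score(raw_metric: float, null_metrics: list[float]) -> float:
    """Standard score of the adapter's metric against null-adapter runs."""
    if len(null_metrics) < 2:
        raise ValueError("need at least two null runs to estimate a spread")
    spread = stdev(null_metrics)  # sample standard deviation (an assumption)
    if spread == 0:
        raise ValueError("null runs are identical; z-score is undefined")
    return (raw_metric - mean(null_metrics)) / spread


# Three null runs, mirroring `runs: 3` in the example spec (values are made up).
null_kl = [0.010, 0.012, 0.014]
z = z_score(0.020, null_kl)
```

With these illustrative numbers the null mean is 0.012 and the null spread is 0.002, so the adapter's KL of 0.020 sits about four standard deviations above noise.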

The rank profile. null_adapter takes an optional rank_multipliers: list[float] (default [1.0]). Pass [0.5, 1.0, 2.0] and every numeric probe carries a three-point z-score curve: z=+4.2σ @ 1x / +6.8σ @ 0.5x / +2.1σ @ 2x. The shape is diagnostic:

Flat or slightly rising toward 0.5x — adapter signal is rank-stable, roughly independent of noise energy.
Sharply higher at 0.5x, lower at 2x — adapter is rank-saturated: a smaller rank would have yielded a clearer separation from noise. Consider halving r.
Low everywhere — adapter is barely above noise at any rank; the signal is real but weak.

Caveat: high z at low rank can also mean the low-rank null is pathologically quiet rather than that the adapter is strong. Read the profile as a shape, not a scalar — if all three z's move proportionally, the adapter is doing work; if they spread apart, the rank is mis-sized.
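Reading the profile as a shape can be sketched as a small heuristic over the three z-scores. The function name profile_shape and both thresholds (weak_below, spread_ratio) are hypothetical choices for this example, not values taken from sway.

```python
def profile_shape(z_half: float, z_one: float, z_double: float,
                  weak_below: float = 2.0, spread_ratio: float = 2.0) -> str:
    """Label a three-point z-score profile; both thresholds are illustrative."""
    if max(z_half, z_one, z_double) < weak_below:
        return "weak"            # low everywhere
    # A large gap between the 0.5x and 2x points means the z's spread apart.
    if z_double <= 0 or z_half / z_double >= spread_ratio:
        return "rank-saturated"
    return "rank-stable"         # flat or only slightly rising toward 0.5x


saturated = profile_shape(6.8, 4.2, 2.1)   # the example profile from the text
stable = profile_shape(4.5, 4.2, 4.0)
weak = profile_shape(1.0, 0.8, 0.5)
```

The caveat above still applies: a label like rank-saturated is a prompt to inspect the low-rank null, not proof that the adapter is strong.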

Implementation note: rank scaling is mathematically equivalent to multiplying the null noise std by sqrt(rank_scale) (LoRA's A·B output variance scales linearly with rank). The shipped backends apply that scaling rather than reshaping PEFT tensors — no model reload, no rank-specific adapter cache, same alpha/r scaling throughout.
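The square-root relationship can be checked with a quick simulation. This sketch assumes i.i.d. standard-normal entries in both factors and ignores the alpha/r scaling, so it illustrates the variance argument rather than sway's null-adapter code; lora_output_variance is a name invented for the example.

```python
import random
from statistics import pvariance


def lora_output_variance(rank: int, samples: int = 10_000, seed: int = 0) -> float:
    """Empirical variance of one entry of A·B with i.i.d. N(0, 1) factors."""
    rng = random.Random(seed)
    # One entry of A·B is a sum of `rank` products of independent entries.
    outputs = [
        sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(rank))
        for _ in range(samples)
    ]
    return pvariance(outputs)


var_r8 = lora_output_variance(rank=8)
var_r16 = lora_output_variance(rank=16)
ratio = var_r16 / var_r8    # close to 2: variance grows linearly with rank
std_scale = ratio ** 0.5    # close to sqrt(2): the factor applied to the null std
```

Doubling the rank roughly doubles the variance, so scaling the null noise std by sqrt(rank_scale) reproduces the effect without rebuilding the adapter tensors.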

Determinism. Every sway run calls seed_everything(spec.defaults.seed) before the first probe — seeds python/numpy/torch RNGs and asks torch for deterministic algorithms (CUBLAS_WORKSPACE_CONFIG=:4096:8). The report footer prints the achieved class — strict (CUDA), best_effort (CPU/MPS), or loose (deterministic algorithms refused). Same seed + same host = bit-identical scoring across runs.
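A seeding routine of this kind typically looks like the sketch below. It is not sway's seed_everything: the function name seed_everything_sketch and its return value are assumptions, and numpy and torch are treated as optional so the example runs without them.

```python
import os
import random


def seed_everything_sketch(seed: int) -> list[str]:
    """Seed the RNGs that are importable and return the names that were seeded."""
    seeded = ["python"]
    random.seed(seed)
    # cuBLAS reads this variable for deterministic matmuls; set it before CUDA work.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        # warn_only=True turns unsupported-op errors into warnings.
        torch.use_deterministic_algorithms(True, warn_only=True)
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


seed_everything_sketch(1234)
first = random.random()
seed_everything_sketch(1234)
second = random.random()  # same seed, same draw
```

Returning the list of seeded libraries is one way a report footer could derive which determinism class was actually achieved on a given host.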

Pytest integration

For teams already testing their training pipeline with pytest, sway ships a plugin behind the [pytest] extra. A single decorator turns one pytest function into one test item per probe plus an optional composite-score gate:

import pytest

@pytest.mark.sway(spec="sway.yaml", threshold=0.6)
def test_adapter_healthy() -> None:
    """The decorator owns the body — a bare pass is conventional."""

pytest -v then reports:

test_sway_gate.py::test_adapter_healthy::adherence    PASSED
test_sway_gate.py::test_adapter_healthy::calibration  PASSED
test_sway_gate.py::test_adapter_healthy::__gate__     PASSED

--junitxml emits one <testcase> per probe, pytest -k adherence runs just that probe, and FAIL / ERROR / SKIP verdicts translate to pytest outcomes. See examples/pytest_integration/ for a full before/after walkthrough.

pip install 'dlm-sway[hf,pytest]'

Pre-commit

For teams using pre-commit.com, sway ships a .pre-commit-hooks.yaml declaring three hooks that run sway gate before every commit touching a spec, .dlm document, or adapter file. Add an entry like the following to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/tenseleyFlow/sway
    rev: v0.1.0
    hooks:
      - id: sway-gate
        args: ["sway.yaml", "--threshold=0.6"]

Three variants ship; pick whichever fits your install posture:

Hook	When to use	First-run cost
`sway-gate`	you already ran `pip install 'dlm-sway[hf]'`	~none — uses the sway binary on your `PATH`
`sway-gate-isolated`	fresh venv, no existing sway install	~2 min + ~5 GB — pre-commit builds a fresh venv and installs sway + torch + transformers
`sway-gate-docker`	zero-install hosts with docker available	~1 min — pulls `ghcr.io/tenseleyflow/sway-gate:v0.1.0` (torch baked in, MiniLM weights pre-cached)

The recommended default is sway-gate. Switch to sway-gate-isolated if you can't rely on a host-level sway install. Reach for sway-gate-docker on ephemeral CI runners where docker is cheaper than a fresh venv.

Rev pinning

The example above pins to the v0.1.0 tag. Bump it deliberately when you want to pick up a new release; pre-commit autoupdate will surface newer tags when you run it explicitly.

Scope

The hook only gates: it exits non-zero on FAIL and zero on PASS. No --json / --markdown report flags are surfaced; those belong in sway run (ad-hoc or in a separate CI job). This keeps git commit fast and the gate's verdict uncluttered.
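Because the gate communicates only through its exit code, wrapping it from a script is straightforward. A minimal sketch, assuming sway is on PATH and using the same arguments as the hook example; run_gate and its injectable runner parameter are hypothetical helpers added for illustration and testing.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_gate(spec: str, threshold: float,
             runner: Optional[Callable[[Sequence[str]], int]] = None) -> bool:
    """Run `sway gate` and report PASS (exit code 0) as True."""
    cmd = ["sway", "gate", spec, f"--threshold={threshold}"]
    if runner is None:
        # Default: shell out to the sway binary on PATH.
        def runner(argv: Sequence[str]) -> int:
            return subprocess.run(list(argv)).returncode
    return runner(cmd) == 0


# Example with a stand-in runner so nothing is executed here.
recorded: list[list[str]] = []


def fake_runner(argv: Sequence[str]) -> int:
    recorded.append(list(argv))
    return 0


passed = run_gate("sway.yaml", 0.6, fake_runner)
```

Injecting the runner keeps the wrapper testable without a model or adapter on disk; calling run_gate("sway.yaml", 0.6) with no runner would invoke the real CLI.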

See examples/precommit-example/ for the full walk-through including the sway.yaml template, the consumer-side .pre-commit-config.yaml, and the try-it-locally-before-you-install recipe.

The `.dlm` integration

If you trained your adapter via the DocumentLanguageModel project, sway auto-generates a test suite from your document's sections.

Install sway with the [dlm] extra alongside [hf], either from PyPI or editable from a clone:

pip install "dlm-sway[hf,dlm]"

# or, inside a clone of this repo
uv pip install -e ".[hf,dlm]"

Then:

sway autogen path/to/doc.dlm -o sway.yaml
sway run sway.yaml

Per-section attribution tells you which parts of your document actually moved the model — a kind of signal no other tool provides.

Status

Alpha. The API is not stable and will break before v1.0. Version 0.1.0 is the first release published to PyPI, as dlm-sway; for unreleased changes, install editable from source (see Install from source), which tracks the tip of main.

License

MIT

Release history

0.1.0, released Apr 24, 2026
