
LLM red-teaming and adversarial testing framework

vauban

MLX-native abliteration toolkit for Apple Silicon. Measure a refusal direction, cut it from the weights, get a modified model out. ~550 lines of Python.

Install

git clone https://github.com/teilomillet/vauban.git
cd vauban && uv sync

Usage

Write a TOML config:

[model]
path = "mlx-community/Llama-3.2-3B-Instruct-4bit"

[data]
harmful = "default"
harmless = "default"

Run it:

uv run vauban run.toml

Output lands in output/ — a complete model directory loadable by mlx_lm.load().

What it does

  1. Measure — runs harmful/harmless prompts, captures per-layer activations, extracts the refusal direction via difference-in-means (or top-k SVD subspace)
  2. Cut — removes the direction from o_proj and down_proj weights via rank-1 projection. Variants: norm-preserving, biprojected, subspace
  3. Export — writes modified weights + tokenizer as a loadable model
  4. Evaluate — refusal rate, perplexity, KL divergence between original and modified
  5. Probe/Steer — inspect per-layer projections, steer generation at runtime
  6. Surface map — scan diverse prompts to visualize the refusal landscape before/after
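
Concretely, steps 1 and 2 reduce to a few lines of linear algebra. Below is a minimal NumPy sketch of difference-in-means and the rank-1 cut (illustrative only; vauban itself operates on MLX arrays, handles quantized weights, and offers the norm-preserving, biprojected, and subspace variants):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-in-means: the unit vector separating the two activation clouds."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def cut_direction(W, d):
    """Rank-1 projection: remove the component of each output column along d,
    i.e. replace W with (I - d d^T) W."""
    return W - np.outer(d, d) @ W

# Toy example: activations in 4-d space, with the "harmful" cloud shifted along axis 0
rng = np.random.default_rng(0)
harmful = rng.normal(size=(8, 4)) + np.array([2.0, 0.0, 0.0, 0.0])
harmless = rng.normal(size=(8, 4))
d = refusal_direction(harmful, harmless)

W = rng.normal(size=(4, 4))
W_cut = cut_direction(W, d)
# After the cut, W's outputs have no component along d: d @ W_cut is (numerically) zero
assert np.allclose(d @ W_cut, 0.0, atol=1e-8)
```

The same projection applied to every targeted layer's o_proj and down_proj is what turns "measure" into "cut".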

Python API

import mlx_lm
from vauban import measure, cut, export_model, load_prompts, default_prompt_paths
from mlx.utils import tree_flatten

# Load the base model and the bundled harmful/harmless prompt sets
model, tok = mlx_lm.load("mlx-community/Llama-3.2-3B-Instruct-4bit")
harmful = load_prompts(default_prompt_paths()[0])
harmless = load_prompts(default_prompt_paths()[1])

# Measure: capture per-layer activations and extract the refusal direction
result = measure(model, tok, harmful, harmless)

# Cut: project the direction out of the weights across all layers
weights = dict(tree_flatten(model.parameters()))
modified = cut(weights, result.direction, list(range(len(model.model.layers))))

# Export: write the modified weights + tokenizer as a loadable model
export_model("mlx-community/Llama-3.2-3B-Instruct-4bit", modified, "output")
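
The evaluate step's KL divergence quantifies how far the modified model's next-token distribution drifts from the original's. A self-contained sketch of that computation on raw logit vectors (illustrative only; this is not vauban's evaluation code, which runs over real model outputs):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-d logit vector."""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def kl_divergence(logits_orig, logits_mod):
    """KL(P_orig || P_mod) for one next-token distribution, in nats."""
    p = softmax(logits_orig)
    q = softmax(logits_mod)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Identical logits give zero divergence; any shift gives a positive divergence
logits = np.array([1.0, 2.0, 0.5, -1.0])
assert kl_divergence(logits, logits) < 1e-12
assert kl_divergence(logits, logits + np.array([0.5, -0.5, 0.0, 0.0])) > 0.0
```

Averaged over a prompt set, a near-zero KL means the cut changed little beyond the refusal behavior; a large KL signals broader damage, which is why it is reported alongside perplexity and refusal rate.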

Config reference

See docs/getting-started.md for the full config reference with all [measure], [cut], [surface], [eval], and [output] options.

Requirements

  • Apple Silicon Mac (M1+)
  • Python >= 3.12
  • uv

License

Apache-2.0

Download files

Download the file for your platform.

Source Distribution

vauban-0.2.1.tar.gz (109.2 kB)

Built Distribution

vauban-0.2.1-py3-none-any.whl (42.6 kB)

File details

Details for the file vauban-0.2.1.tar.gz.

File metadata

  • Download URL: vauban-0.2.1.tar.gz
  • Size: 109.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.16

File hashes

Hashes for vauban-0.2.1.tar.gz

  • SHA256: ae2b7f958f97262c692352f652325571d43aa7ccdaf7e1f58697f4f94ab31204
  • MD5: 3b1ffdd820f0b256967da054e1dd4048
  • BLAKE2b-256: c193fa65b7e7a777791c2e6a5507f035f8dce98e84180177d6dad5869f4f38e2


File details

Details for the file vauban-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: vauban-0.2.1-py3-none-any.whl
  • Size: 42.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.16

File hashes

Hashes for vauban-0.2.1-py3-none-any.whl

  • SHA256: ce678fa5adfae4dbe31f1afe241bebdd80169fc624abbf85778d07070ac51730
  • MD5: bf1972d1ac2901900691be55c189e5b5
  • BLAKE2b-256: 6ffa37450203ae7e144014eb1a8297d695ff45a18ab4ba4b7cc781e7faffcd07

