Safety regression comparison for AI systems.

SafetyDiff 🛡️

The Git-Diff for LLM Safety Posture

SafetyDiff is an open-source CI/CD and analytics engine for large language models. It solves the "black box versioning" problem: when you upgrade a model from version 1 to version 2 (or switch from a Qwen model to an OpenAI model), is the new model actually safer, or does it just have different vulnerabilities?

Instead of relying on single benchmark scores, SafetyDiff reads evaluation databases and provides a direct, side-by-side mathematical diff of how two models respond to the exact same adversarial attacks.

Why SafetyDiff?

Current AI security benchmarks output static numbers (e.g., "Model A scored 82%"). SafetyDiff treats LLM safety like software engineering:

Regression Tracking: See exactly which vulnerabilities were fixed, and which new vulnerabilities were introduced.
Cross-Model Transferability: Take an attack that broke Llama-3 and instantly diff it against Qwen2.5 to map shared architectural flaws.
Granular Taxonomy: Breaks down safety by Intent (e.g., role_hijack, data_exfiltration, tool_abuse).
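
As a rough illustration of the regression-tracking idea above, the sketch below diffs two sets of per-attack outcomes with plain set operations. It is not SafetyDiff's actual implementation: the function name diff_outcomes, the SafetyDiffResult type, the attack IDs, and the outcome values are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyDiffResult:
    """Hypothetical result type; not part of SafetyDiff's API."""
    fixed: set        # attacks that broke the baseline but not the candidate
    introduced: set   # attacks that broke the candidate but not the baseline
    shared: set       # attacks that broke both models


def diff_outcomes(baseline: dict, candidate: dict) -> SafetyDiffResult:
    """Compare per-attack outcomes, where True means the attack succeeded.

    Only attacks evaluated against both models are compared.
    """
    common = baseline.keys() & candidate.keys()
    base_broken = {a for a in common if baseline[a]}
    cand_broken = {a for a in common if candidate[a]}
    return SafetyDiffResult(
        fixed=base_broken - cand_broken,
        introduced=cand_broken - base_broken,
        shared=base_broken & cand_broken,
    )


# Made-up outcomes for two model versions (illustrative only).
baseline = {"atk-001": True, "atk-002": False, "atk-003": True}
candidate = {"atk-001": False, "atk-002": True, "atk-003": True}

result = diff_outcomes(baseline, candidate)
print("fixed:     ", sorted(result.fixed))
print("introduced:", sorted(result.introduced))
print("shared:    ", sorted(result.shared))
```

Keeping "fixed" and "introduced" as separate sets is what distinguishes a diff from a single score: two models with the same overall pass rate can still differ in which attacks succeed.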

Installation

git clone https://github.com/m4vic/SafetyDiff.git
cd SafetyDiff
pip install -r requirements.txt

Quick Start (Demo)

SafetyDiff ships with demo_safety_history.db, a demo database containing thousands of pre-computed red-team evaluations across qwen2.5-coder:3b, qwen3.5:4b, and gpt-4o-mini. You can run comparisons out of the box without generating your own data.

Compare two models:

python safetydiff.py --compare gpt-4o-mini qwen2.5-coder:3b

Filter by a specific vulnerability category:

python safetydiff.py --compare gpt-4o-mini qwen2.5-coder:3b --intent role_hijack

Architecture & Data Generation

SafetyDiff is an analytics engine: it does not generate attacks itself. It is designed to consume SQLite databases produced by automated red-teaming pipelines. The demo database was generated with ASRT (Automated Safety Regression Testing), a proprietary zero-human adversarial generation engine that uses TF-IDF routers and Mixture-of-Experts (MoE) LLM-as-a-Judge evaluations.
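
To make the "consume a SQLite database" step concrete, here is a minimal sketch of a side-by-side query with an optional intent filter. The schema is an assumption made for illustration: the table name evaluations and the columns attack_id, model, intent, and attack_succeeded are hypothetical, as are the compare function, the model names, and the rows. The real layout of demo_safety_history.db may differ.

```python
import sqlite3

# In-memory database with a HYPOTHETICAL schema (not SafetyDiff's real one).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE evaluations ("
    " attack_id TEXT, model TEXT, intent TEXT, attack_succeeded INTEGER)"
)
conn.executemany(
    "INSERT INTO evaluations VALUES (?, ?, ?, ?)",
    [
        # Made-up rows: the same attacks evaluated against two models.
        ("atk-001", "model-a", "role_hijack", 1),
        ("atk-001", "model-b", "role_hijack", 0),
        ("atk-002", "model-a", "tool_abuse", 0),
        ("atk-002", "model-b", "tool_abuse", 1),
        ("atk-003", "model-a", "role_hijack", 1),
        ("atk-003", "model-b", "role_hijack", 1),
    ],
)


def compare(conn, model_a, model_b, intent=None):
    """Return (attack_id, intent, a_succeeded, b_succeeded) rows.

    A self-join pairs each attack's outcome on model_a with its outcome
    on model_b; the optional intent restricts the comparison to one category.
    """
    sql = (
        "SELECT a.attack_id, a.intent, a.attack_succeeded, b.attack_succeeded "
        "FROM evaluations a JOIN evaluations b ON a.attack_id = b.attack_id "
        "WHERE a.model = ? AND b.model = ?"
    )
    params = [model_a, model_b]
    if intent is not None:
        sql += " AND a.intent = ?"
        params.append(intent)
    sql += " ORDER BY a.attack_id"
    return conn.execute(sql, params).fetchall()


rows = compare(conn, "model-a", "model-b", intent="role_hijack")
for attack_id, intent, a_hit, b_hit in rows:
    print(attack_id, intent, bool(a_hit), bool(b_hit))
```

The query uses bound parameters rather than string formatting, so model names and intents taken from the command line cannot alter the SQL.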

Roadmap

v1.0 (Current): Direct Prompt Injection & Chat Vulnerability Diffing.
v2.0 (In Development): Agentic Trajectory Evaluation & Indirect Prompt Injections (IPI).

Author: Sanskar Jajoo (@m4vic)

This version: 1.0.0 (Jul 1, 2026)

Download files

Source Distribution

safetydiff-1.0.0.tar.gz (11.7 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

safetydiff-1.0.0-py3-none-any.whl (12.8 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file safetydiff-1.0.0.tar.gz.

File metadata

Download URL: safetydiff-1.0.0.tar.gz
Upload date: Jul 1, 2026
Size: 11.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for safetydiff-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f136ab58526c64c01e58b9ca7272d018f841790912f1b38ce75b16890a8afcc5`
MD5	`98b23ee600c672870bb3d217df591379`
BLAKE2b-256	`59aaefa6662fab80fd8c66eb7db8e8dc2fd32db6f02871bbfc94079f2bd31f0f`

File details

Details for the file safetydiff-1.0.0-py3-none-any.whl.

File metadata

Download URL: safetydiff-1.0.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 12.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for safetydiff-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc804ecdd2d3b52ddbcc4b0bad41356538047a04c664ff4fbcb62915424624b8`
MD5	`b514200aa4a663407e47142561421ea2`
BLAKE2b-256	`6502c80e94b8e72f874603d1576b0fb7b16942edf7101034ec5cacbfb0b2fe04`
