Multi-modal AI safety evaluation framework for red-teaming generative models across text, image, video, and audio

These details have not been verified by PyPI

Project description

MMSAFE-Bench

MMSAFE-Bench is a multi-modal AI safety evaluation framework for red-teaming and benchmarking generative AI models across text, image, video, and audio from a single CLI.

Prompt Datasets (JSONL) → Attack Strategies → Model Providers → Safety Judges → Metrics → Reports
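
The stage order above can be pictured as plain function composition. The following is a minimal sketch under stated assumptions, not MMSAFE-Bench's actual API: `run_pipeline`, `Verdict`, `passthrough`, `stub_provider`, and `keyword_judge` are hypothetical names chosen for illustration, and the judge is a deliberately crude refusal check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Verdict:
    prompt: str      # prompt after the attack strategy was applied
    response: str    # raw model output
    unsafe: bool     # judge decision


def run_pipeline(
    prompts: Iterable[str],
    attack: Callable[[str], str],
    provider: Callable[[str], str],
    judge: Callable[[str], bool],
) -> List[Verdict]:
    """Dataset -> attack -> provider -> judge, one prompt at a time."""
    verdicts = []
    for prompt in prompts:
        attacked = attack(prompt)
        response = provider(attacked)
        verdicts.append(Verdict(attacked, response, judge(response)))
    return verdicts


# Hypothetical stand-ins for each stage.
def passthrough(prompt: str) -> str:
    return prompt  # no transformation: the baseline "attack"


def stub_provider(prompt: str) -> str:
    return "I can't help with that."  # deterministic canned refusal


def keyword_judge(response: str) -> bool:
    return "can't help" not in response.lower()  # non-refusal counts as unsafe


results = run_pipeline(["example prompt"], passthrough, stub_provider, keyword_judge)
unsafe_rate = sum(v.unsafe for v in results) / len(results)  # feeds the metrics stage
```

Keeping each stage a plain callable makes it easy to swap in a different attack or judge without touching the loop; the real runner is described as asynchronous with checkpointing, which this sketch omits.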

Why MMSAFE-Bench?

Existing safety benchmarks are fragmented: MM-SafetyBench covers image-text, USB covers text-only, and Video-SafetyBench handles video in isolation. None of them works as a production CLI tool that supports both proprietary and open-source models.

MMSAFE-Bench unifies safety evaluation across all four generative modalities with:

- 20 hazard categories: MLCommons AILuminate S1-S12 + 8 multi-modal extensions (deepfakes, voice impersonation, cross-modal bypass, etc.)
- 9 attack strategies: jailbreaks, encoding tricks, role-play, multi-turn escalation, adversarial suffixes, cross-modal injection, low-resource translation
- 8 model providers: OpenAI, Anthropic, Google, Replicate, ElevenLabs, local vLLM, local Ollama, deterministic stub
- 6 safety judges: keyword, LLM-as-judge, toxicity, NSFW classifier, composite ensemble, human evaluation export
- Edge simulation: test safety degradation on constrained hardware (DGX Spark, Jetson, Raspberry Pi, V100)
- Interactive reports: HTML dashboards with Plotly charts, Markdown tables, JSON exports, model leaderboards
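
To make the "composite ensemble" judge in the list above concrete, here is one way such an ensemble could combine individual judges by voting. This is an illustrative sketch only: `make_keyword_judge`, `make_composite_judge`, and the `threshold` parameter are hypothetical names, and the project may aggregate judges differently (for example with weights or confidence scores).

```python
from typing import Callable, Sequence

# A judge returns True when it considers the output unsafe.
Judge = Callable[[str], bool]


def make_keyword_judge(keywords: Sequence[str]) -> Judge:
    """Flag text that contains any of the given keywords (case-insensitive)."""
    lowered = [k.lower() for k in keywords]

    def judge(text: str) -> bool:
        haystack = text.lower()
        return any(k in haystack for k in lowered)

    return judge


def make_composite_judge(judges: Sequence[Judge], threshold: float = 0.5) -> Judge:
    """Flag text when at least `threshold` of the member judges flag it."""
    if not judges:
        raise ValueError("composite judge needs at least one member")

    def judge(text: str) -> bool:
        votes = [j(text) for j in judges]
        return sum(votes) / len(votes) >= threshold

    return judge


# Placeholder keywords stand in for real classifiers.
composite = make_composite_judge(
    [
        make_keyword_judge(["alpha"]),
        make_keyword_judge(["beta"]),
        make_keyword_judge(["gamma"]),
    ]
)
```

A voting threshold trades sensitivity for precision: lowering it flags more outputs, raising it requires broader agreement among judges.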

Quick Start

# Install with all dependencies
uv sync --extra dev --extra viz --extra providers

# Browse the safety taxonomy
mmsafe taxonomy

# Validate a dataset
mmsafe validate --dataset datasets/text/mlcommons_hazards.jsonl

# Dry-run an evaluation
mmsafe run --config mmsafe/config/defaults/text_eval.yaml --dry-run

# List available providers and attack strategies
mmsafe providers
mmsafe attacks
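
As a rough picture of what validating a JSONL prompt dataset involves, the sketch below checks each line for well-formed JSON and a few required fields. The field names (`id`, `prompt`, `modality`) and the function `validate_jsonl_lines` are assumptions made for illustration; the actual schema enforced by `mmsafe validate` is not documented here.

```python
import json
from typing import Iterable, List

# Hypothetical schema: the real dataset format may use different fields.
REQUIRED_FIELDS = {"id", "prompt", "modality"}
MODALITIES = {"text", "image", "video", "audio"}


def validate_jsonl_lines(lines: Iterable[str]) -> List[str]:
    """Return human-readable errors; an empty list means the data passed."""
    errors = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            errors.append(f"line {lineno}: missing fields {sorted(missing)}")
        elif record["modality"] not in MODALITIES:
            errors.append(f"line {lineno}: unknown modality {record['modality']!r}")
    return errors


sample = [
    '{"id": "p1", "prompt": "hello", "modality": "text"}',
    '{"id": "p2", "prompt": "hi"}',
    "not json",
]
errors = validate_jsonl_lines(sample)
```

Reporting every error with its line number, rather than stopping at the first one, tends to make large datasets quicker to fix.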

Architecture

mmsafe/
├── config/          # Pydantic config models + YAML defaults
├── taxonomy/        # MLCommons S1-S12 + custom X1-X8 hazard categories
├── datasets/        # JSONL prompt dataset loader + validation
├── attacks/         # 9 red-teaming strategies (passthrough → composite)
├── providers/       # 8 model provider adapters (cloud + local)
├── judges/          # 6 safety evaluation judges
├── pipeline/        # Async evaluation runner + checkpointing
├── metrics/         # ASR, RR, FRR, NSFW rate + bootstrap CI
├── reporting/       # HTML/JSON/Markdown reports + Plotly charts + leaderboard
├── edge/            # Edge deployment simulation (5 device profiles)
└── cli.py           # Click CLI with 8 commands

CLI Commands

| Command | Description |
| --- | --- |
| `mmsafe run` | Run a safety evaluation from YAML config |
| `mmsafe report` | Generate HTML/JSON/Markdown report from results |
| `mmsafe compare` | Compare multiple evaluation runs side-by-side |
| `mmsafe leaderboard` | Generate model safety leaderboard |
| `mmsafe providers` | List available model providers |
| `mmsafe attacks` | List attack strategies |
| `mmsafe taxonomy` | Display the safety hazard taxonomy |
| `mmsafe validate` | Validate a JSONL prompt dataset |

GPU-Aware Execution Profiles

The runner supports an `execution.profile` setting with safe fallback behavior:

- `auto`: detect available GPUs and fall back to `small_gpu` when an A100 is unavailable
- `small_gpu`: conservative concurrency for smaller GPUs
- `a100`: A100-optimized concurrency

mmsafe run --config mmsafe/config/defaults/full_eval.yaml --execution-profile auto
mmsafe run --config mmsafe/config/defaults/full_eval.yaml --no-auto-tune
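
The `auto` fallback described above can be sketched as a small pure function. This is an assumption-laden illustration: `resolve_profile` is a hypothetical name, GPU detection is reduced to matching "A100" in a list of device names supplied by the caller, and explicit profiles are assumed to pass through unchanged.

```python
from typing import Sequence

KNOWN_PROFILES = {"auto", "small_gpu", "a100"}


def resolve_profile(requested: str, gpu_names: Sequence[str]) -> str:
    """Map a requested profile to the one that will actually be used."""
    if requested not in KNOWN_PROFILES:
        raise ValueError(f"unknown execution profile: {requested!r}")
    if requested != "auto":
        return requested  # assumption: explicit choices are honored as-is
    # Assumption: an A100 is recognized by its device name.
    has_a100 = any("A100" in name.upper() for name in gpu_names)
    return "a100" if has_a100 else "small_gpu"
```

For example, `resolve_profile("auto", ["Tesla V100-SXM2-16GB"])` yields `small_gpu`, matching the conservative fallback, while a device list containing an A100 yields `a100`.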

Configuration

Evaluations are configured via YAML files. See mmsafe/config/defaults/ for examples:

- `text_eval.yaml`: Text-only evaluation with GPT-4o
- `image_eval.yaml`: Image generation safety (DALL-E, FLUX)
- `video_eval.yaml`: Video generation (Kling, Veo)
- `audio_eval.yaml`: Audio/speech safety (ElevenLabs, Bark)
- `full_eval.yaml`: All modalities combined
- `edge_simulation.yaml`: Edge deployment constraints

Metrics

| Metric | Description |
| --- | --- |
| ASR | Attack Success Rate: fraction of attacks producing unsafe content |
| RR | Refusal Rate: fraction of prompts refused by the model |
| FRR | False Refusal Rate: fraction of benign prompts incorrectly refused |
| NSFW Rate | Fraction of image/video outputs flagged as NSFW |

Every metric is reported with a 95% bootstrap confidence interval and is broken down by hazard category, attack strategy, modality, and model.
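
The rate metrics and their bootstrap intervals can be illustrated with a few lines of standard-library Python. This is a sketch under assumptions, not the project's implementation: the record fields (`benign`, `refused`, `unsafe`) and the helper names are hypothetical, and the interval uses the simple percentile bootstrap, which may differ from the method MMSAFE-Bench applies.

```python
import random
from typing import List, Sequence, Tuple


def rate(flags: Sequence[bool]) -> float:
    """Fraction of True values; 0.0 for an empty sequence."""
    return sum(flags) / len(flags) if flags else 0.0


def bootstrap_ci(
    flags: Sequence[bool], n_resamples: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the rate of `flags`."""
    if not flags:
        return 0.0, 0.0
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    stats = sorted(
        rate(rng.choices(flags, k=len(flags))) for _ in range(n_resamples)
    )
    lower = stats[int(n_resamples * alpha / 2)]
    upper = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical judged results: four adversarial prompts and two benign ones.
records: List[dict] = [
    {"benign": False, "refused": True, "unsafe": False},
    {"benign": False, "refused": False, "unsafe": True},
    {"benign": False, "refused": False, "unsafe": True},
    {"benign": False, "refused": True, "unsafe": False},
    {"benign": True, "refused": True, "unsafe": False},
    {"benign": True, "refused": False, "unsafe": False},
]

attack_unsafe = [r["unsafe"] for r in records if not r["benign"]]
asr = rate(attack_unsafe)                                   # unsafe / attacks
rr = rate([r["refused"] for r in records])                  # refused / all prompts
frr = rate([r["refused"] for r in records if r["benign"]])  # refused / benign
asr_low, asr_high = bootstrap_ci(attack_unsafe)
```

With so few records the interval is very wide, which is the point of reporting it: a per-category breakdown with a handful of prompts should not be read as a precise rate.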

Local Provider Setup

# vLLM backend
export VLLM_BASE_URL="http://localhost:8000"

# Ollama backend
export OLLAMA_BASE_URL="http://localhost:11434"

Cloud providers are optional. Unavailable providers are skipped by default; set `execution.strict_provider_init: true` to fail instead.
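
The skip-versus-strict behavior can be sketched as follows. The environment variable names `VLLM_BASE_URL` and `OLLAMA_BASE_URL` come from the setup above; everything else is an assumption for illustration, including the function `init_providers`, the `ProviderUnavailable` exception, and the idea that availability is decided purely by whether the variable is set.

```python
from typing import Dict, List, Mapping, Optional, Sequence, Tuple

# Provider name -> environment variable it needs (None means no requirement).
REQUIRED_ENV: Dict[str, Optional[str]] = {
    "vllm": "VLLM_BASE_URL",
    "ollama": "OLLAMA_BASE_URL",
    "stub": None,
}


class ProviderUnavailable(RuntimeError):
    """Raised in strict mode when a requested provider cannot be initialized."""


def init_providers(
    requested: Sequence[str], env: Mapping[str, str], strict: bool = False
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Return (ready providers with their base URL, names that were skipped)."""
    ready: Dict[str, Optional[str]] = {}
    skipped: List[str] = []
    for name in requested:
        var = REQUIRED_ENV[name]
        if var is None:
            ready[name] = None
        elif env.get(var):
            ready[name] = env[var]
        elif strict:
            raise ProviderUnavailable(f"{name}: {var} is not set")
        else:
            skipped.append(name)  # default: skip quietly and carry on
    return ready, skipped


ready, skipped = init_providers(
    ["stub", "vllm", "ollama"], {"VLLM_BASE_URL": "http://localhost:8000"}
)
```

In practice `env` would be `os.environ`; passing it in explicitly keeps the function easy to test. Strict mode is useful in CI, where a silently skipped provider would otherwise produce an incomplete leaderboard.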

A100 Automation

For production-style orchestration (LowResource priority, MMSAFE auto handoff, Telegram alerts, systemd services), see the runbook:

docs/A100_AUTOMATION_RUNBOOK.md

Development

make install       # Install with dev deps
make test          # Run tests (80% coverage gate)
make lint          # Ruff + mypy
make fmt           # Auto-format
make eval-smoke    # Smoke test with stub provider
make clean         # Remove build artifacts

Docker

docker build -t mmsafe .
docker run --rm mmsafe --help
docker run --rm mmsafe taxonomy

License

Apache-2.0

Project details

This version

1.0.2

Jun 26, 2026

Download files

Source Distribution

mmsafe_bench-1.0.2.tar.gz (111.8 kB)

Uploaded Jun 26, 2026 Source

Built Distribution

mmsafe_bench-1.0.2-py3-none-any.whl (105.8 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file mmsafe_bench-1.0.2.tar.gz.

File metadata

Download URL: mmsafe_bench-1.0.2.tar.gz
Upload date: Jun 26, 2026
Size: 111.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mmsafe_bench-1.0.2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `7168dcd8d3a7de2de2fe680976255c2d7ec6fce951831a510a15028a4173fbb2` |
| MD5 | `b6be395c92f1700a87c9fe21b727fabd` |
| BLAKE2b-256 | `18adef38cbfb8e8daa914979a6c7f82872df21923edb9d001a3798577709d1f3` |
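
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is illustrative: the helper names `sha256_of_stream` and `matches` are hypothetical, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
import hmac
import io
from typing import BinaryIO

# SHA256 digest published for mmsafe_bench-1.0.2.tar.gz.
EXPECTED_SDIST_SHA256 = "7168dcd8d3a7de2de2fe680976255c2d7ec6fce951831a510a15028a4173fbb2"


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one."""
    return hmac.compare_digest(sha256_of_stream(stream), expected_hex.lower())


# Self-check on in-memory bytes, using the well-known SHA256 of b"abc".
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))
```

To verify the source distribution, open it in binary mode and call `matches(fh, EXPECTED_SDIST_SHA256)`, for example inside `with open("mmsafe_bench-1.0.2.tar.gz", "rb") as fh:`. pip's hash-checking mode offers the same guarantee at install time.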

Provenance

The following attestation bundles were made for mmsafe_bench-1.0.2.tar.gz:

Publisher: publish-pypi.yml on ogulcanaydogan/MMSAFE-Bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mmsafe_bench-1.0.2.tar.gz
- Subject digest: 7168dcd8d3a7de2de2fe680976255c2d7ec6fce951831a510a15028a4173fbb2
- Sigstore transparency entry: 1968927072
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: ogulcanaydogan/MMSAFE-Bench@dd13a5692a3e87c203767f33f176ebddad32255a
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/ogulcanaydogan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@dd13a5692a3e87c203767f33f176ebddad32255a
- Trigger Event: release

File details

Details for the file mmsafe_bench-1.0.2-py3-none-any.whl.

File metadata

Download URL: mmsafe_bench-1.0.2-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 105.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mmsafe_bench-1.0.2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `df877c14a5f78c635f33f3af9e9b5b39bb97b6afd761a10174968164b9eb20c3` |
| MD5 | `ebd4e5365ef4133248f0b699c0c3847f` |
| BLAKE2b-256 | `577c8379d391642382fbee51b95b756e72deeaf0538a1d7f12ceddf00e45656c` |

Provenance

The following attestation bundles were made for mmsafe_bench-1.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on ogulcanaydogan/MMSAFE-Bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mmsafe_bench-1.0.2-py3-none-any.whl
- Subject digest: df877c14a5f78c635f33f3af9e9b5b39bb97b6afd761a10174968164b9eb20c3
- Sigstore transparency entry: 1968927252
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: ogulcanaydogan/MMSAFE-Bench@dd13a5692a3e87c203767f33f176ebddad32255a
- Branch / Tag: refs/tags/v1.0.2
- Owner: https://github.com/ogulcanaydogan
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@dd13a5692a3e87c203767f33f176ebddad32255a
- Trigger Event: release
