Community-driven AI security audit tool using interpretability techniques

These details have not been verified by PyPI

Project description

🛡️ Community AI Audit

Enterprise-Grade AI Security Auditing · Open Source · Community-Driven

Community AI Audit is a unified security auditing platform for AI/ML models. It provides vulnerability scanning, red team attack simulations, mechanistic interpretability analysis, alignment auditing, and a unified 7-dimension scoring engine — all from a single CLI.

🔍 What You Can Do

Use Case	What It Solves
🛡️ Vulnerability Scanning	Detect adversarial susceptibility, backdoors, prompt injection, data extraction, toxicity, watermark detectability
⚔️ Red Team Testing	Simulate jailbreak, multi-turn, obfuscation, roleplay, and tool exploitation attacks
🧠 Mechanistic Interpretability	Probe representations, attention patterns, feature attribution, and layer behavior
🎯 Alignment Auditing	Measure sycophancy, preference drift, value alignment, and objective robustness
📊 Unified Scoring	Aggregate 7 security dimensions into a single risk score with configurable weights
📈 Trend Tracking	Monitor score evolution across time and detect regressions
📡 SIEM Integration	Push findings to Splunk, Elastic, Datadog, Sentinel, and 9+ other platforms

⚡ Quickstart

pip install community-ai-audit

# Discover available plugins
community-ai-audit discover

# Scan a model
community-ai-audit scan distilgpt2 --provider huggingface --profile quick

# Full audit with SIEM push
community-ai-audit audit meta-llama/Meta-Llama-3-8B-Instruct \
  --provider huggingface --profile standard \
  --connectors splunk elastic

# Red team attack simulation
community-ai-audit redteam gpt-4 --provider openai

# Alignment auditing
community-ai-audit alignment claude-3-opus --provider anthropic

# Compute unified 7-dimension score
community-ai-audit audit-score \
  --scan scan_results.json \
  --redteam redteam_results.json \
  --alignment alignment_results.json

🧩 Capabilities

Model Support — 9 Adapters

Provider	Adapter	Auto-Detect
HuggingFace	`huggingface`	`/` or `llama*`
OpenAI	`openai`	`gpt-`, `o1`, `o3*`
Anthropic	`anthropic`	`claude-*`
AWS Bedrock	`aws_bedrock`	—
Local (PyTorch/TF/ONNX)	`local`	Path/URI/`.pt`/`.onnx`
Ollama	`ollama`	`name:tag` (no `/`)
Replicate	`replicate`	—
VertexAI	`vertexai`	—
Groq	`groq`	—
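
The auto-detect column can be read as a small set of string rules. The sketch below is one possible reading of the table, not the package's actual resolver: the function name `guess_provider`, the rule ordering, and the set of "local" prefixes are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed markers for the "local" adapter (Path / URI / .pt / .onnx in the table).
_LOCAL_SUFFIXES = (".pt", ".onnx")
_LOCAL_PREFIXES = ("/", "./", "../", "~")


def guess_provider(model: str) -> Optional[str]:
    """Return an adapter name for `model`, or None if no table rule matches."""
    name = model.strip()
    lowered = name.lower()

    # Local paths and URIs are checked first because they can contain "/".
    if lowered.endswith(_LOCAL_SUFFIXES) or name.startswith(_LOCAL_PREFIXES) or "://" in name:
        return "local"
    # Ollama: "name:tag" with no slash.
    if re.fullmatch(r"[^/:\s]+:[^/:\s]+", name):
        return "ollama"
    # OpenAI: the table lists gpt-, o1, o3*; all three are treated as prefixes here.
    if lowered.startswith(("gpt-", "o1", "o3")):
        return "openai"
    # Anthropic: claude-*.
    if lowered.startswith("claude-"):
        return "anthropic"
    # HuggingFace: "org/name" repository IDs or llama* names.
    if "/" in name or lowered.startswith("llama"):
        return "huggingface"
    # Providers marked with a dash in the table have no auto-detect rule.
    return None
```

Under these assumed rules a bare name such as `distilgpt2` matches nothing, which is consistent with the Quickstart passing `--provider huggingface` explicitly.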

Security Scanning — 7 Scanners

Scanner	What It Detects	Technique
`adversarial`	FGSM/PGD perturbation susceptibility	Gradient-based attacks
`backdoor`	Triggered malicious behavior	Activation clustering
`prompt_injection`	Injection vulnerabilities	Heuristic pattern matching
`data_extraction`	Training data / secret extraction	Response entropy analysis
`toxicity`	Toxic / biased outputs	Keyword + classifier scoring
`watermark`	Watermark detectability	Statistical pattern analysis
`dsl`	User-defined rules	YAML rule engine

Red Team — 5 Attack Scanners

Scanner	Attack Surface	Evaluation
`jailbreak`	20 known jailbreak prompts	Refusal vs success pattern matching
`multi_turn_attack`	10 two-turn conversation attacks	Suspicious-keyword breach detection
`prompt_obfuscation`	10 obfuscated variants (base64, leetspeak)	Harmful-keyword matching
`roleplay_attack`	15 roleplay scenarios (DAN, character shells)	Refusal vs engagement patterns
`tool_exploitation`	10 tool-misuse prompts	Exploit-keyword detection
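
To make the `prompt_obfuscation` row concrete, here is a minimal sketch of how base64 and leetspeak variants of a prompt could be generated and how a keyword check could be applied to a response. The leetspeak mapping, the function names, and the keyword list passed by the caller are hypothetical; the scanner's real variants and keyword lists are not shown in this README.

```python
import base64
from typing import Iterable

# Hypothetical leetspeak substitutions; the real scanner may use a different table.
_LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def to_base64(prompt: str) -> str:
    """Encode the prompt as base64 text."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def to_leetspeak(prompt: str) -> str:
    """Lower-case the prompt and swap selected letters for digits."""
    return prompt.lower().translate(_LEET)


def obfuscated_variants(prompt: str) -> dict[str, str]:
    """Build one variant per obfuscation technique named in the table."""
    return {"base64": to_base64(prompt), "leetspeak": to_leetspeak(prompt)}


def mentions_any(response: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive substring check, a stand-in for keyword matching."""
    lowered = response.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

Plain substring matching is cheap and deterministic, but it can miss paraphrased output and flag benign mentions, so results from this style of evaluation are best treated as a first-pass signal.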

Mechanistic Interpretability — 5 Analyzers

Analyzer	Probes	What It Measures
`activation_probes`	5 probe inputs	Response quality, SNR estimate
`representation_analysis`	8 probes, 4 pairs	Jaccard differentiation, vocabulary size
`attention_head_analysis`	5 syntactic probes	Attention complexity estimate
`feature_attribution`	5 sentiment inputs	Word-level importance, sentiment match
`layer_analysis`	3 open-ended probes	Depth estimation, complexity distribution
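
The "Jaccard differentiation" measure in the `representation_analysis` row can be illustrated with token sets. The sketch below assumes a simple regex tokenizer and defines differentiation as one minus Jaccard similarity; both choices are assumptions, since the README does not give the analyzer's exact definition.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-case the text and collect its distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the token union."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty responses are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def differentiation(a: str, b: str) -> float:
    """Assumed definition: 1 - Jaccard similarity of the two responses."""
    return 1.0 - jaccard_similarity(a, b)
```

For a probe pair, a value near 0 means the model gave nearly the same vocabulary for both inputs, and a value near 1 means the responses share almost no tokens.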

Alignment Auditing — 4 Scanners

Scanner	Prompts	What It Detects
`sycophancy`	5 agree + 5 disagree	Stance-sycophancy (rubber-stamping)
`preference_drift`	5 cores × 3 variants	Sentiment inconsistency across paraphrases
`value_alignment`	8 probes across 6 values	Refusal of harmful, encouragement of prosocial
`objective_robustness`	3 objectives × 4 prompts	Refusal-pattern violations per objective
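
As an illustration of "sentiment inconsistency across paraphrases" in the `preference_drift` row, the sketch below counts how many core prompts receive differing sentiment labels across their variants. It assumes the sentiment labels have already been produced by some classifier; the function name and the label strings are hypothetical, and the scanner's real metric may differ.

```python
from typing import Mapping, Sequence


def drift_rate(labels_by_core: Mapping[str, Sequence[str]]) -> float:
    """Fraction of core prompts whose paraphrase variants disagree in sentiment."""
    if not labels_by_core:
        return 0.0
    inconsistent = sum(
        1 for labels in labels_by_core.values() if len(set(labels)) > 1
    )
    return inconsistent / len(labels_by_core)
```

With 5 cores and 3 variants each, as in the table, the result moves in steps of 0.2.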

Scoring — 7 Dimensions

┌──────────────────────────────────────────────┐
│             Unified Audit Score              │
├──────────────────┬───────────────────────────┤
│ Security         │ ████████████████░░ 82.0   │
│ Reliability      │ ██████████████░░░░ 72.0   │
│ Compliance       │ ██████████████████ 90.0   │
│ Agent Risk       │ ████████████████░░ 80.0   │
│ Alignment        │ ████████████████░░ 85.0   │
│ Red Team         │ ████████████░░░░░░ 60.0   │
│ Interpretability │ ████████████░░░░░░ 65.0   │
├──────────────────┴───────────────────────────┤
│ Overall: 77.6 (Good)                         │
│ Weights: security=0.2, reliability=0.1, ...  │
└──────────────────────────────────────────────┘
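
A single risk score with configurable weights is most naturally read as a weighted average of the seven dimension scores. The sketch below assumes that reading. Only the `security` and `reliability` weights come from the example output; the other five weights are placeholders chosen so the total is 1.0, so the result (about 77.35) is not expected to reproduce the 77.6 shown above.

```python
from typing import Mapping

# Dimension scores copied from the example output.
SCORES = {
    "security": 82.0,
    "reliability": 72.0,
    "compliance": 90.0,
    "agent_risk": 80.0,
    "alignment": 85.0,
    "red_team": 60.0,
    "interpretability": 65.0,
}

# security and reliability are from the example; the rest are PLACEHOLDERS.
WEIGHTS = {
    "security": 0.2,
    "reliability": 0.1,
    "compliance": 0.15,
    "agent_risk": 0.15,
    "alignment": 0.15,
    "red_team": 0.15,
    "interpretability": 0.1,
}


def unified_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of the dimension scores, normalised by the weight total."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight
```

Normalising by the weight total means a custom weight set does not have to sum to exactly 1.0.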

Executive Dashboard

Real-time HTML dashboard served via dashboard_v2/server.py:

7 color-coded score cards (critical → excellent)
JSON overlay endpoint for programmatic updates
Configurable refresh interval
Responsive CSS grid layout

🔧 Installation

# Core (numpy, pyyaml, scikit-learn)
pip install community-ai-audit

# Optional extras
pip install "community-ai-audit[torch]"      # Torch-based scanners
pip install "community-ai-audit[scheduler]"  # Cron scheduling
pip install "community-ai-audit[hf]"         # HuggingFace transformers
pip install "community-ai-audit[tf]"         # TensorFlow

# Development
git clone https://github.com/anomalyco/community-ai-audit
cd community-ai-audit
pip install -e ".[dev]"

💻 CLI Reference

Command	Description
`scan <model> -p <provider>`	Run vulnerability scanners
`interpret <model> -p <provider>`	Run interpretability methods
`audit <model> -p <provider>`	Full pipeline: scan + interpret + report + push
`redteam <model> -p <provider>`	Red team attack simulations
`mechinterp <model> -p <provider>`	Mechanistic interpretability analysis
`alignment <model> -p <provider>`	Alignment auditing
`audit-score`	Compute unified 7-dimension score
`discover`	List all discovered plugins
`schedule add/list/remove/run`	Manage recurring audits

Exit codes: 0 = ok, 1 = HIGH/MEDIUM findings, 2 = CRITICAL findings.
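
The exit codes make the CLI easy to gate in CI. The sketch below wraps the command with `subprocess.run` and maps the three documented codes to a decision. The policy itself (block on CRITICAL or on any undocumented code, allow HIGH/MEDIUM with a warning) is an assumption for illustration, as are the helper names.

```python
import subprocess
from typing import Sequence

# Meanings taken from the exit-code line above.
EXIT_MEANINGS = {0: "ok", 1: "HIGH/MEDIUM findings", 2: "CRITICAL findings"}


def describe_exit_code(code: int) -> str:
    """Translate an exit code into the documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_block_release(code: int) -> bool:
    """Assumed policy: block on CRITICAL findings and on undocumented codes."""
    return code not in (0, 1)


def run_audit_cli(args: Sequence[str]) -> int:
    """Run community-ai-audit with the given arguments and return its exit code."""
    result = subprocess.run(["community-ai-audit", *args], check=False)
    return result.returncode
```

For example, `run_audit_cli(["scan", "distilgpt2", "--provider", "huggingface", "--profile", "quick"])` mirrors the Quickstart scan, and its return value can be passed to `should_block_release`.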

📚 Documentation

Resource	Description
Architecture & Reference	Full component docs, API, CLI, config, deployment
Plugin Guide	Writing custom adapters, scanners, interpreters
Scanner Guide	Details on each vulnerability scanner
Adapter Guide	Details on each model adapter
Connector Guide	SIEM and storage connector details
Red Team	Attack framework and scanner reference
Mech Interp	Analyzer reference and methodology
Alignment	Alignment scanner reference
Scoring Engine	7-dimension scoring details
Dashboard	Executive dashboard server

📋 Configuration

cache:
  enabled: true
  max_size: 1000
  ttl_seconds: 3600

scanners:
  adversarial:
    num_samples: 32
    pgd_steps: 10
  backdoor:
    sample_size: 128

connectors:
  splunk:
    url: "${SPLUNK_URL}"
    token: "${SPLUNK_TOKEN}"
  elastic:
    url: "${ELASTIC_URL}"
    api_key: "${ELASTIC_API_KEY}"

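Config values can also be set via environment variables named with the COMMUNITY_AI_AUDIT_ prefix followed by the upper-cased config path, for example COMMUNITY_AI_AUDIT_CONNECTORS_SPLUNK_URL for connectors.splunk.url.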

Precedence (lowest → highest): default.yaml → --config PATH → env vars → CLI args.
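
The naming rule and the precedence order can be sketched in a few lines. This is an illustration of the documented behaviour rather than the package's loader: the helper names are hypothetical, and each layer is assumed to be already parsed into a nested dict.

```python
from typing import Any, Mapping

ENV_PREFIX = "COMMUNITY_AI_AUDIT"


def env_var_name(dotted_key: str) -> str:
    """Map a config path such as 'connectors.splunk.url' to its env var name."""
    return "_".join([ENV_PREFIX, *dotted_key.upper().split(".")])


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict:
    """Return a copy of `base` with `override` applied, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: Mapping[str, Any]) -> dict:
    """Merge layers passed lowest precedence first; later layers win."""
    result: dict = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result
```

Calling `resolve_config(defaults, config_file, env_values, cli_values)` follows the order default.yaml → --config PATH → env vars → CLI args. Note that keys containing underscores, such as `ttl_seconds`, make the reverse mapping from an environment variable name back to a config path ambiguous, so a real loader needs a rule for that case.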

API Key Safety

COMMUNITY_AI_AUDIT_API_KEY env var (recommended)
--api-key-file PATH (reads from file, not visible in ps)
--api-key VALUE (⚠️ visible in process list)
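
When the CLI is launched from a script, the recommended environment variable can be set on the child process only, so the key never appears in the argument list. A minimal sketch, assuming the key is already available in the calling program; the helper names are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

API_KEY_ENV = "COMMUNITY_AI_AUDIT_API_KEY"


def child_env(api_key: str, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment (os.environ by default) and add the API key to the copy."""
    env = dict(os.environ if base is None else base)
    env[API_KEY_ENV] = api_key
    return env


def run_with_key(args: Sequence[str], api_key: str) -> int:
    """Run the CLI with the key in the child's environment, not in argv."""
    result = subprocess.run(
        ["community-ai-audit", *args],
        env=child_env(api_key),
        check=False,
    )
    return result.returncode
```

Because only a copy of the environment is modified, the parent process and any other children it starts do not inherit the key.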

🚀 Deployment

# Docker
docker build -t community-ai-audit .
docker run -v "$(pwd)/config:/app/config" community-ai-audit scan model.pt -p local

# Docker Compose
docker-compose up -d

# Helm (Kubernetes)
helm install community-ai-audit ./charts/community-ai-audit

# Air-Gapped
./scripts/airgap-bundle.sh   # On connected machine
./scripts/offline-install.sh  # On air-gapped machine

🧪 Testing

# All tests (no torch/croniter needed)
pytest tests/

# With coverage
pytest --cov=community_ai_audit tests/

508+ tests covering unit, integration, CLI, connectors, red team, mechanistic interpretability, alignment, trend tracking, and drift analysis.

🤝 Contributing

We welcome contributions! See our Plugin Guide to get started writing custom scanners, adapters, or connectors.

📄 License

MIT © Anomaly Co.

Project details

Release history

Version	Release date
0.6.0 (this version)	Jun 25, 2026
0.5.1	Jun 25, 2026
0.5.0	Jun 7, 2026
0.2.0	Jun 7, 2026
0.1.1	Jun 4, 2026
0.1.0	Jun 4, 2026

Download files

Source Distribution

community_ai_audit-0.6.0.tar.gz (188.9 kB)

Uploaded Jun 25, 2026 Source

Built Distribution

community_ai_audit-0.6.0-py3-none-any.whl (247.3 kB)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file community_ai_audit-0.6.0.tar.gz.

File metadata

Download URL: community_ai_audit-0.6.0.tar.gz
Upload date: Jun 25, 2026
Size: 188.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for community_ai_audit-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`6ccf3ddd4856862b4fde0fec62238d57d98955133b2fa2e87bb8c80ae643702e`
MD5	`7c0818a8a0a7ce6dda343edaed5f3f48`
BLAKE2b-256	`e0e3936bc30868c43d18303117deda76e54f639d51b1ea2586ec5fa96b0471e0`

File details

Details for the file community_ai_audit-0.6.0-py3-none-any.whl.

File metadata

Download URL: community_ai_audit-0.6.0-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 247.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for community_ai_audit-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d6ec9c9dbfb21a70c8d8525e382191d0d928a2ef61582c55b4b4c209adfb3cba`
MD5	`1d715efaf13c88a4ea232e14c60c70fa`
BLAKE2b-256	`90e06da2e9bf5fe168cd022a23405f00ff2ec310987d423b7c22dcd25a61ff71`
