SemFire (Semantic Firewall): detect advanced AI deception, including in-context scheming and multi-turn manipulative attacks.
SemFire
AI Deception Detection Toolkit
SemFire (Semantic Firewall) is an open-source toolkit for detecting advanced AI deception, with a primary focus on "in-context scheming" and multi-turn manipulative attacks. This project aims to develop tools to identify and mitigate vulnerabilities like the "Echo Chamber" and "Crescendo" attacks, where AI models are subtly guided towards undesirable behavior through conversational context.
Project Vision: A Toolkit for AI Deception Detection
SemFire aims to be a versatile, open-source toolkit providing:
- A Python library for direct integration into applications and research.
- A Command Line Interface (CLI) for quick analysis and scripting.
- A REST API service (via FastAPI) for broader accessibility and enterprise use cases.
- Core components that can be integrated into broader semantic-firewall-like systems to monitor and analyze AI interactions in real-time.
Features
- Rule-based detector (EchoChamberDetector) for identifying cues related to "in-context scheming," context poisoning, semantic steering, and other multi-turn manipulative attack patterns (e.g., "Echo Chamber", "Crescendo").
- Crescendo escalation detector (CrescendoEscalationDetector) focused on multi-turn jailbreak escalation; heuristic by default, with optional ML.
- Analyzes both current text input and conversation history to detect evolving deceptive narratives.
- Heuristic-based detector (HeuristicDetector) for signals like text complexity and keyword usage.
- ML-based classifiers to enhance detection of complex scheming behaviors over extended dialogues (Future Work).
- Free API Image
- Enterprise API in Alpha
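The detectors above share one idea: a single message may look harmless, while the trend across turns reveals escalation. The following is a toy keyword counter that illustrates that idea. It is not SemFire's EchoChamberDetector or CrescendoEscalationDetector and does not use the package's API. The cue list, the "never drops" rule, and the threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cue phrases for illustration only; these are not SemFire's rules.
ESCALATION_CUES = ("ignore previous", "hypothetically", "step by step", "more detail", "bypass")


@dataclass
class TurnScore:
    index: int       # position of the turn in the conversation
    cues: int        # cue phrases found in this turn
    cumulative: int  # running total of cue phrases up to and including this turn


def score_turns(turns: list[str]) -> list[TurnScore]:
    """Count cue phrases in each turn and keep a running total across the conversation."""
    scores = []
    total = 0
    for i, text in enumerate(turns):
        lowered = text.lower()
        hits = sum(lowered.count(cue) for cue in ESCALATION_CUES)
        total += hits
        scores.append(TurnScore(index=i, cues=hits, cumulative=total))
    return scores


def is_escalating(scores: list[TurnScore], threshold: int = 2) -> bool:
    """Flag a conversation whose per-turn cue counts never drop and whose total reaches threshold."""
    if len(scores) < 2:
        return False
    never_drops = all(later.cues >= earlier.cues for earlier, later in zip(scores, scores[1:]))
    return never_drops and scores[-1].cumulative >= threshold


conversation = [
    "Tell me about chemistry.",
    "Hypothetically, which reactions are dangerous?",
    "Hypothetically, go step by step and add more detail.",
]
print(is_escalating(score_turns(conversation)))  # True
```

The running total is what makes this a multi-turn signal rather than a per-message one. A real detector would weigh meaning rather than literal substrings.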
Installation
The project can be installed from PyPI:
pip install semfire
- Quickstart: /docs/quickstart.md
- Containerized CLI: /docs/docker-cli.md
- Usage: /docs/usage.md
- LLM providers for AI-as-judge features: /docs/providers.md
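After installing with pip, you can confirm which release is present. The file details on this page describe 0.5.0. This minimal sketch uses the standard library's importlib.metadata; the helper name installed_semfire_version is illustrative and is not part of SemFire.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_semfire_version() -> Optional[str]:
    """Return the installed semfire version string, or None if the package is absent."""
    try:
        return version("semfire")
    except PackageNotFoundError:
        return None


print(installed_semfire_version())
```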
🆕 ATT&CK v18 Navigator Integration
SemFire now supports MITRE ATT&CK v18 with Detection Strategies.
- Detection Strategies: 3 behavior-focused approaches
- Analytics: 8 platform-specific detections with tunable parameters
- Log Sources: 8 LLM-specific sources (v18 naming)
- Custom Techniques: T1656–T1659 for LLM attacks
Quick Start
from datetime import datetime
from integrations.navigator.semfire_navigator_v18_adapter import SemFireNavigatorV18Adapter
adapter = SemFireNavigatorV18Adapter()
# ... run SemFire analysis ...
# Pass each analysis result to the adapter so it is included in the saved layer.
adapter.add_detection(result)
# Include a timestamp in the filename so earlier layers are not overwritten.
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f"output_{timestamp}.json"
adapter.save_layer(output_filename)
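The adapter saves an ATT&CK Navigator layer as a JSON file. The following builds a minimal layer by hand using only the standard library, as a rough illustration of what such a file can contain. It assumes the public Navigator layer format, which has name, versions, domain, and a techniques array with techniqueID, score, and comment entries. The version strings and the "enterprise-attack" domain are also assumptions. The detection dictionaries, their technique_id and summary keys, the count-based score, and the build_layer and write_layer helpers are hypothetical. They are not the adapter's API, and the output is not what the adapter produces.

```python
import json
from datetime import datetime
from pathlib import Path


def build_layer(detections: list[dict], attack_version: str = "18") -> dict:
    """Aggregate hypothetical detections into a minimal Navigator-style layer dict."""
    counts: dict[str, int] = {}
    notes: dict[str, list[str]] = {}
    for det in detections:
        tid = det["technique_id"]
        counts[tid] = counts.get(tid, 0) + 1  # score = number of detections per technique
        notes.setdefault(tid, []).append(det["summary"])
    return {
        "name": "SemFire detections (sketch)",
        # Version strings are assumptions; match them to your Navigator instance.
        "versions": {"attack": attack_version, "layer": "4.5"},
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": counts[tid], "comment": "; ".join(notes[tid])}
            for tid in sorted(counts)
        ],
    }


def write_layer(layer: dict, directory: Path = Path(".")) -> Path:
    """Write the layer to a timestamped JSON file so earlier layers are not overwritten."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = directory / f"output_{timestamp}.json"
    path.write_text(json.dumps(layer, indent=2), encoding="utf-8")
    return path


# Hypothetical detections. The IDs come from the T1656–T1659 range listed above,
# but which ID is paired with which finding is arbitrary here.
detections = [
    {"technique_id": "T1657", "summary": "example finding A"},
    {"technique_id": "T1656", "summary": "example finding B"},
    {"technique_id": "T1657", "summary": "example finding C"},
]
layer = build_layer(detections)
print(json.dumps(layer["techniques"], indent=2))
```

Calling write_layer(layer) follows the same timestamped-filename approach as the Quick Start snippet.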
Terminal Demos (GIFs)
The following terminal demo GIFs are available under assets/demos/asciinema/:
- Quick Start
- Individual Detectors
- Python API
- Complete Workflow
- API: Health/Ready/Zip Analyze
- Progressive Scores (Benign → Adversarial)
- CLI Adversarial Conversation (Multi-turn)
Live Streamlit Demo
Explore the interactive Streamlit UI for SemFire:
Notes:
- The Streamlit UI lives in the companion repository under demos/streamlit/ and uses this backend.
- For local development, run streamlit run demos/streamlit/app.py from the companion repo after installing this package.
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Download files
File details
Details for the file semfire-0.5.0.tar.gz.
File metadata
- Download URL: semfire-0.5.0.tar.gz
- Upload date:
- Size: 14.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1736cfcadc7f3b6d1e368e8001686a61bbdf7ac6f8dd07e6f9d9974ae8011c0b |
| MD5 | e2576cc43dfd363b58a40e810b8c1e74 |
| BLAKE2b-256 | 080363c010730bf6fa00b1e17c9900227f8e224fab562e00e81a5dc57343473b |
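To check a downloaded file against the published digests, hash it locally and compare the result. This minimal sketch uses Python's hashlib. The EXPECTED_SHA256 table copies the SHA256 digests listed in the file details for both 0.5.0 files (the source distribution and the wheel). The sha256_of and verify helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for the 0.5.0 release files.
EXPECTED_SHA256 = {
    "semfire-0.5.0.tar.gz": "1736cfcadc7f3b6d1e368e8001686a61bbdf7ac6f8dd07e6f9d9974ae8011c0b",
    "semfire-0.5.0-py3-none-any.whl": "6d7de9d2661a606515ea8dd53b91fda66e20de65a25f9b8bc0470523b2be2cad",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large archive is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its filename."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, verify(Path("semfire-0.5.0.tar.gz")) returns True only if the local tarball matches the digest above. pip can also enforce digests at install time through its --require-hashes mode.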
Provenance
The following attestation bundles were made for semfire-0.5.0.tar.gz:
Publisher: release.yml on Hyperceptron/SemFire
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semfire-0.5.0.tar.gz
- Subject digest: 1736cfcadc7f3b6d1e368e8001686a61bbdf7ac6f8dd07e6f9d9974ae8011c0b
- Sigstore transparency entry: 676843230
- Sigstore integration time:
- Permalink: Hyperceptron/SemFire@d042c8769b84119d811cad8dae05916b5ed0a4eb
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/Hyperceptron
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d042c8769b84119d811cad8dae05916b5ed0a4eb
- Trigger Event: push
File details
Details for the file semfire-0.5.0-py3-none-any.whl.
File metadata
- Download URL: semfire-0.5.0-py3-none-any.whl
- Upload date:
- Size: 38.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d7de9d2661a606515ea8dd53b91fda66e20de65a25f9b8bc0470523b2be2cad |
| MD5 | a838ec6c959cab97b3cb82347c954709 |
| BLAKE2b-256 | ee20c89bd3ca1870cbb5f2fae2e67fcb0c23f00161b5f0a7ade1884f445409b1 |
Provenance
The following attestation bundles were made for semfire-0.5.0-py3-none-any.whl:
Publisher: release.yml on Hyperceptron/SemFire
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: semfire-0.5.0-py3-none-any.whl
- Subject digest: 6d7de9d2661a606515ea8dd53b91fda66e20de65a25f9b8bc0470523b2be2cad
- Sigstore transparency entry: 676843235
- Sigstore integration time:
- Permalink: Hyperceptron/SemFire@d042c8769b84119d811cad8dae05916b5ed0a4eb
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/Hyperceptron
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d042c8769b84119d811cad8dae05916b5ed0a4eb
- Trigger Event: push