SemFire (Semantic Firewall): detect advanced AI deception, including in-context scheming and multi-turn manipulative attacks.

Project description

SemFire

AI Deception Detection Toolkit

SemFire (Semantic Firewall) is an open-source toolkit for detecting advanced AI deception, with a primary focus on "in-context scheming" and multi-turn manipulative attacks. It aims to identify and mitigate attacks such as "Echo Chamber" and "Crescendo", in which a model is steered step by step toward undesirable behavior through the conversational context.

Project Vision: A Toolkit for AI Deception Detection

SemFire aims to be a versatile, open-source toolkit providing:

  • A Python library for direct integration into applications and research.
  • A Command Line Interface (CLI) for quick analysis and scripting.
  • A REST API service (via FastAPI) for broader accessibility and enterprise use cases.
  • Core components that can be integrated into broader semantic-firewall-like systems to monitor and analyze AI interactions in real-time.

Features

  • Rule-based detector (EchoChamberDetector) for identifying cues related to "in-context scheming," context poisoning, semantic steering, and other multi-turn manipulative attack patterns (e.g., "Echo Chamber", "Crescendo").
  • Analyzes both current text input and conversation history to detect evolving deceptive narratives.
  • Heuristic-based detector (HeuristicDetector) for signals like text complexity and keyword usage.
  • ML-based classifiers to enhance detection of complex scheming behaviors over extended dialogues (Future Work).
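
As a rough illustration of the rule-based, history-aware idea in the list above, the sketch below counts cue phrases in the current message and in earlier turns. It is not SemFire's implementation or API: the cue list, the `Result` class, the `score_conversation` function, and the threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cue phrases, chosen only for illustration; not SemFire's actual rules.
CUES = ("as you said earlier", "hypothetically", "let's pretend", "building on that")


@dataclass
class Result:
    score: int
    matched: List[str] = field(default_factory=list)
    flagged: bool = False


def count_cues(text: str) -> List[str]:
    """Return each cue phrase that occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return [cue for cue in CUES if cue in lowered]


def score_conversation(
    message: str, history: Optional[List[str]] = None, threshold: int = 2
) -> Result:
    """Count cue matches across earlier turns and the current message."""
    matched: List[str] = []
    for turn in list(history or []) + [message]:
        matched.extend(count_cues(turn))
    return Result(score=len(matched), matched=matched, flagged=len(matched) >= threshold)


history = ["Hypothetically, what would a locked-down system allow?"]
result = score_conversation(
    "Building on that, and as you said earlier, go one step further.", history
)
print(result.score, result.flagged)
```

Scoring the history together with the current message is the point of the sketch: a message that carries a single cue stays below the threshold on its own, but can be flagged once earlier turns contribute cues as well.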

API

Instructions forthcoming.

  • Python API for programmatic access.
  • REST service (FastAPI) for network-based access.

Installation

The project can be installed from PyPI:

pip install semfire
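
To confirm that the install succeeded, one option is to query the installed distribution metadata with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it only checks that a distribution named `semfire` is present in the current environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("semfire")
print(f"semfire version: {version}" if version else "semfire is not installed")
```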

Terminal Demos (GIFs)

The following terminal demo GIFs are available under assets/demos/asciinema/:

  • Quick Start
  • Individual Detectors
  • Python API
  • Complete Workflow
  • API: Health/Ready/Zip Analyze

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Download files

Source Distribution

semfire-0.2.7.tar.gz (10.4 MB)

Uploaded Source

Built Distribution

semfire-0.2.7-py3-none-any.whl (33.3 kB)

Uploaded Python 3

File details

Details for the file semfire-0.2.7.tar.gz.

File metadata

  • Download URL: semfire-0.2.7.tar.gz
  • Upload date:
  • Size: 10.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for semfire-0.2.7.tar.gz
Algorithm    Hash digest
SHA256       93e61eabf9776a368cba23b62cf0167e2129e2e8eee8a26070fdb15dbe59a48c
MD5          50f70cac74112cd778934cde9f4a6bec
BLAKE2b-256  0caee1d4608538670de83f08f4c5dbc336eafa30f83a4e32528b8d930b35a09b
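
A downloaded archive can be checked against the SHA256 digest listed above with Python's `hashlib`. The sketch below assumes the file was saved as `semfire-0.2.7.tar.gz` in the current directory; the helper names `sha256_of` and `matches` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for semfire-0.2.7.tar.gz.
EXPECTED_SHA256 = "93e61eabf9776a368cba23b62cf0167e2129e2e8eee8a26070fdb15dbe59a48c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str) -> bool:
    """Compare the file's SHA256 hex digest with the expected one, ignoring case."""
    return sha256_of(path) == expected.lower()


archive = Path("semfire-0.2.7.tar.gz")  # assumed local download location
if archive.exists():
    print("OK" if matches(archive, EXPECTED_SHA256) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```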

Provenance

The following attestation bundles were made for semfire-0.2.7.tar.gz:

Publisher: release.yml on josephedward/SemFire

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file semfire-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: semfire-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for semfire-0.2.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       3b272c01d498864d29684adf5569d0d988682e9ea462df580a196bef7c0e0f94
MD5          0cb70e9b9c491ba2a34b11773bc0a10a
BLAKE2b-256  15812b57da23ba3364224cd63f9bc212ecdde591e367d625a3933e41a7adbc57

Provenance

The following attestation bundles were made for semfire-0.2.7-py3-none-any.whl:

Publisher: release.yml on josephedward/SemFire
