
Helpful utilities for AI Sentinel toxicity detection


AI Sentinel

Python 3.11+ License: MIT

AI Sentinel is a Python package designed to help developers integrate toxicity analysis into their applications with ease. It provides a simple, unified interface to leverage powerful AI models for detecting and categorizing harmful content in text.

Key Features

  • Advanced Toxicity Detection: Comprehensive toxicity detection and classification.
  • Multiple LLM Providers: Designed to support various AI model providers (currently Azure OpenAI, Google Gemini, and OpenAI-compatible servers, with more providers planned).
  • Structured Output: Type-safe responses with Pydantic validation.

Getting Started

Installation

ai-sentinel is available on PyPI:

pip install ai-sentinel

Usage

AI Sentinel is designed to be straightforward to use. You'll primarily interact with the ToxicityGuard and a client specific to your chosen AI model (e.g., AzureOpenAIClient).

from ai_sentinel import AzureOpenAIClient, ToxicityGuard

# Initialize LLM client
client = AzureOpenAIClient(
    api_key="your-api-key",
    model="gpt-4o-mini",
    api_version="2024-02-01",
    azure_endpoint="https://your-resource.openai.azure.com/"
)

# Create toxicity guard
guard = ToxicityGuard(client)

# Analyze text
result = guard.analyze("This is a normal message")

print(f"Is toxic: {result.is_toxic}")
print(f"Confidence: {result.confidence}")
print(f"Categories: {result.categories}")
print(f"Reason: {result.reason}")
print(f"Severity: {result.score}")
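The confidence field makes it easy to layer your own moderation policy on top of the verdict. Below is a minimal sketch of a threshold-based gate; the function name and threshold are illustrative, not part of ai-sentinel, and in real code you would pass `result.is_toxic` and `result.confidence`:

```python
def should_block(is_toxic: bool, confidence: float, threshold: float = 0.8) -> bool:
    """Block content only when it is both flagged toxic and above the confidence threshold.

    Below the threshold you might route the message to human review instead of blocking.
    """
    return is_toxic and confidence >= threshold

# With a real ToxicityResult: should_block(result.is_toxic, result.confidence)
print(should_block(True, 0.9))   # True: confident toxic verdict
print(should_block(True, 0.5))   # False: low-confidence flag
```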

Async Usage

Call the async variant, analyze_async, and await each API call:

import asyncio
from ai_sentinel import AzureOpenAIClient, ToxicityGuard

client = AzureOpenAIClient(
    api_key="your-api-key",
    model="gpt-4o-mini",
    api_version="2024-02-01",
    azure_endpoint="https://your-resource.openai.azure.com/"
)

async def main() -> None:
    guard = ToxicityGuard(client)
    result = await guard.analyze_async("Text to analyze")

    print(f"Is toxic: {result.is_toxic}")
    print(f"Confidence: {result.confidence}")
    print(f"Categories: {result.categories}")
    print(f"Reason: {result.reason}")
    print(f"Severity: {result.score}")

asyncio.run(main())
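When screening many messages, the async API pairs naturally with asyncio.gather. The sketch below uses a stand-in coroutine in place of guard.analyze_async so it runs without credentials; swap in the real call in your application:

```python
import asyncio

async def fake_analyze(text: str) -> dict:
    # Stand-in for guard.analyze_async(text); returns a plain dict instead of ToxicityResult.
    await asyncio.sleep(0)  # placeholder for network latency
    return {"text": text, "is_toxic": False}

async def screen_batch(texts: list[str]) -> list[dict]:
    # Fan out one analysis per message and await them concurrently.
    return await asyncio.gather(*(fake_analyze(t) for t in texts))

results = asyncio.run(screen_batch(["hello", "how are you?"]))
print(len(results))  # 2
```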

Supported LLM API Services

ai-sentinel is model agnostic, with support for the following LLM API services:

| Provider | Models | Details |
| --- | --- | --- |
| Azure OpenAI | GPT-4, GPT-4o, GPT-3.5-turbo | Industry-leading models |
| Google Gemini | Gemini 2.0 Flash, Gemini 2.5 Pro | Latest Google technology |
| Open Source LLMs | Qwen3, GPT-OSS, Meta Llama | Uses an OpenAI-compatible server; check that the server hosting the LLM implements the OpenAI API endpoint specification |
| Anthropic | To be implemented | Planned for a future release |

Gemini Usage

from ai_sentinel import GeminiClient, ToxicityGuard

# Initialize Gemini client
client = GeminiClient(
    api_key="your-gemini-api-key",
    model="gemini-2.0-flash"
)

# Create and use toxicity guard
guard = ToxicityGuard(client)
result = guard.analyze("Your text here")

Open Source LLMs via the OpenAI-Compatible Client

from ai_sentinel import OpenAIClient, ToxicityGuard

# Initialize an Open Source LLM Client
client = OpenAIClient(
    base_url="http://localhost:8000/v1",
    model="Qwen/Qwen3-8B"
)

# Create and use toxicity guard
guard = ToxicityGuard(client)
result = guard.analyze("Your text here")

Output

In AI Sentinel's ToxicityGuard class, both analyze and analyze_async methods return a ToxicityResult object.

Response Format

class ToxicityResult:
    is_toxic: bool                          # Whether content is toxic
    confidence: float                       # Confidence score that the content is toxic (0.0-1.0)
    categories: List[ToxicityCategories]    # Detected toxicity categories
    reason: str                             # Explanation of the assessment
    score: ToxicityScore                    # Simplified confidence score: "low", "medium", "high"

Example

{
    "is_toxic": True,
    "confidence": 0.90,
    "categories": [
        <ToxicityCategories.THREATS: 'threats'>, 
        <ToxicityCategories.VIOLENCE: 'violence'>
    ],
    "reason": "The phrase 'I will punch you' is a clear and direct threat of physical violence. It expresses an intention to harm another person, categorizing it under threats and violence.",
    "score": <ToxicityScore.HIGH: 'high'>
}

The ToxicityCategories and ToxicityScore enums are available from ai_sentinel.models.

Toxicity Categories

AI Sentinel detects the following toxicity categories:

  • Hate Speech: Content attacking individuals/groups based on protected characteristics
  • Harassment: Hostile behavior targeting specific individuals
  • Threats: Direct or implied threats of violence or harm
  • Sexual Content: Inappropriate sexual material
  • Self Harm: Content promoting self-injury or suicide
  • Violence: Content glorifying or promoting violence
  • Bullying: Intimidation or aggressive behavior
  • Discrimination: Unfair treatment of specific groups

Configuration

Environment Variables

Tip: Add environment variables to a .env file

# Azure OpenAI
AZURE_API_KEY=your-azure-api-key
AZURE_API_VERSION=2024-02-01
AZURE_API_BASE=https://your-resource.openai.azure.com/

# Google Gemini
GEMINI_API_KEY=your-gemini-api-key

Using python-dotenv

from dotenv import load_dotenv
import os

load_dotenv()

client = AzureOpenAIClient(
    api_key=os.getenv("AZURE_API_KEY"),
    model="gpt-4o-mini",
    api_version=os.getenv("AZURE_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_API_BASE")
)
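Note that os.getenv returns None for unset variables, which only surfaces later as a confusing client error; failing fast is cheap. A small sketch of up-front validation (the helper name is illustrative):

```python
import os

def require_env(name: str) -> str:
    """Return the named environment variable, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example: validate before constructing the client.
os.environ["AZURE_API_KEY"] = "demo-key"  # for illustration only
print(require_env("AZURE_API_KEY"))  # demo-key
```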

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

v0.0.8: Update documentation
v0.0.7: Add Open Source LLM integration
v0.0.6: Change authorship
v0.0.2: Adjust authorship and maintainers
v0.0.1: Initial release
