📣 Toxicity Detector

An LLM-based pipeline to detect toxic speech.

🎯 About the Toxicity Detector

The Toxicity Detector is a configurable pipeline that uses a Large Language Model (LLM) to analyze a text and decide whether it contains toxic speech.

It supports two toxicity types out of the box:

  • personalized_toxicity: toxic speech directed at a specific individual (insults, threats, harassment, …)
  • hatespeech: group-based toxicity / hate speech (targeting groups or individuals because of group membership)

Both toxicity types are defined in the pipeline configuration file under the toxicities: section.

The Toxicity Detector Workflow

At a high level the pipeline works as follows:

  1. Preprocessing / preparatory analysis: the model answers “general questions” that help it interpret the input (e.g., who is targeted, irony/quotes/context).
  2. Indicator analysis: the model evaluates a set of configurable indicators (tasks) that represent typical forms of toxicity (e.g., threats, insults, victim shaming).
  3. Final decision: the pipeline aggregates these intermediate results and returns:
     • contains_toxicity: one of true, false, unclear
     • analysis_result: a human-readable explanation

The indicators and the phrasing of the model prompts are configurable via YAML.
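
Downstream code usually has to branch on the three-valued verdict. The following is a minimal sketch of one way to do that, assuming the contains_toxicity value arrives either as a string (true, false, unclear) or as a boolean after YAML parsing. The Verdict enum and the parse_verdict helper are hypothetical names used for illustration; they are not part of the package.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical three-valued verdict mirroring contains_toxicity."""

    TOXIC = "true"
    NOT_TOXIC = "false"
    UNCLEAR = "unclear"


def parse_verdict(raw: object) -> Verdict:
    """Map a raw contains_toxicity value onto a Verdict.

    Assumption: the value is a bool or a string. Anything that is not
    recognized is treated as UNCLEAR rather than guessed.
    """
    if isinstance(raw, bool):
        return Verdict.TOXIC if raw else Verdict.NOT_TOXIC
    text = str(raw).strip().lower()
    for verdict in Verdict:
        if verdict.value == text:
            return verdict
    return Verdict.UNCLEAR
```

Treating unknown values as UNCLEAR keeps a malformed model answer from being silently counted as non-toxic.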

Figure of workflow

🖥️ Quick Start

Prerequisites

  • Python 3.12 or higher

Installation via PyPI

Install the toxicity-detector package from PyPI (e.g., by using pip):

pip install toxicity-detector

Setting up a minimal configuration

You need a pipeline configuration (YAML) to run toxicity detection. This repo ships example configs in config/ (app_config.yaml and pipeline_config.yaml).

Start by copying the example files and adjusting them to your environment (models, API keys, storage paths).

API Keys

API keys are referenced by name in the pipeline config (e.g., API_KEY_NAME) and are expected to be present as environment variables.

Create a .env file in the project root with the following variables:

# API Keys (by the names as specified in the model config files)
API_KEY_NAME=your_api_key_value

Alternatively, you can set the environment variables in your shell/session (instead of using .env).
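
Before starting a run, it can help to confirm that every key name referenced in the config is actually set. The sketch below only inspects the process environment; it does not read a .env file itself. The helper name missing_api_keys is hypothetical, and API_KEY_NAME is the placeholder name used above.

```python
import os
from collections.abc import Iterable, Mapping


def missing_api_keys(
    key_names: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> list[str]:
    """Return the key names that are unset or empty in the environment."""
    return [name for name in key_names if not env.get(name)]
```

Calling missing_api_keys(["API_KEY_NAME"]) returns an empty list when the variable is set and ["API_KEY_NAME"] otherwise. Passing env explicitly makes the check easy to test without touching the real environment.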

🚀 Running the Pipeline

Using the CLI

The simplest way to run toxicity detection is from the command line, inside the environment in which you installed the toxicity-detector package:

# Basic usage
toxicity-detector detect \
  --text "Your text to analyze" \
  --pipeline-config ./config/pipeline_config.yaml

# With all options
toxicity-detector detect \
  --text "Your text to analyze" \
  --pipeline-config ./config/pipeline_config.yaml \
  --toxicity-type personalized_toxicity \
  --source "chat" \
  --context "Additional context here" \
  --save \
  --verbose

Programmatically

from toxicity_detector import detect_toxicity, PipelineConfig

# Load pipeline configuration from YAML file
pipeline_config = PipelineConfig.from_file('./config/pipeline_config.yaml')

# The text to analyze for toxicity
input_text = 'Peter is dumb.'

# Run toxicity detection
result = detect_toxicity(
    input_text=input_text,  # The text to be analyzed
    user_input_source=None,  # Optional: identifier for the source of the input (e.g., 'chat', 'comment')
    toxicity_type='personalized_toxicity',  # Type of toxicity analysis to perform ('personalized_toxicity' or 'hatespeech')
    context_info=None,  # Optional: additional context about the conversation or situation
    pipeline_config=pipeline_config,  # Configuration specifying model, paths, and behavior
    serialize_result=True,  # If True, saves the result to disk as YAML
)

# Display the toxicity verdict
print(result.answer['contains_toxicity'])
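
To analyze several texts, the call above can be wrapped in a small loop. The sketch below counts verdicts and takes the detection step as a callable, so it can be tried with a stub before any model is configured. The name tally_verdicts is hypothetical; the commented wrapper reuses the detect_toxicity call shown above and assumes pipeline_config has been loaded as in that example.

```python
from collections import Counter
from collections.abc import Callable, Iterable


def tally_verdicts(
    texts: Iterable[str],
    detect: Callable[[str], object],
) -> Counter[str]:
    """Count verdicts over several texts.

    `detect` is any callable that takes a text and returns its verdict.
    Verdicts are lower-cased strings, so True and 'true' count together.
    """
    return Counter(str(detect(text)).lower() for text in texts)


# With the package installed, `detect` could wrap the call shown above:
#
# def detect(text):
#     result = detect_toxicity(
#         input_text=text,
#         user_input_source=None,
#         toxicity_type='personalized_toxicity',
#         context_info=None,
#         pipeline_config=pipeline_config,
#         serialize_result=False,
#     )
#     return result.answer['contains_toxicity']
```

Each call to the real pipeline triggers LLM requests, so the cost of a batch grows with the number of texts.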

We also provide an example notebook that demonstrates how to run the toxicity detection pipeline with a Hugging Face API key.

🧭 Using the Gradio Demo App

The project includes a Gradio web interface for interactive toxicity detection.

Using the CLI

Run the app with one of the following commands:

# With app configuration file
toxicity-detector app --app-config ./config/app_config.yaml

# With pipeline configuration file (uses default app settings)
toxicity-detector app --pipeline-config ./config/pipeline_config.yaml

# With custom server settings
toxicity-detector app \
  --app-config ./config/app_config.yaml \
  --server-port 8080 \
  --share

The app will start and be accessible at http://localhost:7860 by default (or your specified port).
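
If the app fails to start, the port may already be in use. A quick way to check this before choosing a value for --server-port is to try binding to the port. This is a generic sketch with a hypothetical helper name (port_is_free); it does not reflect how the app itself selects its port.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails if another process already holds the port.
            return False
    return True
```

For example, port_is_free(7860) tells you whether the default port is available at the moment of the call. Another process can still claim the port between the check and the launch.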

Configuration

To enable developer mode with additional configuration options, update your config/app_config.yaml:

developer_mode: true

Note: the configuration tab is only shown when developer_mode: true. If force_agreement: true, you must accept the agreement first.

The individual settings are documented in config/app_config.yaml.

🛠️ Configuration of the Pipeline

The pipeline is configured via a YAML file that is loaded into the Pydantic model PipelineConfig.

  • Config schema/model: src/toxicity_detector/config.py (class PipelineConfig)
  • Main entry point: src/toxicity_detector/backend.py (detect_toxicity(...))

Key sections in config/pipeline_config.yaml:

  • Model selection: used_chat_model and the models: dictionary (provider/model/base_url + api_key_name)
  • Storage: local_serialization, local_base_path, result_data_path, log_path, subdirectory_construction
  • Toxicity definitions: toxicities: (currently personalized_toxicity and hatespeech)
    • Each toxicity type contains a tasks: section, which includes
      • prepatory_analysis.general_questions
      • indicator_analysis.* (your indicator list)
  • Prompts: prompt templates are configurable (see prompt_templates in the default pipeline config)
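
To get an overview of what a config defines, you can walk the parsed YAML as a plain dictionary. The sketch below assumes the nesting described above (toxicities, then a toxicity type, then tasks, then task groups) and assumes that each task group is a mapping keyed by task name. The helper name list_tasks and the indicator names in the example are hypothetical.

```python
def list_tasks(config: dict) -> dict[str, dict[str, list[str]]]:
    """Collect task names per toxicity type from a parsed pipeline config."""
    overview: dict[str, dict[str, list[str]]] = {}
    for toxicity_type, definition in config.get("toxicities", {}).items():
        tasks = definition.get("tasks", {})
        # Each task group (e.g. indicator_analysis) is assumed to be a
        # mapping; its keys are taken as the task names.
        overview[toxicity_type] = {
            group: sorted(entries) for group, entries in tasks.items()
        }
    return overview


# Hypothetical excerpt of a parsed config, for illustration only.
example_config = {
    "toxicities": {
        "personalized_toxicity": {
            "tasks": {
                "prepatory_analysis": {"general_questions": {}},
                "indicator_analysis": {"threats": {}, "insults": {}},
            }
        }
    }
}

print(list_tasks(example_config))
```

A dictionary of this shape is what a YAML parser such as PyYAML's yaml.safe_load would return for the config file, assuming that library is available in your environment.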

If you want to start from a known-good baseline, the package contains a default pipeline config with all default prompts here: src/toxicity_detector/package_data/default_pipeline_config.yaml.

The individual settings are documented in config/pipeline_config.yaml.

🔧 Development

Project Structure

High-level overview of the repository layout:

toxicity-detector/
├── config/                          # Configuration template files
│   ├── app_config.yaml              # Gradio app configuration (AppConfig)
│   └── pipeline_config.yaml         # Pipeline configuration (PipelineConfig)
├── src/
│   └── toxicity_detector/
│       ├── __init__.py
│       ├── app/                     # Gradio web interface (modularized)
│       │   ├── app.py
│       │   ├── app_config_loader.py
│       │   ├── agreement_tab.py
│       │   ├── config_tab.py
│       │   └── detection_tab.py
│       ├── backend.py               # Core detection logic (detect_toxicity)
│       ├── chains.py                # LangChain pipelines
│       ├── cli.py                   # CLI entry point (toxicity-detector)
│       ├── config.py                # Pydantic config models
│       └── managers/                # Config and persistence utilities
├── pyproject.toml                   # Project dependencies
└── README.md                        # This file

Setup

This project uses uv for dependency management.

Prerequisites

  • Python 3.12 or higher
  • uv package manager

Installation

  1. Install uv (if not already installed); see the uv documentation for installation instructions.

  2. Clone the repository:

    git clone https://github.com/debatelab/toxicity-detector.git
    cd toxicity-detector
    
  3. Install dependencies:

    uv sync
    

    This will create a virtual environment and install all dependencies specified in pyproject.toml. If a uv.lock is present, uv will reproduce the environment specified in that file. If you want to start with a fresh environment and/or use other package versions, remove or update the uv.lock accordingly.

  4. Install development dependencies (optional):

    uv sync --group dev
    

Running Tests

Run all tests:

uv run pytest

Run tests with verbose output:

uv run pytest -v

Run a specific test file:

uv run pytest tests/test_config.py

Run tests with coverage report:

uv run pytest --cov=src/toxicity_detector

Alternative: Using the activated virtual environment:

# Activate the virtual environment first
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate  # On Windows

# Then run pytest directly
pytest tests/
pytest tests/test_config.py -v

Working with Notebooks

To use Jupyter notebooks for development:

# Install dev dependencies if not already done
uv sync --group dev

# Start Jupyter
uv run jupyter notebook notebooks/

🙏 Acknowledgements

🛠️ Powered By

🏛️ Funding

The Toxicity Detector was implemented as part of the project "Opportunities of AI to Strengthen Our Deliberative Culture" (KIdeKu), which was funded by the Federal Ministry of Education, Family Affairs, Senior Citizens, Women and Youth (BMBFSFJ).

📄 License

This project is licensed under the MIT License. See LICENSE.
