Toxicity Detector
An LLM-based pipeline to detect toxic speech.
About the Toxicity Detector
The Toxicity Detector is a configurable pipeline that uses a Large Language Model (LLM) to analyze a text and decide whether it contains toxic speech.
It supports two toxicity types out of the box:
- personalized_toxicity: toxic speech directed at a specific individual (insults, threats, harassment, …)
- hatespeech: group-based toxicity / hate speech (targeting groups or individuals because of group membership)
Both toxicity types are defined in the pipeline configuration file under the toxicities: section.
The Toxicity Detector Workflow
At a high level the pipeline works as follows:
- Preprocessing / preparatory analysis: the model answers "general questions" that help it interpret the input (e.g., who is targeted, irony/quotes/context).
- Indicator analysis: the model evaluates a set of configurable indicators (tasks) that represent typical forms of toxicity (e.g., threats, insults, victim shaming).
- Final decision: the pipeline aggregates these intermediate results and returns:
  - contains_toxicity: one of true, false, unclear
  - analysis_result: a human-readable explanation
The indicators and the phrasing of the model prompts are configurable via YAML.
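To make the three stages concrete, here is a minimal, self-contained sketch of the control flow. It is an illustration, not the package's implementation: `ask` is a hypothetical stand-in for an LLM call, the prompt wording is invented, and the final rule (any indicator answered "yes" gives `true`, all answered "no" give `false`, anything else gives `unclear`) is a placeholder for the model-driven decision described above.

```python
from collections.abc import Callable

# Hypothetical stand-in for an LLM call: takes a prompt, returns a text answer.
Ask = Callable[[str], str]


def run_pipeline(text: str, general_questions: list[str],
                 indicators: list[str], ask: Ask) -> dict[str, str]:
    """Illustrative three-stage flow: preparatory analysis, indicators, decision."""
    # Stage 1: preparatory analysis, answering general questions about the input.
    context = {q: ask(f"{q}\nText: {text}") for q in general_questions}
    notes = "\n".join(f"{q} {a}" for q, a in context.items())

    # Stage 2: indicator analysis, one answer per configured indicator,
    # with the preparatory notes included in each prompt.
    findings = {
        name: ask(f"{notes}\nDoes the text contain {name}? Text: {text}").strip().lower()
        for name in indicators
    }

    # Stage 3: final decision (hypothetical rule; the real pipeline asks the model).
    if any(answer == "yes" for answer in findings.values()):
        verdict = "true"
    elif all(answer == "no" for answer in findings.values()):
        verdict = "false"
    else:
        verdict = "unclear"
    explanation = "; ".join(f"{name}: {answer}" for name, answer in findings.items())
    return {"contains_toxicity": verdict, "analysis_result": explanation}
```

Because the LLM is injected as a plain function, the flow can be exercised offline with a fake `ask`; in the actual pipeline, the indicators and prompts come from the YAML configuration.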
Quick Start
Prerequisites
- Python 3.12 or higher
Installation via PyPI
Install the toxicity-detector package from PyPI (e.g., with pip):
pip install toxicity-detector
Setting up a minimal configuration
You need a pipeline configuration (YAML) to run toxicity detection. This repo ships example configs in config/:
- config/pipeline_config.yaml: pipeline configuration used by the CLI and Python API
- config/app_config.yaml: configuration for the Gradio demo app (optional)
Start by copying the example files and adjusting them to your environment (models, API keys, storage paths).
API Keys
API keys are referenced by name in the pipeline config (e.g., API_KEY_NAME) and are expected to be present as environment variables.
Create a .env file in the project root with the following variables:
# API Keys (by the names as specified in the model config files)
API_KEY_NAME=your_api_key_value
Alternatively, you can set the environment variables in your shell/session (instead of using .env).
Running the Pipeline
Using the CLI
The simplest way to run toxicity detection is from the command line, in the environment where you installed the toxicity-detector package:
# Basic usage
toxicity-detector detect \
--text "Your text to analyze" \
--pipeline-config ./config/pipeline_config.yaml
# With all options
toxicity-detector detect \
--text "Your text to analyze" \
--pipeline-config ./config/pipeline_config.yaml \
--toxicity-type personalized_toxicity \
--source "chat" \
--context "Additional context here" \
--save \
--verbose
Programmatically
from toxicity_detector import detect_toxicity, PipelineConfig
# Load pipeline configuration from YAML file
pipeline_config = PipelineConfig.from_file('./config/pipeline_config.yaml')
# The text to analyze for toxicity
input_text = 'Peter is dumb.'
# Run toxicity detection
result = detect_toxicity(
input_text=input_text, # The text to be analyzed
user_input_source=None, # Optional: identifier for the source of the input (e.g., 'chat', 'comment')
toxicity_type='personalized_toxicity', # Type of toxicity analysis to perform ('personalized_toxicity' or 'hatespeech')
context_info=None, # Optional: additional context about the conversation or situation
pipeline_config=pipeline_config, # Configuration specifying model, paths, and behavior
serialize_result=True, # If True, saves the result to disk as YAML
)
# Display the toxicity verdict (contains_toxicity)
print(result.answer['contains_toxicity'])
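For more than one input, the same call can be wrapped in a loop. The helper below is hypothetical and not part of the package: it accepts any callable that returns an object with an `answer` mapping (as `detect_toxicity` does in the example above) and counts the `contains_toxicity` values, which keeps it testable without a model. The commented wiring at the end assumes the same arguments as the example above.

```python
from collections import Counter
from collections.abc import Callable, Iterable
from typing import Any


def tally_verdicts(
    texts: Iterable[str], detect_fn: Callable[[str], Any]
) -> tuple[Counter[str], list[tuple[str, str]]]:
    """Run detect_fn on each text; count verdicts and keep (text, verdict) pairs."""
    counts: Counter[str] = Counter()
    rows: list[tuple[str, str]] = []
    for text in texts:
        result = detect_fn(text)
        # Normalize to a lowercase string so that a bool and 'true' count together.
        verdict = str(result.answer["contains_toxicity"]).lower()
        counts[verdict] += 1
        rows.append((text, verdict))
    return counts, rows


# Example wiring (requires the package and a valid pipeline config):
# from toxicity_detector import detect_toxicity, PipelineConfig
# cfg = PipelineConfig.from_file('./config/pipeline_config.yaml')
# counts, rows = tally_verdicts(
#     ["Peter is dumb.", "Have a nice day."],
#     lambda t: detect_toxicity(input_text=t, user_input_source=None,
#                               toxicity_type='personalized_toxicity',
#                               context_info=None, pipeline_config=cfg,
#                               serialize_result=False),
# )
```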
We also provide an example notebook that demonstrates how to run the toxicity detection pipeline with a Hugging Face API key.
Using the Gradio Demo App
The project includes a Gradio web interface for interactive toxicity detection.
Using the CLI
Start the app with one of the following commands:
# With app configuration file
toxicity-detector app --app-config ./config/app_config.yaml
# With pipeline configuration file (uses default app settings)
toxicity-detector app --pipeline-config ./config/pipeline_config.yaml
# With custom server settings
toxicity-detector app \
--app-config ./config/app_config.yaml \
--server-port 8080 \
--share
The app starts and is accessible at http://localhost:7860 by default (or at the port set with --server-port).
Configuration
To enable developer mode with additional configuration options, update your config/app_config.yaml:
developer_mode: true
Note: the configuration tab is only shown when developer_mode: true. If force_agreement: true, you must accept the agreement first.
Additional information about the different settings can be found in config/app_config.yaml.
Configuration of the Pipeline
The pipeline is configured via a YAML file that is loaded into the Pydantic model PipelineConfig.
- Config schema/model: src/toxicity_detector/config.py (class PipelineConfig)
- Main entry point: src/toxicity_detector/backend.py (detect_toxicity(...))
Key sections in config/pipeline_config.yaml:
- Model selection: used_chat_model and the models: dictionary (provider/model/base_url + api_key_name)
- Storage: local_serialization, local_base_path, result_data_path, log_path, subdirectory_construction
- Toxicity definitions: toxicities: (currently personalized_toxicity and hatespeech)
  - Each toxicity type contains tasks:, which includes prepatory_analysis.general_questions and indicator_analysis.* (your indicator list)
- Prompts: prompt templates are configurable (see prompt_templates in the default pipeline config)
If you want to start from a known-good baseline, the package contains a default pipeline config with all default prompts here:
src/toxicity_detector/package_data/default_pipeline_config.yaml.
Additional information about the different settings can be found in config/pipeline_config.yaml.
Development
Project Structure
High-level overview of the repository layout:
toxicity-detector/
├── config/                        # Configuration template files
│   ├── app_config.yaml            # Gradio app configuration (AppConfig)
│   └── pipeline_config.yaml       # Pipeline configuration (PipelineConfig)
├── src/
│   └── toxicity_detector/
│       ├── __init__.py
│       ├── app/                   # Gradio web interface (modularized)
│       │   ├── app.py
│       │   ├── app_config_loader.py
│       │   ├── agreement_tab.py
│       │   ├── config_tab.py
│       │   └── detection_tab.py
│       ├── backend.py             # Core detection logic (detect_toxicity)
│       ├── chains.py              # LangChain pipelines
│       ├── cli.py                 # CLI entry point (toxicity-detector)
│       ├── config.py              # Pydantic config models
│       └── managers/              # Config and persistence utilities
├── pyproject.toml                 # Project dependencies
└── README.md                      # This file
Setup
This project uses uv for dependency management.
Prerequisites
- Python 3.12 or higher
- uv package manager
Installation
- Install uv (if not already installed).
- Clone the repository:
  git clone https://github.com/debatelab/toxicity-detector.git
  cd toxicity-detector
- Install dependencies:
  uv sync
  This creates a virtual environment and installs all dependencies specified in pyproject.toml. If a uv.lock file is present, uv reproduces the environment specified in that file. To start with a fresh environment or use other package versions, remove or update uv.lock accordingly.
- Install development dependencies (optional):
  uv sync --group dev
Running Tests
Run all tests:
uv run pytest
Run tests with verbose output:
uv run pytest -v
Run a specific test file:
uv run pytest tests/test_config.py
Run tests with coverage report:
uv run pytest --cov=src/toxicity_detector
Alternatively, run pytest from the activated virtual environment:
# Activate the virtual environment first
source .venv/bin/activate # On Linux/Mac
# or
.venv\Scripts\activate # On Windows
# Then run pytest directly
pytest tests/
pytest tests/test_config.py -v
Working with Notebooks
To use Jupyter notebooks for development:
# Install dev dependencies if not already done
uv sync --group dev
# Start Jupyter
uv run jupyter notebook notebooks/
Acknowledgements
Powered By
- LangChain: Workflow orchestration
- Gradio: Interactive web interface
- Pydantic: Data validation and configuration management
- Hugging Face: Model hosting and deployment
Funding
The Toxicity Detector was implemented as part of the project "Opportunities of AI to Strengthen Our Deliberative Culture" (KIdeKu) which was funded by the Federal Ministry of Education, Family Affairs, Senior Citizens, Women and Youth (BMBFSFJ).
License
This project is licensed under the MIT License. See LICENSE.
File details
Details for the file toxicity_detector-0.1.0.tar.gz.
File metadata
- Download URL: toxicity_detector-0.1.0.tar.gz
- Size: 32.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f9e8fcb671a6f19a371404963128e4e1968b77729ae2f7f0762cb713376b893 |
| MD5 | b8f6440e6b3bc2975c15c8aaa55397e8 |
| BLAKE2b-256 | cd036d2dcfea481e589ed22964060f0cc03d934022e79f90f32e76da948c8c89 |
File details
Details for the file toxicity_detector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: toxicity_detector-0.1.0-py3-none-any.whl
- Size: 39.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fd47834762c3a196b1a9c0108b4b7d5a9a686931130f5ad85c8044c16b9413f |
| MD5 | 5975172680113e205d2f906a9d6a116e |
| BLAKE2b-256 | 27eed99f51f694efe0b9905f9f0cf26a30450428a0f50be1cc5e09ca5c3044d7 |