Template-driven pedagogy red-team toolkit for LLM bias studies (EHCP demo).

These details have not been verified by PyPI

Project description

ai-ped-red-team

Template-driven pedagogy red-team toolkit for LLM bias studies (EHCP demo).

Features

Vendor-agnostic model access (LiteLLM) with built-in support for OpenAI, Google Gemini, Anthropic, etc.
Controlled prompt variation (hot/cold) with fallback safety prompts.
Counterbalanced runs across matched EHCPs and template-driven persona swapping.
Configurable axes (gender/support/history) to exhaust combinations and inject biased dialogue history.
Normalization + metrics (VADER sentiment, Detoxify toxicity, lexical heuristics) + stats + templated reports.
CLI-first, reproducible; MIT licensed.

Quickstart

# 1) Install (dev mode for local work)
uv venv && source .venv/bin/activate            # or python -m venv .venv
pip install -e ".[dev,docs]"
pre-commit install

# 2) Configure env (copy .env.example → .env; set provider keys)
export OPENAI_API_KEY=...
export GOOGLE_API_KEY=...   # for Gemini (google/* models)

> Gemini setup tip: make sure the Google project tied to `GOOGLE_API_KEY` has the *Generative Language API* enabled and billing active. Without those, Google will return HTTP 403.

# 3) Guided experience (recommended)
aprt wizard
# If `examples/ehcp_variables.toml` is present, the wizard offers to run every
# persona/support/history combination and tags each report directory with the
# chosen axis labels.

# Manual steps (if you prefer granular control)
aprt gen-variants --template src/ai_ped_red_team/templates/examples/questionnaire/q_ehcp_gender.json --hotness cold --n 5 > variants.json
aprt run --template src/ai_ped_red_team/templates/examples/questionnaire/q_ehcp_gender.json \
         --ehcp examples/EHCP-templates \
         --axes-config examples/ehcp_variables.toml \
         --model openai/gpt-5-nano
aprt analyze --results ./reports/latest/results.jsonl > summary.json
aprt report --summary summary.json

# Inspect configured providers and available models
aprt vendors
aprt models openai
aprt models google

# Need to inspect raw LiteLLM traffic?
aprt --debug run --template ...

Generated runs now include token accounting artefacts (token_usage.json and token_usage.csv) alongside results, making usage tracking easy for billing reviews.

Axes-enabled runs : The examples/ehcp_variables.toml file demonstrates two axes (gender, support) plus a history axis used to prep the conversation with templated prior turns. Each combination of axis options produces its own report directory (name suffix such as 20251005-174644-gender-anthony_davis-history-prior_misalignment). Use this mechanism to compare baseline behaviour against controlled bias injections.

The wizard first asks for a provider and model. Press Enter to keep the defaults (openai / gpt-5-nano), which deliver long-context, chat-style responses out of the box.

Bias metrics

Every result row now carries both emotional tone (VADER sentiment) and social toxicity scores (Detoxify). For each response you receive:

toxicity, severe_toxicity, obscene, threat, insult, identity_attack — Detoxify probabilities for the answer under test.
history_* counterparts capturing the injected conversation context when a history axis is active.
*_delta fields quantifying how much more or less toxic the final answer became relative to the history seed.

The metrics surface in results.csv, metrics.csv, and summary outputs, making it easy to correlate sentiment drift with professionalism leakage.

Need different lexicons? Point the APRT_SENTIMENT_LEXICON environment variable at your own JSON file, or drop sentiment_words.json into the repository root or ~/.ai_ped_red_team/. The analyzer seeds the file with defaults on first use and preserves any edits you make afterwards.

Monitoring channels

Detector	Package(s) / Install Command	Local Use
Sentiment + Valence Shift	`pip install nltk textblob transformers torch` (VADER, TextBlob, optional RoBERTa sentiment)	Run directly on text samples; track baseline mean ± σ to quantify valence drift.
Toxicity / Harassment	`pip install detoxify torch`	Load Unitary's Detoxify (`Detoxify('original')`) for toxicity, severe toxicity, threat, insult, and identity-attack probabilities.
Perplexity / Entropy Drift	`pip install transformers torch`	Score GPT-2 perplexity per sentence to monitor language-model confidence changes.
Emotion Classifier	`pip install transformers torch` then use `bhadresh-savani/distilbert-base-uncased-emotion` or `joeddav/distilbert-base-uncased-go-emotions-student`	Hugging Face pipeline; no external API required.
Prompt Coercion Heuristics	`pip install spacy` and `python -m spacy download en_core_web_sm`	Rule-based detection of modal and imperative spikes that suggest coercive phrasing.
Linguistic Style Drift	`pip install textdistance sentence-transformers`	Compare embeddings or token overlap between turns to catch stylistic shifts.
Embeddings Similarity Monitor	`pip install sentence-transformers`	Measure semantic similarity between reference phrases and model output to catch topic drift.

Axes & history injection

The new axes system lets you factor experiments along multiple dimensions:

Persona/Gender axis — swaps in templated names, short forms, and a complete pronoun set. The EHCP templates in examples/EHCP-templates/ contain placeholders for each pronoun to guarantee consistent substitution.
Support axis — rewrites the Primary/Secondary/Additional Needs blocks so that the same prompt variants can be evaluated against distinct support requirements.
History axis — feeds a sequence of prior turns (HISTORY_PROMPTS) into the model before the prompt under test. This simulates teachers reusing a chat thread without clearing context and is a high-leverage way to study bias drift.

All axes are described in a single TOML file so runs remain reproducible. The runner expands every combination, materialises a dedicated report directory per combination, records the axis labels in results.jsonl, and includes the injected history text in each call made to the LLM.

Gemini endpoints ignore LiteLLM's seed parameter. The gateway now drops it automatically for all gemini/*, gemma-*, learnlm-*, imagen-*, and Google embedding models so you can keep deterministic-looking defaults without triggering API validation errors.

Ethics

Default examples are “cold”; pass --ack-hot to enable “hot” variants.
Redact PII; do not share raw EHCP data publicly.

License

MIT © SoftOboros

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Oct 15, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai_ped_red_team-0.1.0.tar.gz (1.9 MB view details)

Uploaded Oct 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ai_ped_red_team-0.1.0-py3-none-any.whl (42.3 kB view details)

Uploaded Oct 15, 2025 Python 3

File details

Details for the file ai_ped_red_team-0.1.0.tar.gz.

File metadata

Download URL: ai_ped_red_team-0.1.0.tar.gz
Upload date: Oct 15, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_ped_red_team-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`23f16b6f81085c2cc609cf162ca45c763bdc43bbdbfc5c689633ddad245b0ee6`
MD5	`e3daa5cc224fdedd521db5f74182de97`
BLAKE2b-256	`2461219f5fb95fbbd65dd4dc734cc07dfccd5fd454a7ea28ba86734519e7aa56`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_ped_red_team-0.1.0.tar.gz:

Publisher: release.yml on SoftOboros/ai-ped-red-team

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_ped_red_team-0.1.0.tar.gz
- Subject digest: 23f16b6f81085c2cc609cf162ca45c763bdc43bbdbfc5c689633ddad245b0ee6
- Sigstore transparency entry: 607712732
- Sigstore integration time: Oct 15, 2025
Source repository:
- Permalink: SoftOboros/ai-ped-red-team@a7d8226b5c89deb37e0899e285ead954fa2b98b8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SoftOboros
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a7d8226b5c89deb37e0899e285ead954fa2b98b8
- Trigger Event: push

File details

Details for the file ai_ped_red_team-0.1.0-py3-none-any.whl.

File metadata

Download URL: ai_ped_red_team-0.1.0-py3-none-any.whl
Upload date: Oct 15, 2025
Size: 42.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_ped_red_team-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d0bcb941f926936e596016ae3791cc8f3f7d8dcec7bf8953e0dc6678274b4569`
MD5	`6a47adfafcba0fea79b623f81e1da5c0`
BLAKE2b-256	`0222d03d1747a74117e6b3a8404c802cd045b73178c090d1f4854d567c286612`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_ped_red_team-0.1.0-py3-none-any.whl:

Publisher: release.yml on SoftOboros/ai-ped-red-team

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_ped_red_team-0.1.0-py3-none-any.whl
- Subject digest: d0bcb941f926936e596016ae3791cc8f3f7d8dcec7bf8953e0dc6678274b4569
- Sigstore transparency entry: 607712734
- Sigstore integration time: Oct 15, 2025
Source repository:
- Permalink: SoftOboros/ai-ped-red-team@a7d8226b5c89deb37e0899e285ead954fa2b98b8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SoftOboros
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a7d8226b5c89deb37e0899e285ead954fa2b98b8
- Trigger Event: push

ai-ped-red-team 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

ai-ped-red-team

Features

Quickstart

Bias metrics

Monitoring channels

Axes & history injection

Ethics

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance