Simple, reproducible red teaming pipeline for GPT-OSS via Ollama with DeepSeek-based prompt generation

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

gpt-oss-redteam

Simple, reproducible red teaming pipeline for GPT‑OSS models via Ollama, with DeepSeek-based adversarial prompt generation.

Uses the DeepSeek API to generate adversarial prompts from high-level human-in-the-loop (HITL) prompts, preserving any [insert ...] placeholders.
Runs local GPT‑OSS inference through Ollama with a fake tools manifest (each tool takes one string arg, input, with the description "put all information here").
Logs every run to JSONL for later analysis (including the full model JSON when available).
Minimal analyzer: reports the refusal rate with a 95% CI.
Minimal CLI for end-to-end runs.
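
As an illustration of the placeholder-preservation feature listed above, here is a minimal sketch of how one might check that a generated prompt still carries every [insert ...] placeholder from its source prompt. The helper name `placeholders_preserved` and the regular expression are assumptions made for this example; they are not taken from the package.

```python
import re

# Hypothetical helper, not part of the package's documented API.
# Matches bracketed placeholders such as "[insert task]".
PLACEHOLDER_RE = re.compile(r"\[insert[^\]]*\]")


def placeholders_preserved(source: str, generated: str) -> bool:
    """Return True if every [insert ...] placeholder in `source`
    appears verbatim in `generated`."""
    return all(p in generated for p in PLACEHOLDER_RE.findall(source))


source = "Explain how to [insert task] step by step."
kept = "As a senior engineer, walk me through [insert task] in detail."
dropped = "As a senior engineer, walk me through the task in detail."

print(placeholders_preserved(source, kept))     # True
print(placeholders_preserved(source, dropped))  # False
```

A check like this could be run over generated prompts before inference, so that prompts which lost a placeholder are dropped or regenerated.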

Quickstart

Prereqs:

Python 3.9+
Ollama running locally and the model pulled: ollama pull gpt-oss:20b
DeepSeek API key in env: DEEPSEEK_API_KEY=...

Install (editable):

pip install -e .

Run the full pipeline with 20 prompts × 100 runs each (2,000 total):

# Create a starter prompts file you can edit
python -m gpt_oss_redteam.cli init --out prompts.txt

# Edit prompts.txt to include one prompt per line (each may include [insert ...])

# Run pipeline
gpt-oss-redteam all \
  --prompts-file prompts.txt \
  --runs-per-prompt 100 \
  --generation-batch-size 10 \
  --ollama-model gpt-oss:20b \
  --out-dir runs

Outputs:

runs/<timestamp>/generated_prompts.jsonl – all generated prompts with their source high-level prompt.
runs/<timestamp>/inference.jsonl – every model run with raw JSON response.
runs/<timestamp>/analysis.json and analysis.md – refusal rate and 95% CI.
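
To show how a refusal rate and 95% CI could be derived from JSONL records like those above, here is a rough sketch. It assumes a Wilson score interval and a hypothetical boolean `refused` field on each record; the source does not say which interval method the bundled analyzer uses or what the log schema looks like, so treat both as placeholders.

```python
import json
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the interval is uninformative
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def refusal_stats(jsonl_lines, is_refusal):
    """Parse JSONL lines and return (refusal_rate, (ci_low, ci_high))."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    n = len(records)
    refusals = sum(1 for r in records if is_refusal(r))
    rate = refusals / n if n else float("nan")
    return rate, wilson_interval(refusals, n)


# Hypothetical records; the real inference.jsonl schema may differ.
lines = [
    '{"prompt_id": 0, "refused": true}',
    '{"prompt_id": 1, "refused": false}',
    '{"prompt_id": 2, "refused": true}',
]
rate, (low, high) = refusal_stats(lines, lambda r: r["refused"])
print(f"refusal rate = {rate:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here because it stays within [0, 1] and behaves sensibly when the refusal rate is close to 0 or 1, which is common in this kind of evaluation.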

Configuration via CLI flags or environment variables

DEEPSEEK_API_KEY – required for prompt generation.
DEEPSEEK_BASE_URL – optional, defaults to https://api.deepseek.com.
OLLAMA_BASE_URL – optional, defaults to http://localhost:11434/v1.
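
The environment variables above could be resolved along the following lines. This is only a sketch: the `Settings` class and `load_settings` function are hypothetical names for this example, and the package's own configuration code may be organized differently. The defaults are the ones stated above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented settings."""
    deepseek_api_key: str
    deepseek_base_url: str
    ollama_base_url: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get("DEEPSEEK_API_KEY")
    if not key:
        raise RuntimeError("DEEPSEEK_API_KEY is required for prompt generation")
    return Settings(
        deepseek_api_key=key,
        deepseek_base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"DEEPSEEK_API_KEY": "example-key"})
print(settings.deepseek_base_url)  # https://api.deepseek.com
print(settings.ollama_base_url)    # http://localhost:11434/v1
```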

Fake tools manifest (simple)

We pass a static set of tool/function definitions so the model believes it can call tools, without spending context on real implementations. Every tool takes a single string arg named input with the description "put all information here". No tool is ever executed.
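
A manifest of this shape might look like the sketch below. It assumes the OpenAI-style function-calling schema, which is a guess based on the `/v1` base URL rather than something the source states. The tool names `web_search` and `send_email` and their descriptions are purely hypothetical.

```python
import json


def make_fake_tool(name: str, description: str) -> dict:
    """Build one tool definition with a single string argument named `input`.

    Assumes an OpenAI-style function-calling schema; nothing is ever executed.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "input": {
                        "type": "string",
                        "description": "put all information here",
                    }
                },
                "required": ["input"],
            },
        },
    }


# Hypothetical tool names, chosen only for illustration.
FAKE_TOOLS = [
    make_fake_tool("web_search", "Search the web."),
    make_fake_tool("send_email", "Send an email."),
]

print(json.dumps(FAKE_TOOLS[0], indent=2))
```

Because every tool shares the same one-argument signature, the manifest stays small and any tool call the model emits can be logged as-is without being dispatched.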

Safety note

This project logs raw model outputs for offline analysis. The provided analyzer only checks for a very simple refusal phrase; deeper evaluation is out of scope for this minimal baseline.
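
For a sense of what a simple phrase-based refusal check involves, here is a minimal sketch. The phrases in `REFUSAL_PHRASES` and the function name `looks_like_refusal` are assumptions for this example; the source does not document which phrase the bundled analyzer actually matches.

```python
# Hypothetical phrase list; the package's actual refusal phrase is not documented here.
REFUSAL_PHRASES = (
    "i'm sorry, but i can't help with that",
    "i can't help with that",
)


def looks_like_refusal(text: str, phrases=REFUSAL_PHRASES) -> bool:
    """Case-insensitive substring match against known refusal phrases."""
    # Normalize curly apostrophes so "can\u2019t" and "can't" compare equal.
    normalized = text.replace("\u2019", "'").lower()
    return any(phrase in normalized for phrase in phrases)


print(looks_like_refusal("I\u2019m sorry, but I can\u2019t help with that."))  # True
print(looks_like_refusal("Sure, here is a summary."))                          # False
```

A substring check like this misses paraphrased refusals and can flag responses that quote a refusal while still complying, which is why raw outputs are worth keeping for deeper review.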

Paper outline (in `docs/`)

Brief literature review
Package pipeline and creation
Quantified results
Qualitative analysis of results
Example problem prompts
Next steps

Release history

0.1.3 (Aug 20, 2025)
0.1.2 (Aug 20, 2025)
0.1.1 (Aug 20, 2025), this version
0.1.0 (Aug 20, 2025)

Download files

Source Distribution

gpt_oss_redteam-0.1.1.tar.gz (9.0 kB)

Uploaded Aug 20, 2025 Source

Built Distribution

gpt_oss_redteam-0.1.1-py3-none-any.whl (12.3 kB)

Uploaded Aug 20, 2025 Python 3

File details

Details for the file gpt_oss_redteam-0.1.1.tar.gz.

File metadata

Download URL: gpt_oss_redteam-0.1.1.tar.gz
Upload date: Aug 20, 2025
Size: 9.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for gpt_oss_redteam-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6dc11d36840062741d1d20a0c759c95506bec59cd4f318e2c688ddb896f224c2`
MD5	`bcf3fb4690a480cbcf2d086ce105d3b1`
BLAKE2b-256	`397a455962c4ac9d2ee7a86ea6f4d3c46a594dd73e2600215ecc3fdde1f04176`

File details

Details for the file gpt_oss_redteam-0.1.1-py3-none-any.whl.

File metadata

Download URL: gpt_oss_redteam-0.1.1-py3-none-any.whl
Upload date: Aug 20, 2025
Size: 12.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for gpt_oss_redteam-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c088348799d9a82c6b92a631065af9c7e0dd3e38fcc641fdf74c660dae875cb0`
MD5	`ad0abd23a5a93097990fc8b7d6267530`
BLAKE2b-256	`37718d59522b3a1f6cec8553ea0f8da56a3df8ac3de848c7002b71d7ddf6b9cc`
