Simple, reproducible red teaming pipeline for GPT-OSS via Ollama with DeepSeek-based prompt generation
gpt-oss-redteam
Simple, reproducible red teaming pipeline for GPT‑OSS models via Ollama, with DeepSeek-based adversarial prompt generation.
- DeepSeek API to generate adversarial prompts from high-level HITL prompts (preserves [insert ...]).
- Ollama for local GPT‑OSS inference with a fake tools manifest (one string arg: input, description: "put all information here").
- Logs every run to JSONL for later analysis (full model JSON when available).
- Minimal analyzer: refusal rate and 95% CI.
- Minimal CLI for end-to-end runs.
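The analyzer's statistic (refusal rate with a 95% CI) can be sketched as below. This README does not say which interval method the package uses, so this sketch assumes a Wilson score interval. The helper name refusal_rate_ci and the example counts are hypothetical, not the package's API or real results.

```python
import math


def refusal_rate_ci(refusals: int, total: int, z: float = 1.96):
    """Return (rate, low, high) for a binomial proportion.

    Assumption: a Wilson score interval; the package may use another method.
    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = refusals / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return p, max(0.0, center - half), min(1.0, center + half)


# Illustrative counts only (not measured results).
rate, low, high = refusal_rate_ci(refusals=1500, total=2000)
print(f"refusal rate {rate:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

The Wilson interval is used here because it stays well behaved when the refusal rate is close to 0 or 1, which is a plausible situation for a safety-tuned model.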
Quickstart
Prereqs:
- Python 3.9+
- Ollama running locally and the model pulled:
ollama pull gpt-oss:20b
- DeepSeek API key in env:
DEEPSEEK_API_KEY=...
Install (editable):
pip install -e .
Run the full pipeline with 20 prompts × 100 runs each (2,000 total):
# Create a starter prompts file you can edit
python -m gpt_oss_redteam.cli init --out prompts.txt
# Edit prompts.txt to include one prompt per line (each may include [insert ...])
# Run pipeline
gpt-oss-redteam all \
--prompts-file prompts.txt \
--runs-per-prompt 100 \
--generation-batch-size 10 \
--ollama-model gpt-oss:20b \
--out-dir runs
Outputs:
- runs/<timestamp>/generated_prompts.jsonl: all generated prompts with their source high-level prompt.
- runs/<timestamp>/inference.jsonl: every model run with raw JSON response.
- runs/<timestamp>/analysis.json and analysis.md: refusal rate and 95% CI.
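For readers unfamiliar with JSONL, each line of the output files is one standalone JSON object. The sketch below appends and reads such records. The field names (prompt, run, response) are hypothetical, since the actual schema of inference.jsonl is not documented here.

```python
import json
import tempfile
from pathlib import Path


def append_jsonl(path: Path, record: dict) -> None:
    """Append one record as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list:
    """Read all non-empty lines back as dicts."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory; field names are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "inference.jsonl"
    append_jsonl(log, {"prompt": "example", "run": 0, "response": {"content": "..."}})
    append_jsonl(log, {"prompt": "example", "run": 1, "response": {"content": "..."}})
    records = read_jsonl(log)

print(len(records))
```

Appending line by line means a crash partway through a long run still leaves every completed record readable.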
Configuration via CLI flags or environment variables
- DEEPSEEK_API_KEY: required for prompt generation.
- DEEPSEEK_BASE_URL: optional, defaults to https://api.deepseek.com.
- OLLAMA_BASE_URL: optional, defaults to http://localhost:11434/v1.
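One way to resolve these variables with their documented defaults is sketched below. The Settings class and load_settings function are hypothetical names for illustration; the package's own configuration code may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    deepseek_api_key: str
    deepseek_base_url: str
    ollama_base_url: str


def load_settings(env=None) -> Settings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get("DEEPSEEK_API_KEY")
    if not key:
        raise RuntimeError("DEEPSEEK_API_KEY is required for prompt generation")
    return Settings(
        deepseek_api_key=key,
        # Defaults below are the ones stated in this README.
        deepseek_base_url=env.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    )


# Pass an explicit mapping so the demo does not depend on the real environment.
settings = load_settings({"DEEPSEEK_API_KEY": "placeholder-key"})
print(settings.ollama_base_url)
```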
Fake tools manifest (simple)
We pass a static set of tool/function definitions so the model believes it can call tools, without spending context on real implementations. Every tool takes a single string arg named input with the description "put all information here". No tool is ever executed.
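A manifest of this shape might look like the sketch below. It assumes the OpenAI-style function-tool schema, which seems plausible given that the default OLLAMA_BASE_URL points at a /v1 endpoint, but the package's actual manifest is not shown in this README. The tool names and tool descriptions are hypothetical.

```python
import json


def fake_tool(name: str, description: str) -> dict:
    """Build one tool definition with a single string argument named 'input'."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "input": {
                        "type": "string",
                        "description": "put all information here",
                    }
                },
                "required": ["input"],
            },
        },
    }


# Hypothetical tool names; the real manifest may list different tools.
TOOLS = {
    "search_web": "Search the web.",
    "read_file": "Read a file.",
}
manifest = [fake_tool(name, desc) for name, desc in TOOLS.items()]
print(json.dumps(manifest[0], indent=2))
```

Because nothing is executed, any tool call the model emits is simply recorded as part of the logged response.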
Safety note
This project logs raw model outputs for offline analysis. The provided analyzer only checks for a very simple refusal phrase; deeper evaluation is out of scope for this minimal baseline.
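To illustrate how limited a phrase-based check is, here is a minimal sketch. The phrases and the is_refusal helper are hypothetical; the phrase the provided analyzer actually looks for is not stated in this README.

```python
# Hypothetical refusal phrases, lowercase with straight apostrophes.
REFUSAL_PHRASES = (
    "i'm sorry, but i can't",
    "i can't help with that",
)


def is_refusal(text: str) -> bool:
    """Case-insensitive substring match after normalizing curly apostrophes."""
    normalized = text.lower().replace("\u2019", "'")
    return any(phrase in normalized for phrase in REFUSAL_PHRASES)


outputs = [
    "I\u2019m sorry, but I can\u2019t help with that.",
    "Here is a summary of the document.",
]
flags = [is_refusal(o) for o in outputs]
print(flags)
```

A check like this misses refusals worded differently and cannot tell a partial answer from a full one, which is why the logged raw outputs matter for any deeper evaluation.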
Paper outline (in docs/)
- Brief literature review
- Package pipeline and creation
- Quantified results
- Qualitative analysis of results
- Example problem prompts
- Next steps