Skip to main content

Simple, reproducible red teaming pipeline for GPT-OSS via Ollama with DeepSeek-based prompt generation

Project description

gpt-oss-redteam

Simple, reproducible red teaming pipeline for GPT‑OSS models via Ollama, with DeepSeek-based adversarial prompt generation.

  • DeepSeek API to generate adversarial prompts from high-level HITL prompts (preserves [insert ...]).
  • Ollama for local GPT‑OSS inference with a fake tools manifest (one string arg: input, description: "put all information here").
  • Logs every run to JSONL for later analysis (full model JSON when available).
  • Minimal analyzer: refusal rate and 95% CI.
  • Minimal CLI for end-to-end runs.

Quickstart

Prereqs:

  • Python 3.9+
  • Ollama running locally and the model pulled: ollama pull gpt-oss:20b
  • DeepSeek API key in env: DEEPSEEK_API_KEY=...

Install (editable):

pip install -e .

Run the full pipeline with 20 prompts × 100 runs each (2,000 total):

# Create a starter prompts file you can edit
python -m gpt_oss_redteam.cli init --out prompts.txt

# Edit prompts.txt to include one prompt per line (each may include [insert ...])

# Run pipeline
gpt-oss-redteam all \
  --prompts-file prompts.txt \
  --runs-per-prompt 100 \
  --generation-batch-size 10 \
  --ollama-model gpt-oss:20b \
  --out-dir runs

Outputs:

  • runs/<timestamp>/generated_prompts.jsonl – all generated prompts with their source high-level prompt.
  • runs/<timestamp>/inference.jsonl – every model run with raw JSON response.
  • runs/<timestamp>/analysis.json and analysis.md – refusal rate and 95% CI.

Configuration via CLI flags or environment variables

  • DEEPSEEK_API_KEY – required for prompt generation.
  • DEEPSEEK_BASE_URL – optional, defaults to https://api.deepseek.com.
  • OLLAMA_BASE_URL – optional, defaults to http://localhost:11434/v1.

Fake tools manifest (simple)

We pass a static set of tool/function definitions so the model thinks it can call tools without consuming context with actual implementations. Every tool takes a single string arg named input with the description "put all information here". No tool execution occurs.

Safety note

This project logs raw model outputs for offline analysis. The provided analyzer only checks for a very simple refusal phrase; deeper evaluation is out of scope for this minimal baseline.

Paper outline (in docs/)

  • Brief literature review
  • Package pipeline and creation
  • Quantified results
  • Qualitative analysis of results
  • Example problem prompts
  • Next steps

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gpt_oss_redteam-0.1.1.tar.gz (9.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gpt_oss_redteam-0.1.1-py3-none-any.whl (12.3 kB view details)

Uploaded Python 3

File details

Details for the file gpt_oss_redteam-0.1.1.tar.gz.

File metadata

  • Download URL: gpt_oss_redteam-0.1.1.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for gpt_oss_redteam-0.1.1.tar.gz
Algorithm Hash digest
SHA256 6dc11d36840062741d1d20a0c759c95506bec59cd4f318e2c688ddb896f224c2
MD5 bcf3fb4690a480cbcf2d086ce105d3b1
BLAKE2b-256 397a455962c4ac9d2ee7a86ea6f4d3c46a594dd73e2600215ecc3fdde1f04176

See more details on using hashes here.

File details

Details for the file gpt_oss_redteam-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for gpt_oss_redteam-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c088348799d9a82c6b92a631065af9c7e0dd3e38fcc641fdf74c660dae875cb0
MD5 ad0abd23a5a93097990fc8b7d6267530
BLAKE2b-256 37718d59522b3a1f6cec8553ea0f8da56a3df8ac3de848c7002b71d7ddf6b9cc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page