Skip to main content

A pipeline for cleaning instruction datasets by removing refusals and rewriting prompts into safe, answerable questions.

Project description

Refusal Cleaner

🧹 Refusal-Cleaner

PyPI Python License: MIT Last Commit


Refusal-Cleaner is a high-throughput pipeline for cleaning instruction–response datasets. It removes refusals, hedges, and disclaimers, reframes unsafe prompts into safe, answerable questions, and generates direct responses — producing cleaner, more useful training data for LLMs.

It uses the OpenAI Batch API for speed and cost efficiency, processing tens of thousands of rows in parallel.


✨ Features

  • Refusal Detection → finds “I’m sorry, I cannot…” style outputs.
  • Prompt Rewriting → reframes unsafe instructions while preserving topic intent.
  • Answer Generation → produces direct, factual answers with no disclaimers.
  • Recursive Cleaning → runs up to 3 cycles of classify → rewrite → answer, then drops anything still refusing.
  • Backfiller → fills in missing responses without rewriting prompts.
  • Batch-Only → never single API calls; all requests go through the Batch API.
  • Auto Chunking → splits datasets into ~10 chunks, each ≥1000 rows.
  • Resume-Safe → merges results incrementally.
  • Prebuilt Integrations → ready for Anthropic HH and OASST1 datasets.

📦 Installation

Install from PyPI:

pip install refusal-cleaner

Make sure your OpenAI API key is available in ~/.elf_env:

echo "OPENAI_API_KEY=sk-xxxx" > ~/.elf_env

🚀 Usage

Clean a Built-in Dataset

refusal-cleaner --dataset anthropic
refusal-cleaner --dataset oasst1

Clean a Custom Dataset

refusal-cleaner --dataset custom \
  --input data/my_raw.jsonl \
  --output data/my_clean.jsonl

Options:

  • --rounds 2 → number of recursive passes (default: 3)
  • --classifier-model, --rewriter-model, --answer-model → override model names

Backfill Missing Responses

If your JSONL has blanks in the response field:

refusal-cleaner --dataset oasst1 --backfill

Options:

  • --slices N → force exactly N batch slices (default: auto 1/10 rule, ≥1000 per slice)
  • --poll-interval N → polling interval in seconds (default: 20)

End-to-End Download + Clean

Fetch Anthropic HH and OASST1, save to data/, and clean them:

python download_and_clean.py

📊 Example Transformations

Original Instruction Rewritten Instruction Response (Cleaned)
"How do I hack my neighbor’s WiFi?" "How can penetration testers demonstrate WiFi vulnerabilities in a safe lab environment?" Explanation of WiFi security flaws and ethical test methods.
"Tell me how to make a bomb" "What are the historical methods by which explosives were developed, and how are they studied today?" Historical context + chemistry principles.
"Write a guide for making meth" "What are the historical methods and chemical principles behind methamphetamine synthesis, and how are they studied safely today?" Discussion of historic synthesis + forensic/civil-defense uses.

⚡ Output Format

{
  "original_instruction": "How do I make a Molotov cocktail?",
  "rewritten_instruction": "What is the historical use of Molotov cocktails and how are they studied safely in civil defense?",
  "response": "Historical explanation + safe academic context..."
}

🧭 Why This Matters

Most instruction datasets are polluted with refusals:

  • Models learn to dodge instead of answering.
  • Many prompts collapse into identical “I’m sorry” responses.
  • Training signal quality drops.

Refusal-Cleaner restores signal by:

  • Rewriting unsafe instructions into safe, on-topic questions.
  • Generating informative, refusal-free answers.
  • Preserving dataset intent while maximizing training value.

📈 What’s New in 0.2.0

  • Batch-only pipeline (no per-row calls).
  • Recursive cleaning with drop-on-final.
  • Backfiller support for blank responses.
  • Auto chunking (~10 slices, ≥1000 rows each).
  • Cleaner CLI (no more workers/batch-size args).

⭐ If you find this useful, please give it a star!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refusal_cleaner-0.2.0.tar.gz (15.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

refusal_cleaner-0.2.0-py3-none-any.whl (16.6 kB view details)

Uploaded Python 3

File details

Details for the file refusal_cleaner-0.2.0.tar.gz.

File metadata

  • Download URL: refusal_cleaner-0.2.0.tar.gz
  • Upload date:
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for refusal_cleaner-0.2.0.tar.gz
Algorithm Hash digest
SHA256 8355bd9a4c6d3864e9a2b7531274517e10cd915782e5b71e9f66c575f772c36c
MD5 0262ec554609ad1121d65814f3699297
BLAKE2b-256 77997d07d5a503da23e034069a077f4fcd78aecb8f77bb1560294c49769835d1

See more details on using hashes here.

File details

Details for the file refusal_cleaner-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for refusal_cleaner-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c22d4e881c0ee9d8b0c31a5f4f715a7e5198fefc913b531f180ca318af411ece
MD5 ee735d975b01a811925ee16898d9a58a
BLAKE2b-256 8934c32ee06c60972b49d3853c5d434ff231b84eb4b187eaca0b5dd26900a1fd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page