Diagnostic Dialogue Optimization framework for prompt repair.

DDO Prompt Optimizer

Diagnostic Dialogue Optimization (DDO) is a prompt optimization framework based on the paper included in this repository. A stronger teacher model conducts a multi-turn diagnostic conversation with a student model, compiles a structured weakness profile, and proposes a minimal prompt repair. The edit can optionally be verified on a small dataset or with an external evaluator. The conversation then resets and the cycle repeats against the repaired prompt.

This repository includes the paper, a full-stack OpenAI SDK implementation, a browser UI, npm and pip library entrypoints, a DeepEval adapter, Codespaces support, tests, a CI workflow template, and example data.

Quick Start

npm install
cp .env.example .env
npm run doctor
npm run dev

Open http://127.0.0.1:5174.

Add your API key either to .env:

OPENAI_API_KEY=<your-openai-api-key>

or paste it into the UI key field for a single run. UI keys are sent only to the local server for that request and are not written to disk.

Codespaces

Open the project in a dedicated Codespaces environment:

https://codespaces.new/irodcompany5-tech/ddo

The devcontainer installs dependencies, runs npm run doctor, and forwards port 5174.

Install As A Library

JavaScript/TypeScript projects:

npm install ddo-prompt-optimizer

Python projects:

pip install ddo-prompt-optimizer

Until packages are published to npm/PyPI, install directly from GitHub:

npm install github:irodcompany5-tech/ddo
pip install "git+https://github.com/irodcompany5-tech/ddo.git"

For DeepEval helpers:

pip install "ddo-prompt-optimizer[deepeval]"

JavaScript API

import { DDOOptimizer } from "ddo-prompt-optimizer";

const optimizer = new DDOOptimizer({
  teacherModel: "gpt-5.5",
  studentModel: "gpt-5.5",
  verifierModel: "gpt-5.5"
});

const result = await optimizer.optimize(
  {
    initialPrompt: "You are a careful assistant.",
    behaviorSpec: "Follow requested format, reason stepwise, and handle edge cases.",
    dataset: [
      {
        input: "Return JSON with keys answer and confidence: 2+2?",
        expected: "{\"answer\":4,\"confidence\":\"high\"}"
      }
    ]
  },
  (event) => console.log(event.type)
);

console.log(result.finalPrompt);

To use your own evaluation platform, pass evaluatePrompt. It should return either a single score from 0 to 1, or an object with the fields average, count, passRate, and results.

const optimizer = new DDOOptimizer({
  // runYourEvalHarness is a placeholder for your own evaluation function.
  evaluatePrompt: async (prompt, { dataset }) => {
    return await runYourEvalHarness(prompt, dataset);
  }
});
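The return contract described above is language-neutral, so here is a minimal Python sketch of a helper that folds per-example scores into the object form. The helper name `summarize_scores`, the `pass_threshold` parameter, and the shape of each `results` entry are assumptions made for illustration; only the four top-level field names come from the description above, and the Python package may spell them differently.

```python
from typing import Dict, List


def summarize_scores(scores: List[float], pass_threshold: float = 0.5) -> Dict[str, object]:
    """Fold per-example scores into an aggregate evaluation object.

    Hypothetical helper: the threshold and the per-result shape are
    illustrative assumptions, not part of the documented API.
    """
    if not scores:
        return {"average": 0.0, "count": 0, "passRate": 0.0, "results": []}

    # Clamp every score into the documented 0..1 range.
    clipped = [min(1.0, max(0.0, s)) for s in scores]
    passed = sum(1 for s in clipped if s >= pass_threshold)

    return {
        "average": sum(clipped) / len(clipped),
        "count": len(clipped),
        "passRate": passed / len(clipped),
        "results": [{"index": i, "score": s} for i, s in enumerate(clipped)],
    }
```

An evaluator built this way could return the dictionary directly, or return only its `average` value when a single score is enough.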

JavaScript CLI:

ddo optimize \
  --prompt prompt.txt \
  --dataset examples/dataset.jsonl \
  --teacher-model gpt-5.5 \
  --student-model gpt-5.5 \
  --output optimized-prompt.txt

Python API

from ddo_optimizer import DDOOptimizer

optimizer = DDOOptimizer()

result = optimizer.optimize(
    initial_prompt="You are a careful assistant.",
    behavior_spec="Follow requested format, reason stepwise, and handle edge cases.",
    dataset=[
        {
            "input": "Return JSON with keys answer and confidence: 2+2?",
            "expected": "{\"answer\":4,\"confidence\":\"high\"}",
        }
    ],
    teacher_model="gpt-5.5",
    student_model="gpt-5.5",
)

print(result.final_prompt)

Python CLI:

ddo-optimize \
  --prompt prompt.txt \
  --dataset examples/dataset.jsonl \
  --teacher-model gpt-5.5 \
  --student-model gpt-5.5 \
  --output optimized-prompt.txt

DeepEval Adapter

from deepeval.dataset import Golden
from deepeval.metrics import AnswerRelevancyMetric
from ddo_optimizer.adapters.deepeval import optimize_with_deepeval

def model_callback(prompt, example):
    # Run your app using the candidate prompt and the example input.
    # your_llm_app is a placeholder for your own application entrypoint.
    return your_llm_app(system_prompt=prompt, user_input=example["input"])

result = optimize_with_deepeval(
    initial_prompt="Respond carefully.",
    goldens=[Golden(input="What is Saturn?", expected_output="Saturn is a planet.")],
    metrics=[AnswerRelevancyMetric()],
    model_callback=model_callback,
)

print(result.final_prompt)

See docs/integrations.md for generic evaluator contracts and examples.

What Is Included

Configuration

Environment defaults live in .env.example:

OPENAI_API_KEY=
OPENAI_BASE_URL=
OPENAI_ORG_ID=
OPENAI_PROJECT_ID=

DDO_HOST=127.0.0.1
DDO_PORT=5174
DDO_TEACHER_MODEL=gpt-5.5
DDO_STUDENT_MODEL=gpt-5.5
DDO_VERIFIER_MODEL=gpt-5.5
DDO_API_MODE=responses

DDO_HORIZON=5
DDO_BUDGET=20
DDO_PATIENCE=2
DDO_CONFIDENCE_THRESHOLD=0.62
DDO_REGRESSION_EPSILON=0.03
DDO_VALIDATION_LIMIT=6

All important DDO settings can also be changed from the UI:

  • Teacher, student, and verifier models.
  • Responses API or Chat Completions mode.
  • Behavior specification.
  • Initial student system prompt.
  • Horizon, total budget, patience, confidence threshold, regression epsilon, and validation limit.
  • Verifier gate and minimality guard.
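If you want to read the same numeric defaults from your own Python code, the following is a minimal sketch. The `DDOSettings` dataclass and `load_settings` function are hypothetical names, not part of the package; the variable names and default values are taken from the .env.example listing above, and treating an empty value as "use the default" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class DDOSettings:
    """Hypothetical container for the numeric DDO_* settings."""
    horizon: int
    budget: int
    patience: int
    confidence_threshold: float
    regression_epsilon: float
    validation_limit: int


def _read(env: Mapping[str, str], name: str, default: T, cast: Callable[[str], T]) -> T:
    # Unset or blank variables fall back to the documented default.
    raw = env.get(name, "").strip()
    return cast(raw) if raw else default


def load_settings(env: Mapping[str, str] = os.environ) -> DDOSettings:
    return DDOSettings(
        horizon=_read(env, "DDO_HORIZON", 5, int),
        budget=_read(env, "DDO_BUDGET", 20, int),
        patience=_read(env, "DDO_PATIENCE", 2, int),
        confidence_threshold=_read(env, "DDO_CONFIDENCE_THRESHOLD", 0.62, float),
        regression_epsilon=_read(env, "DDO_REGRESSION_EPSILON", 0.03, float),
        validation_limit=_read(env, "DDO_VALIDATION_LIMIT", 6, int),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.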

Dataset Input

The UI accepts JSON, JSONL, CSV, plain text, or manual examples.

Minimal JSONL:

{"input":"Return exactly two bullets about backups.","expected":"Two bullets only.","notes":"Checks instruction adherence."}
{"input":"What will my cloud bill be next month?","expected":"Ask for missing usage and pricing details.","tags":["calibration"]}

See docs/dataset-format.md for full details.
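As a rough illustration of the JSONL shape shown above, this sketch parses one example per line and skips blank lines. The function name `parse_jsonl` is hypothetical, and requiring an `input` key while treating `expected`, `notes`, and `tags` as optional is an assumption; docs/dataset-format.md remains the authoritative description.

```python
import json
from typing import Dict, List


def parse_jsonl(text: str) -> List[Dict[str, object]]:
    """Parse JSONL text into a list of example dictionaries."""
    examples: List[Dict[str, object]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumption: every example must be an object with an "input" key.
        if not isinstance(record, dict) or "input" not in record:
            raise ValueError(f"line {line_number}: expected an object with an 'input' key")
        examples.append(record)
    return examples
```

Reporting the line number in the error makes it easier to locate a malformed record in a larger file.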

DDO Runtime

The implementation follows the paper's core loop:

  1. Teacher asks adaptive diagnostic questions.
  2. Student answers under the current prompt.
  3. Teacher emits a JSON weakness profile.
  4. Repair operator proposes a minimal prompt diff.
  5. Optional verifier scores before/after prompts on validation examples.
  6. Accepted edits update history; rejected edits increase stall count.
  7. A fresh diagnostic conversation starts against the repaired prompt.
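The steps above can be sketched as a small control loop. This is an illustrative outline, not the repository's implementation: the function `ddo_loop` and its callable parameters are hypothetical, `budget` is assumed to count repair rounds, and the acceptance rule (accept unless the score drops by more than `epsilon`) and the reset of the stall counter on acceptance are assumptions about how the regression epsilon and patience settings are used.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, object]


def ddo_loop(
    prompt: str,
    diagnose: Callable[[str], Profile],      # steps 1-3: dialogue, then weakness profile
    repair: Callable[[str, Profile], str],   # step 4: propose a minimal prompt edit
    score: Callable[[str], float],           # step 5: verifier score in 0..1
    budget: int = 20,
    patience: int = 2,
    epsilon: float = 0.03,
) -> Tuple[str, List[str]]:
    """Run diagnose/repair/verify rounds until the budget or patience runs out."""
    history: List[str] = []
    stalls = 0
    current_score = score(prompt)

    for _ in range(budget):
        if stalls >= patience:
            break
        profile = diagnose(prompt)
        candidate = repair(prompt, profile)
        candidate_score = score(candidate)

        # Step 6: accept unless the candidate regresses by more than epsilon.
        if candidate_score >= current_score - epsilon:
            prompt, current_score = candidate, candidate_score
            history.append(candidate)
            stalls = 0
        else:
            stalls += 1
        # Step 7: the next iteration diagnoses the (possibly repaired) prompt afresh.

    return prompt, history
```

Keeping the teacher, repair operator, and verifier behind plain callables means the loop can be exercised with stubs before any model calls are wired in.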

Scripts

npm run doctor   # local setup checks
npm run check    # syntax checks
npm test         # unit tests
npm run dev      # start the UI/server
npm start        # same server entrypoint for production-like runs

Python checks are included in npm run check and npm test.

Security

Do not commit .env, API keys, GitHub tokens, private datasets, or generated logs. If a token is pasted into chat, an issue, or a terminal log, revoke it and create a new one.

See SECURITY.md.

CI

The CI workflow template is stored at docs/github-actions-ci.yml. To activate it, copy it to .github/workflows/ci.yml using a GitHub token that has the workflow scope.

License

MIT. See LICENSE.
