
An adversarial reasoning engine: a pluggable tree-search workflow where thoughts are generated, adversarially evaluated, and synthesized (thesis -> antithesis -> synthesis).


Dialectica


Dialectica is a pluggable adversarial reasoning engine. It searches a tree of "thoughts" where each thought is generated, adversarially evaluated and iteratively refined, then synthesized into an answer — thesis → antithesis → synthesis (Generator → Discriminator → Synthesizer). Inspired by karpathy/autoresearch's propose→evaluate→keep-best loop and Claude Code's composable workflows, every stage is a swappable component; the default wiring is Tree-of-Thoughts + a GAN-style evaluation loop on Google ADK 2.1.

Install

Use it as a library in your own project:

uv add dialectica
# or: pip install dialectica

import asyncio
import os
from dialectica import create_engine

os.environ["GOOGLE_API_KEY"] = "..."          # the app owns env setup

async def main():
    result = await create_engine("Your problem here").run()
    print(result["final_answer"])

asyncio.run(main())

The library reads configuration from os.environ and does not load .env itself. To work on Dialectica instead, see Development.
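Because the library leaves environment setup to the application, the app has to populate os.environ before calling create_engine. The following is a minimal sketch of one way to do that with only the standard library. The helper name load_env_file is hypothetical (it is not part of Dialectica), and it handles only simple KEY=VALUE lines, not the full .env syntax.

```python
import os
from pathlib import Path


def load_env_file(path: str, environ=os.environ) -> dict[str, str]:
    """Copy KEY=VALUE lines from a file into `environ`.

    Hypothetical helper, not part of Dialectica. Keys that are already set
    are left untouched. Returns only the keys that were newly added.
    """
    loaded: dict[str, str] = {}
    file = Path(path)
    if not file.is_file():
        return loaded
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if key and key not in environ:
            environ[key] = value
            loaded[key] = value
    return loaded
```

Calling load_env_file(".env") at application start, before create_engine(...), keeps the key out of source code while leaving any variables already exported in the shell in control.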

Key Features

🧩 Pluggable engine (thesis → antithesis → synthesis)

The engine (the Coordinator class) owns only the search control flow; every decision is delegated to an injected component, so any stage can be swapped without touching the engine:

Stage | Role | Default
Generator | propose thoughts (thesis) | LlmGenerator
Evaluator | critique & refine (antithesis) | AdversarialEvaluator
Selector | choose the frontier | BeamSearch
Synthesizer | combine into an answer (synthesis) | LlmSynthesizer

Retarget it at code review, research, or decision-making just by changing the generator's prompts or swapping a stage — see Pluggable Architecture.

🔄 GAN-style adversarial evaluation (keep-best)

Each thought undergoes iterative adversarial refinement rather than a single pass:

  1. Discriminator scores it with a structured verdict (score, flaws, suggestions)
  2. Generator refines it from that critique
  3. Discriminator re-scores
  4. Loop until the quality threshold, a terminate signal, or max_gan_rounds

Refinement is not assumed monotonic — the loop keeps the best-scoring round (à la autoresearch's "keep only what beats the current best"), and the node stores that refined text so synthesis works on the improved version, not the original.
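The keep-best loop described above can be sketched as follows. This is an illustration under stated assumptions, not Dialectica's implementation: the Verdict dataclass, the refine_keep_best function, and the plain score/refine callables are all hypothetical stand-ins for the Discriminator and Generator, and the exact way the library counts rounds may differ.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical stand-in for a structured discriminator verdict."""
    score: float
    flaws: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)
    should_terminate: bool = False


def refine_keep_best(
    thought: str,
    score: Callable[[str], Verdict],
    refine: Callable[[str, Verdict], str],
    threshold: float = 7.0,
    max_rounds: int = 3,
) -> tuple[str, float, int]:
    """Return (best_text, best_score, refinement_rounds_run)."""
    verdict = score(thought)
    best_text, best_score = thought, verdict.score
    current = thought
    rounds = 0
    while rounds < max_rounds:
        # Stop on the quality threshold or an explicit terminate signal.
        if verdict.score >= threshold or verdict.should_terminate:
            break
        current = refine(current, verdict)   # generator refines from critique
        verdict = score(current)             # discriminator re-scores
        rounds += 1
        # Refinement may regress, so only a strictly better round replaces the best.
        if verdict.score > best_score:
            best_text, best_score = current, verdict.score
    return best_text, best_score, rounds
```

The point of tracking best_text separately from current is that a later round scoring lower than an earlier one never overwrites the stored result.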

🌳 Tree search with merit-based beam

  • Strategies are scored before the beam — the frontier reflects merit, not generation order
  • Beam search keeps the top-k most promising paths (BeamSearch, or GreedySearch)
  • Pruning: paths below threshold are dropped; exploration stops when the beam empties
  • Multi-node synthesis: the final answer integrates the top scoring thoughts across branches
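The pruning and top-k steps listed above can be illustrated with a small sketch. It assumes each candidate is a plain (node_id, score) pair; the function name next_beam is hypothetical and the library's BeamSearch may behave differently in its details.

```python
def next_beam(
    scored: list[tuple[str, float]],
    beam_width: int = 3,
    score_threshold: float = 7.0,
) -> list[tuple[str, float]]:
    """Drop candidates below the threshold, then keep the top-k by score.

    Hypothetical helper for illustration only.
    """
    survivors = [(node, s) for node, s in scored if s >= score_threshold]
    # sorted() is stable, so equal scores keep their generation order.
    survivors = sorted(survivors, key=lambda pair: pair[1], reverse=True)
    return survivors[:beam_width]
```

An empty return value corresponds to the "beam empties" stopping condition: nothing cleared the threshold, so there is nothing left to expand.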

📊 Structured evaluation results

The Discriminator returns a DiscriminatorVerdict via ADK output_schema (no fragile text parsing). The engine wraps it as an EvaluationResult: score, flaws, suggestions, should_terminate, reasoning, adversarial_rounds, refined_thought, and the full per-round history.

Architecture

User Problem
    ↓
Engine — Phase 1: Initialize
    ↓
Generator expands root → initial strategies
    ↓ (each strategy scored by the Evaluator before it can enter the beam)
Engine — Phase 2: Explore (beam search)
    ↓
For each node in the Selector's frontier:
    ├── Generator expands it into children
    └── for each child, Evaluator runs the GAN loop:
        ├── Discriminator scores (structured verdict)
        ├── Generator refines from the critique
        ├── re-score, keep the best round
        └── persist the refined thought + score on the node
    → children ≥ threshold form the next beam
    ↓
Engine — Phase 3: Synthesize
    ↓
Synthesizer integrates the top thoughts
    ↓
Final Answer (+ thought_tree, best_path, stats)

Workflow Phases

Phase 1: Initialization

  • Creates the root node from the user problem
  • Generator.expand(root) produces the initial strategies (validated via ThoughtData)
  • Each strategy is adversarially scored, then the ones clearing the threshold seed the beam (falling back to the Selector's top-k if none clear it)

Phase 2: Exploration (beam search)

Iterates up to max_depth times:

  1. Select: Selector.select(...) picks the frontier from the active beam
  2. Generate: Generator.expand(parent) creates child thoughts
  3. Evaluate: Evaluator.evaluate(...) runs the GAN loop, keeping the best round and persisting the refined thought
  4. Filter: children scoring ≥ score_threshold form the next beam

Exploration stops when the beam empties or max_depth is reached.
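Phases 1 and 2 can be sketched together as one control loop. This is a simplified, synchronous illustration with hypothetical names (explore, and the expand/score callables standing in for the Generator and Evaluator); the real Coordinator is asynchronous, tracks a thought tree, and may count depth differently.

```python
from typing import Callable

Expand = Callable[[str], list[str]]   # hypothetical: parent text -> child texts
Score = Callable[[str], float]        # hypothetical: thought text -> score


def explore(
    problem: str,
    expand: Expand,
    score: Score,
    beam_width: int = 3,
    max_depth: int = 4,
    score_threshold: float = 7.0,
) -> list[tuple[str, float]]:
    """Return every (thought, score) pair evaluated, in evaluation order."""

    def top_k(pairs: list[tuple[str, float]]) -> list[tuple[str, float]]:
        return sorted(pairs, key=lambda p: p[1], reverse=True)[:beam_width]

    # Phase 1: score the initial strategies; seed the beam with those that
    # clear the threshold, falling back to the top-k if none do.
    initial = [(t, score(t)) for t in expand(problem)]
    evaluated = list(initial)
    beam = [p for p in initial if p[1] >= score_threshold] or top_k(initial)

    # Phase 2: select the frontier, expand, score, and filter.
    for _ in range(max_depth):
        if not beam:
            break  # beam emptied: nothing left to explore
        children: list[tuple[str, float]] = []
        for parent, _parent_score in top_k(beam):
            children.extend((c, score(c)) for c in expand(parent))
        evaluated.extend(children)
        beam = [p for p in children if p[1] >= score_threshold]
    return evaluated
```

The evaluated list is what a synthesis step would draw its top-scoring thoughts from once exploration ends.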

Phase 3: Synthesis

  • Synthesizer.synthesize(...) takes the top-scoring evaluated thoughts
  • Produces a coherent, comprehensive final answer

Development

To work on Dialectica itself (not needed just to use it — see Install):

git clone https://github.com/FradSer/dialectica
cd dialectica
uv sync
cp dialectica/.env.example dialectica/.env   # add GOOGLE_API_KEY for the live e2e test

Then run the suite — see Testing.

Configuration

Environment Variables

Model Configuration:

# Default model for all agents
DEFAULT_MODEL_CONFIG=google:gemini-3.5-flash

# Role-specific overrides (optional)
GENERATOR_MODEL_CONFIG=google:gemini-3.1-pro
DISCRIMINATOR_MODEL_CONFIG=google:gemini-3.1-pro
SYNTHESIZER_MODEL_CONFIG=google:gemini-3.1-pro

Supported Providers:

  • google:gemini-3.5-flash (Google AI Studio)
  • openrouter:anthropic/claude-3.5-sonnet (OpenRouter)
  • openai:gpt-4o (OpenAI)
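The provider:model strings and the role-specific overrides above suggest a simple lookup order. The sketch below shows one plausible resolution scheme; resolve_model_config is a hypothetical name, and this is not taken from the library's llm_config.py, which may resolve models differently.

```python
def resolve_model_config(role: str, env: dict[str, str]) -> tuple[str, str]:
    """Return (provider, model) for a role.

    Hypothetical helper: prefers <ROLE>_MODEL_CONFIG and falls back to
    DEFAULT_MODEL_CONFIG.
    """
    spec = env.get(f"{role.upper()}_MODEL_CONFIG") or env.get("DEFAULT_MODEL_CONFIG")
    if not spec:
        raise KeyError(f"no model configured for role {role!r}")
    # Split on the first colon only; model names may contain '/' or ':'.
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model
```

Splitting on the first colon only matters for specs such as openrouter:anthropic/claude-3.5-sonnet, where the model part carries its own namespace.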

API Credentials:

# Google AI Studio
GOOGLE_API_KEY=your-key-here

# Or Vertex AI
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your-project
GOOGLE_CLOUD_LOCATION=us-central1

# OpenRouter
OPENROUTER_API_KEY=sk-or-...

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1

Engine Parameters

engine = create_engine(
    problem="Your problem statement",
    max_depth=4,              # Max tree depth
    beam_width=3,             # Active paths per iteration
    max_gan_rounds=3,         # Max adversarial refinement rounds
    score_threshold=7.0,      # Min score to continue
    synthesizer_model=None,   # Optional model override
)

Usage Examples

Basic Usage

import asyncio

from dialectica import create_engine


async def main():
    # Create the engine
    engine = create_engine("Design a sustainable urban transport system")

    # Run the workflow (run() is a coroutine, so it must be awaited)
    result = await engine.run()

    # Access results
    print(result["final_answer"])
    print(f"Generated {len(result['thought_tree'])} thoughts")
    print(f"Best path: {result['best_path']}")


asyncio.run(main())

Inspecting the result

run() returns the answer plus the full search trace:

result = await engine.run()   # call from inside an async function
result["final_answer"]   # synthesized answer
result["best_path"]      # node ids from root to the highest-scoring thought
result["thought_tree"]   # every node, with scores and per-round GAN history
result["stats"]          # total_thoughts, max_depth_reached, duration_seconds
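For logging, the documented keys can be condensed into a single line. The sketch below relies only on the keys named above (best_path and the three stats fields); the summarize function is a hypothetical helper, and it deliberately does not touch thought_tree because its exact shape is not specified here.

```python
def summarize(result: dict) -> str:
    """Build a one-line summary from a run() result dict (hypothetical helper)."""
    stats = result.get("stats", {})
    path = " -> ".join(str(node_id) for node_id in result.get("best_path", []))
    return (
        f"thoughts={stats.get('total_thoughts')} "
        f"depth={stats.get('max_depth_reached')} "
        f"seconds={stats.get('duration_seconds')} "
        f"best_path={path}"
    )
```

Using .get throughout means a partial result still produces a summary, with None in place of any missing statistic.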

Custom Configuration

engine = create_engine(
    problem="Optimize supply chain logistics",
    max_depth=5,
    beam_width=5,
    max_gan_rounds=4,
    score_threshold=8.0,
    synthesizer_model="google:gemini-3.1-pro",
)

Project Structure

dialectica/
├── __init__.py           # Public API exports
├── agent.py              # Composition root: create_engine() wires defaults
├── coordinator.py        # Search engine — orchestrates the pluggable stages
├── protocols.py          # Stage interfaces: Generator/Evaluator/Selector/Synthesizer
├── generation.py         # LlmGenerator (default Generator) + list parsing
├── gan_evaluator.py      # AdversarialEvaluator / SinglePassEvaluator (Evaluator)
├── selection.py          # BeamSearch / GreedySearch (Selector)
├── synthesis.py          # LlmSynthesizer (default Synthesizer)
├── agent_runtime.py      # Single LLM-call seam (run_agent)
├── agent_factory.py      # Dynamic agent creation (role templates)
├── models.py             # ThoughtData, DiscriminatorVerdict, EvaluationResult
├── llm_config.py         # Model configuration factory
└── validation.py         # Thought validation utilities
tests/
├── conftest.py           # Loads .env for the e2e skip guard
├── helpers.py            # Deterministic mock LLM stand-ins
├── test_models.py        # Schema / verdict unit tests
├── test_generation.py    # List parsing + generator prompt routing
├── test_gan_evaluator.py # GAN loop + single-pass evaluator (mocked LLM)
├── test_coordinator.py   # Engine control flow (injected fake stages)
├── test_default_pipeline.py  # Default composition integration (mocked LLM)
└── test_e2e_live.py      # Real Gemini E2E (marked `e2e`)

Testing

The suite has two tiers:

  • Mocked tests (default): fast, deterministic, no API key. They replace the LLM call seam with stand-ins and exercise the real orchestration: beam search, the GAN refinement loop, pruning, and synthesis.
  • Live E2E (@pytest.mark.e2e): drives the full workflow against the real Gemini API. Deselected by default and auto-skipped when GOOGLE_API_KEY is unset (loaded from dialectica/.env).

uv run pytest          # mocked tests only (seconds, no key)
uv run pytest -m e2e   # live API E2E (slower, requires GOOGLE_API_KEY)

Pluggable Architecture

The Coordinator owns only the search control flow. Every decision is delegated to an injected component, so any stage can be swapped without touching the engine. The engine itself is a general-purpose reasoning workflow; ToT + GAN is just the default wiring.

Protocol | Responsibility | Default | Alternatives
Generator | expand a node into candidate thoughts | LlmGenerator | custom prompts/agent
Evaluator | score (and optionally refine) a thought | AdversarialEvaluator (GAN loop) | SinglePassEvaluator (cheap)
Selector | choose the next search frontier | BeamSearch(width) | GreedySearch
Synthesizer | combine thoughts into the answer | LlmSynthesizer | custom

create_engine(...) wires the defaults. To customize, build the components yourself and construct Coordinator directly:

import asyncio

from dialectica import BeamSearch, Coordinator, SinglePassEvaluator
from dialectica.agent import build_default_components
from dialectica.agent_factory import create_agent
from dialectica.models import DiscriminatorVerdict

# Start from the defaults, then swap a stage:
generator, _evaluator, _selector, synthesizer = build_default_components()

# The single-pass evaluator needs a discriminator agent with a structured verdict.
discriminator = create_agent(
    role="Discriminator", role_name="Discriminator", output_schema=DiscriminatorVerdict
)

engine = Coordinator(
    problem="...",
    generator=generator,
    evaluator=SinglePassEvaluator(discriminator),   # cheaper: no refinement loop
    selector=BeamSearch(width=5),                    # wider frontier
    synthesizer=synthesizer,
    max_depth=3,
    score_threshold=7.0,
)


async def main():
    result = await engine.run()
    print(result["final_answer"])


asyncio.run(main())

Any object that implements a protocol's method works. The stage interfaces are typing.Protocol classes, so no subclassing is needed: for example, a non-LLM heuristic Evaluator, or a Selector that keeps a diverse frontier instead of pure top-k.
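To make the structural-typing point concrete, here is a sketch of a diversity-preserving selector. Everything in it is an assumption for illustration: the Selector protocol is redefined locally with a hypothetical select signature (the real one lives in dialectica/protocols.py and may take different arguments), and DiverseTopK with its "group by first word" heuristic is invented for this example.

```python
from typing import Protocol, runtime_checkable

Candidate = tuple[str, float]  # hypothetical: (thought text, score)


@runtime_checkable
class Selector(Protocol):
    # Hypothetical signature, NOT copied from dialectica/protocols.py.
    def select(self, candidates: list[Candidate]) -> list[Candidate]: ...


class DiverseTopK:
    """Keeps the best candidate per first word, then the top `width` overall.

    Note: no base class. It satisfies Selector purely by having select().
    """

    def __init__(self, width: int) -> None:
        self.width = width

    def select(self, candidates: list[Candidate]) -> list[Candidate]:
        best_per_group: dict[str, Candidate] = {}
        for text, score in candidates:
            words = text.split()
            group = words[0].lower() if words else ""
            if group not in best_per_group or score > best_per_group[group][1]:
                best_per_group[group] = (text, score)
        ranked = sorted(best_per_group.values(), key=lambda p: p[1], reverse=True)
        return ranked[: self.width]
```

With a pure top-k policy, two near-identical high scorers can crowd out every other direction; grouping first trades a little score for breadth. A real implementation would likely use a better similarity measure than the first word.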

Key Components

Coordinator

Orchestrates the three-phase workflow against the stage protocols:

  • Initialize → Explore → Synthesize
  • Manages the thought tree and active beam
  • Delegates generation, scoring, selection, and synthesis to injected components

AgentFactory

Creates agents from role templates:

  • Standardized system prompts
  • Tool configuration per role
  • Model configuration per role
  • Runtime agent instantiation

AdversarialEvaluator

Implements GAN-style evaluation:

  • Generator proposes/refines thoughts
  • Discriminator critiques with feedback
  • Iterative refinement loop
  • Structured evaluation results

ThoughtData Model

Validates thought structure:

  • Required fields (id, parent_id, depth, content)
  • Optional evaluation data
  • GAN round tracking
  • Evaluation history
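The required and optional fields listed above can be pictured with a small validated record. This is only a sketch: ThoughtSketch is a hypothetical dataclass, not the library's ThoughtData, and the field types, the optional field names (score, gan_rounds, history), and the rule that the root sits at depth 0 with no parent are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThoughtSketch:
    """Hypothetical illustration of a validated thought node."""
    # Required fields named in the list above (types are assumed).
    id: str
    parent_id: Optional[str]
    depth: int
    content: str
    # Optional evaluation data (names are assumed).
    score: Optional[float] = None
    gan_rounds: int = 0
    history: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.id:
            raise ValueError("id must be non-empty")
        if self.depth < 0:
            raise ValueError("depth must be >= 0")
        if not self.content.strip():
            raise ValueError("content must be non-empty")
        # Assumption: the root is the only node at depth 0 and has no parent.
        if self.depth == 0 and self.parent_id is not None:
            raise ValueError("a depth-0 node cannot have a parent")
```

Validating at construction time means a malformed generator output fails immediately rather than surfacing later during scoring or synthesis.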

Performance Considerations

Token Consumption:

  • GAN evaluation: 2-6 LLM calls per thought (depending on rounds)
  • Beam search: beam_width × max_depth iterations
  • Typical problem: 50-200 thoughts, 200-800 LLM calls
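For budgeting, the figures above can be turned into a rough upper bound. The sketch below is back-of-the-envelope arithmetic, not a measurement: estimate_llm_calls is a hypothetical helper, children_per_node is an assumed branching factor that the text above does not specify, and the bound counts only evaluation calls (it ignores generation and synthesis calls and assumes the beam stays full at every depth).

```python
def estimate_llm_calls(
    beam_width: int,
    max_depth: int,
    children_per_node: int,
    calls_per_thought: int,
) -> tuple[int, int]:
    """Return an upper bound (thoughts, evaluation_calls).

    Hypothetical helper. Assumes a full beam at every depth and a fixed
    number of children per expanded node.
    """
    initial = children_per_node                              # root expansion
    explored = beam_width * children_per_node * max_depth    # full beam each depth
    thoughts = initial + explored
    return thoughts, thoughts * calls_per_thought
```

For example, with beam_width=3, max_depth=4, an assumed 3 children per node, and 4 calls per thought, the bound is 39 thoughts and 156 evaluation calls. Pruning usually leaves the beam less than full, so actual usage tends to fall below this bound for the same settings.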

Optimization Strategies:

  • Reduce max_gan_rounds to 1-2 for faster execution
  • Raise score_threshold to prune harder; lower it to explore more paths
  • Narrow the beam (beam_width) or use GreedySearch to cut fan-out
  • Use a lighter model for the Generator and a stronger one for the Discriminator
  • Swap in SinglePassEvaluator to skip the refinement loop entirely

Troubleshooting

Import Errors

# Ensure Python 3.11+
python --version

# Reinstall dependencies
rm -rf .venv
uv sync

ADK Version Mismatch

# Check installed version
uv pip show google-adk

# Should show 2.1.0 or higher

API Key Issues

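# Set the Google AI Studio key, then confirm the package imports
# (this checks the install only; it does not call the API)
export GOOGLE_API_KEY=your-key
uv run python -c "from dialectica import create_engine; print('OK')"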

Contributing

Contributions welcome! Areas of interest:

  • New stage implementations (Generator / Evaluator / Selector / Synthesizer)
  • Alternative search/selection policies (e.g. diversity-preserving frontiers)
  • Performance optimizations
  • Documentation improvements
  • Test coverage

License

MIT

References

Acknowledgments

Built with Google ADK, inspired by Tree of Thoughts research, karpathy/autoresearch's autonomous keep-best loop, and Claude Code's composable workflows.
