An adversarial reasoning engine: a pluggable tree-search workflow where thoughts are generated, adversarially evaluated, and synthesized (thesis -> antithesis -> synthesis).
Dialectica 
Dialectica is a pluggable adversarial reasoning engine. It searches a tree of "thoughts" where each thought is generated, adversarially evaluated and iteratively refined, then synthesized into an answer — thesis → antithesis → synthesis (Generator → Discriminator → Synthesizer). Inspired by karpathy/autoresearch's propose→evaluate→keep-best loop and Claude Code's composable workflows, every stage is a swappable component; the default wiring is Tree-of-Thoughts + a GAN-style evaluation loop on Google ADK 2.1.
Install
Use it as a library in your own project:
uv add dialectica
# or: pip install dialectica
import asyncio
import os
from dialectica import create_engine
os.environ["GOOGLE_API_KEY"] = "..."  # the app owns env setup
async def main():
    # Build the default engine and run the full search
    result = await create_engine("Your problem here").run()
    print(result["final_answer"])
asyncio.run(main())
The library reads configuration from os.environ and does not load .env
itself. To work on Dialectica instead, see Development.
Key Features
🧩 Pluggable engine (thesis → antithesis → synthesis)
The Engine owns only the search control flow; every decision is delegated to
an injected component, so any stage can be swapped without touching the engine:
| Stage | Role | Default |
|---|---|---|
| Generator | propose thoughts (thesis) | LlmGenerator |
| Evaluator | critique & refine (antithesis) | AdversarialEvaluator |
| Selector | choose the frontier | BeamSearch |
| Synthesizer | combine into an answer (synthesis) | LlmSynthesizer |
Retarget it at code review, research, or decision-making just by changing the generator's prompts or swapping a stage — see Pluggable Architecture.
🔄 GAN-style adversarial evaluation (keep-best)
Each thought undergoes iterative adversarial refinement rather than a single pass:
- Discriminator scores it with a structured verdict (score, flaws, suggestions)
- Generator refines it from that critique
- Discriminator re-scores
- Loop until the quality threshold, a terminate signal, or max_gan_rounds
Refinement is not assumed monotonic — the loop keeps the best-scoring round (à la autoresearch's "keep only what beats the current best"), and the node stores that refined text so synthesis works on the improved version, not the original.
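The keep-best behaviour can be sketched in a few lines. This is a minimal illustration, not Dialectica's implementation: `Verdict`, `keep_best_refine`, `toy_score`, and `toy_refine` are hypothetical names, and the scorer and refiner are deterministic stand-ins for the LLM-backed Discriminator and Generator.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical stand-in for the discriminator's structured verdict."""
    score: float
    flaws: list[str] = field(default_factory=list)
    should_terminate: bool = False


def keep_best_refine(
    thought: str,
    score_fn: Callable[[str], Verdict],
    refine_fn: Callable[[str, Verdict], str],
    max_gan_rounds: int = 3,
    score_threshold: float = 7.0,
) -> tuple[str, float, int]:
    """Return (best_text, best_score, refinement_rounds_run)."""
    verdict = score_fn(thought)
    best_text, best_score = thought, verdict.score
    current, rounds = thought, 0
    while (
        rounds < max_gan_rounds
        and verdict.score < score_threshold
        and not verdict.should_terminate
    ):
        current = refine_fn(current, verdict)  # generator refines from the critique
        verdict = score_fn(current)            # discriminator re-scores
        rounds += 1
        if verdict.score > best_score:         # keep only what beats the current best
            best_text, best_score = current, verdict.score
    return best_text, best_score, rounds


# Toy stand-ins: refinement gets worse after the first round.
SCORES = {"v0": 5.0, "v1": 6.5, "v2": 6.0, "v3": 6.2}


def toy_score(text: str) -> Verdict:
    return Verdict(score=SCORES[text])


def toy_refine(text: str, verdict: Verdict) -> str:
    return f"v{int(text[1:]) + 1}"


best_text, best_score, rounds = keep_best_refine("v0", toy_score, toy_refine)
print(best_text, best_score, rounds)  # v1 6.5 3
```

Although three refinement rounds run, the result is the round-one text, because the later rounds never beat its score.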
🌳 Tree search with merit-based beam
- Strategies are scored before the beam — the frontier reflects merit, not generation order
- Beam search keeps the top-k most promising paths (BeamSearch, or GreedySearch)
- Pruning: paths below threshold are dropped; exploration stops when the beam empties
- Multi-node synthesis: the final answer integrates the top scoring thoughts across branches
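As a rough sketch of the selection rule described in the list above: prune by threshold first, then keep the top-k by score. `Scored` and `next_beam` are hypothetical names used only for illustration; the library's BeamSearch may be structured differently.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    node_id: str
    score: float


def next_beam(
    children: list[Scored], beam_width: int, score_threshold: float
) -> list[Scored]:
    """Drop children below the threshold, then keep the best `beam_width`."""
    survivors = [c for c in children if c.score >= score_threshold]
    return sorted(survivors, key=lambda c: c.score, reverse=True)[:beam_width]


children = [
    Scored("a", 6.1),
    Scored("b", 8.4),
    Scored("c", 7.0),
    Scored("d", 9.2),
    Scored("e", 7.7),
]
beam = next_beam(children, beam_width=3, score_threshold=7.0)
print([c.node_id for c in beam])  # ['d', 'b', 'e']
```

The result is ordered by score rather than by generation order, and an empty list signals that exploration should stop.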
📊 Structured evaluation results
The Discriminator returns a DiscriminatorVerdict via ADK output_schema (no
fragile text parsing). The engine wraps it as an EvaluationResult:
score, flaws, suggestions, should_terminate, reasoning,
adversarial_rounds, refined_thought, and the full per-round history.
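To make the shape of these results concrete, here is a sketch using plain dataclasses. The field names come from the list above; everything else is an assumption: the class names `VerdictSketch` and `EvaluationResultSketch`, the helper `wrap`, the placement of `should_terminate` and `reasoning` on the verdict, and counting the initial scoring as a round. The real models live in models.py and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class VerdictSketch:
    """Hypothetical mirror of one discriminator verdict."""
    score: float
    flaws: list[str]
    suggestions: list[str]
    should_terminate: bool = False
    reasoning: str = ""


@dataclass
class EvaluationResultSketch:
    """Hypothetical mirror of the engine-side wrapper."""
    score: float
    flaws: list[str]
    suggestions: list[str]
    should_terminate: bool
    reasoning: str
    adversarial_rounds: int
    refined_thought: str
    history: list[VerdictSketch] = field(default_factory=list)


def wrap(rounds: list[tuple[str, VerdictSketch]]) -> EvaluationResultSketch:
    """Wrap per-round (text, verdict) pairs, reporting the best-scoring round."""
    best_text, best = max(rounds, key=lambda pair: pair[1].score)
    return EvaluationResultSketch(
        score=best.score,
        flaws=best.flaws,
        suggestions=best.suggestions,
        should_terminate=best.should_terminate,
        reasoning=best.reasoning,
        adversarial_rounds=len(rounds),  # assumption: initial scoring counts as a round
        refined_thought=best_text,
        history=[verdict for _, verdict in rounds],
    )


rounds = [
    ("draft", VerdictSketch(5.5, ["vague"], ["add numbers"])),
    ("draft with numbers", VerdictSketch(7.5, [], [])),
]
result = wrap(rounds)
print(result.refined_thought, result.score, result.adversarial_rounds)
```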
Architecture
User Problem
↓
Engine — Phase 1: Initialize
↓
Generator expands root → initial strategies
↓ (each strategy scored by the Evaluator before it can enter the beam)
Engine — Phase 2: Explore (beam search)
↓
For each node in the Selector's frontier:
├── Generator expands it into children
└── for each child, Evaluator runs the GAN loop:
├── Discriminator scores (structured verdict)
├── Generator refines from the critique
├── re-score, keep the best round
└── persist the refined thought + score on the node
→ children ≥ threshold form the next beam
↓
Engine — Phase 3: Synthesize
↓
Synthesizer integrates the top thoughts
↓
Final Answer (+ thought_tree, best_path, stats)
Workflow Phases
Phase 1: Initialization
- Creates the root node from the user problem
- Generator.expand(root) produces the initial strategies (validated via ThoughtData)
- Each strategy is adversarially scored, then the ones clearing the threshold seed the beam (falling back to the Selector's top-k if none clear it)
Phase 2: Exploration (beam search)
Iterates up to max_depth times:
- Select: Selector.select(...) picks the frontier from the active beam
- Generate: Generator.expand(parent) creates child thoughts
- Evaluate: Evaluator.evaluate(...) runs the GAN loop, keeping the best round and persisting the refined thought
- Filter: children scoring ≥ score_threshold form the next beam
Exploration stops when the beam empties or max_depth is reached.
Phase 3: Synthesis
- Synthesizer.synthesize(...) takes the top-scoring evaluated thoughts
- Produces a coherent, comprehensive final answer
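The three phases can be condensed into a toy control loop. This is a sketch under stated assumptions, not the Coordinator's code: `run_search` is a hypothetical function, thoughts are plain strings, each stage is a simple synchronous callable, and the evaluator returns only a score (no refinement).

```python
from typing import Callable


def run_search(
    problem: str,
    expand: Callable[[str], list[str]],
    evaluate: Callable[[str], float],
    synthesize: Callable[[list[str]], str],
    max_depth: int = 2,
    beam_width: int = 2,
    score_threshold: float = 7.0,
) -> str:
    scored: dict[str, float] = {}

    def rank(thoughts: list[str]) -> list[str]:
        """Score every thought, then order by score (best first)."""
        for thought in thoughts:
            scored[thought] = evaluate(thought)
        return sorted(thoughts, key=scored.__getitem__, reverse=True)

    # Phase 1: expand the root; strategies are scored before entering the beam.
    ranked = rank(expand(problem))
    beam = [t for t in ranked if scored[t] >= score_threshold][:beam_width]
    if not beam:
        beam = ranked[:beam_width]  # fallback: top-k when none clear the threshold

    # Phase 2: explore until max_depth or until the beam empties.
    for _ in range(max_depth):
        if not beam:
            break
        children = [child for parent in beam for child in expand(parent)]
        ranked = rank(children)
        beam = [t for t in ranked if scored[t] >= score_threshold][:beam_width]

    # Phase 3: synthesize from the top-scoring thoughts across all branches.
    top = sorted(scored, key=scored.__getitem__, reverse=True)[:beam_width]
    return synthesize(top)


# Toy stages: each "a" appended to a thought raises its score.
def toy_expand(text: str) -> list[str]:
    return [text + "a", text + "b"]


def toy_evaluate(text: str) -> float:
    return min(10.0, 5.0 + 2.0 * text.count("a"))


answer = run_search("p", toy_expand, toy_evaluate, " | ".join)
print(answer)  # paaa | paa
```

Note that synthesis draws on every scored thought, not only the final beam, which is why a depth-2 node appears next to a depth-3 one in the toy output.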
Development
To work on Dialectica itself (not needed just to use it — see Install):
git clone https://github.com/FradSer/dialectica
cd dialectica
uv sync
cp dialectica/.env.example dialectica/.env # add GOOGLE_API_KEY for the live e2e test
Then run the suite — see Testing.
Configuration
Environment Variables
Model Configuration:
# Default model for all agents
DEFAULT_MODEL_CONFIG=google:gemini-3.5-flash
# Role-specific overrides (optional)
GENERATOR_MODEL_CONFIG=google:gemini-3.1-pro
DISCRIMINATOR_MODEL_CONFIG=google:gemini-3.1-pro
SYNTHESIZER_MODEL_CONFIG=google:gemini-3.1-pro
Supported Providers:
- google:gemini-3.5-flash (Google AI Studio)
- openrouter:anthropic/claude-3.5-sonnet (OpenRouter)
- openai:gpt-4o (OpenAI)
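The `provider:model` strings and the role-specific overrides suggest a simple lookup with a fallback. The sketch below illustrates that idea only; `resolve_model` is a hypothetical helper, and the actual parsing in llm_config.py may behave differently (for example in how it validates providers).

```python
import os
from collections.abc import Mapping


def resolve_model(role: str, env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (provider, model) for a role, falling back to DEFAULT_MODEL_CONFIG."""
    raw = env.get(f"{role.upper()}_MODEL_CONFIG") or env.get("DEFAULT_MODEL_CONFIG")
    if not raw:
        raise KeyError(f"no model configured for role {role!r}")
    # Split on the first colon only, so model names may contain "/" or ":".
    provider, sep, model = raw.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {raw!r}")
    return provider, model


example_env = {
    "DEFAULT_MODEL_CONFIG": "google:gemini-3.5-flash",
    "DISCRIMINATOR_MODEL_CONFIG": "openrouter:anthropic/claude-3.5-sonnet",
}
print(resolve_model("generator", example_env))
print(resolve_model("discriminator", example_env))
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.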
API Credentials:
# Google AI Studio
GOOGLE_API_KEY=your-key-here
# Or Vertex AI
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_CLOUD_PROJECT=your-project
GOOGLE_CLOUD_LOCATION=us-central1
# OpenRouter
OPENROUTER_API_KEY=sk-or-...
# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_API_BASE=https://api.openai.com/v1
Engine Parameters
engine = create_engine(
    problem="Your problem statement",
    max_depth=4,             # Max tree depth
    beam_width=3,            # Active paths per iteration
    max_gan_rounds=3,        # Max adversarial refinement rounds
    score_threshold=7.0,     # Min score to continue
    synthesizer_model=None,  # Optional model override
)
Usage Examples
Basic Usage
import asyncio
from dialectica import create_engine
async def main():
    # Create the engine
    engine = create_engine(
        "Design a sustainable urban transport system"
    )
    # Run workflow
    result = await engine.run()
    # Access results
    print(result["final_answer"])
    print(f"Generated {len(result['thought_tree'])} thoughts")
    print(f"Best path: {result['best_path']}")
asyncio.run(main())
Inspecting the result
run() returns the answer plus the full search trace:
result = await engine.run()
result["final_answer"] # synthesized answer
result["best_path"] # node ids from root to the highest-scoring thought
result["thought_tree"] # every node, with scores and per-round GAN history
result["stats"] # total_thoughts, max_depth_reached, duration_seconds
Custom Configuration
engine = create_engine(
    problem="Optimize supply chain logistics",
    max_depth=5,
    beam_width=5,
    max_gan_rounds=4,
    score_threshold=8.0,
    synthesizer_model="google:gemini-3.1-pro",
)
Project Structure
dialectica/
├── __init__.py # Public API exports
├── agent.py # Composition root: create_engine() wires defaults
├── coordinator.py # Search engine — orchestrates the pluggable stages
├── protocols.py # Stage interfaces: Generator/Evaluator/Selector/Synthesizer
├── generation.py # LlmGenerator (default Generator) + list parsing
├── gan_evaluator.py # AdversarialEvaluator / SinglePassEvaluator (Evaluator)
├── selection.py # BeamSearch / GreedySearch (Selector)
├── synthesis.py # LlmSynthesizer (default Synthesizer)
├── agent_runtime.py # Single LLM-call seam (run_agent)
├── agent_factory.py # Dynamic agent creation (role templates)
├── models.py # ThoughtData, DiscriminatorVerdict, EvaluationResult
├── llm_config.py # Model configuration factory
└── validation.py # Thought validation utilities
tests/
├── conftest.py # Loads .env for the e2e skip guard
├── helpers.py # Deterministic mock LLM stand-ins
├── test_models.py # Schema / verdict unit tests
├── test_generation.py # List parsing + generator prompt routing
├── test_gan_evaluator.py # GAN loop + single-pass evaluator (mocked LLM)
├── test_coordinator.py # Engine control flow (injected fake stages)
├── test_default_pipeline.py # Default composition integration (mocked LLM)
└── test_e2e_live.py # Real Gemini E2E (marked `e2e`)
Testing
The suite has two tiers:
- Mocked tests (default): fast, deterministic, no API key. They replace the LLM call seam with stand-ins and exercise the real orchestration: beam search, the GAN refinement loop, pruning, and synthesis.
- Live E2E (@pytest.mark.e2e): drives the full workflow against the real Gemini API. Deselected by default and auto-skipped when GOOGLE_API_KEY is unset (loaded from dialectica/.env).
uv run pytest # mocked tests only (seconds, no key)
uv run pytest -m e2e # live API E2E (slower, requires GOOGLE_API_KEY)
Pluggable Architecture
The Coordinator owns only the search control flow. Every decision is
delegated to an injected component, so any stage can be swapped without
touching the engine — the engine is a general-purpose reasoning workflow, and
ToT + GAN is just the default wiring.
| Protocol | Responsibility | Default | Alternatives |
|---|---|---|---|
| Generator | expand a node into candidate thoughts | LlmGenerator | custom prompts/agent |
| Evaluator | score (and optionally refine) a thought | AdversarialEvaluator (GAN loop) | SinglePassEvaluator (cheap) |
| Selector | choose the next search frontier | BeamSearch(width) | GreedySearch |
| Synthesizer | combine thoughts into the answer | LlmSynthesizer | custom |
create_engine(...) wires the defaults. To customize, build the
components yourself and construct Coordinator directly:
from dialectica import (
    Coordinator, BeamSearch, SinglePassEvaluator, LlmSynthesizer,
)
from dialectica.agent import build_default_components
from dialectica.agent_factory import create_agent
from dialectica.models import DiscriminatorVerdict
# Start from the defaults, then swap a stage:
generator, _evaluator, _selector, synthesizer = build_default_components()
# The single-pass evaluator needs its own discriminator agent
discriminator = create_agent(
    role="Discriminator", role_name="Discriminator", output_schema=DiscriminatorVerdict
)
engine = Coordinator(
    problem="...",
    generator=generator,
    evaluator=SinglePassEvaluator(discriminator),  # cheaper: no refinement loop
    selector=BeamSearch(width=5),                  # wider frontier
    synthesizer=synthesizer,
    max_depth=3,
    score_threshold=7.0,
)
result = await engine.run()  # call from inside an async function
Any object implementing a protocol's method works (they are
typing.Protocol, so no subclassing needed) — e.g. a non-LLM heuristic
Evaluator, or a Selector that keeps a diverse frontier instead of pure
top-k.
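As an illustration of the structural-typing point, the sketch below defines a diversity-preserving selector without subclassing anything. The protocol shape is an assumption: `SelectorLike`, the `select(candidates)` signature, and the `(text, score)` tuples are hypothetical, and the real interface in protocols.py may take different arguments.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SelectorLike(Protocol):
    """Hypothetical shape of a selector stage; the real protocol may differ."""

    def select(
        self, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]: ...


class DiverseTopK:
    """Keeps the best candidate per leading word, then the top `width` of those."""

    def __init__(self, width: int) -> None:
        self.width = width

    def select(
        self, candidates: list[tuple[str, float]]
    ) -> list[tuple[str, float]]:
        best_per_bucket: dict[str, tuple[str, float]] = {}
        for text, score in candidates:
            words = text.split()
            bucket = words[0].lower() if words else ""
            # Only the highest-scoring candidate in each bucket survives.
            if bucket not in best_per_bucket or score > best_per_bucket[bucket][1]:
                best_per_bucket[bucket] = (text, score)
        ranked = sorted(best_per_bucket.values(), key=lambda c: c[1], reverse=True)
        return ranked[: self.width]


candidates = [
    ("Rail expansion plan", 8.0),
    ("Rail pricing reform", 7.9),
    ("Bike lanes network", 7.5),
    ("Bus rapid transit", 7.2),
]
selector = DiverseTopK(width=2)
print(isinstance(selector, SelectorLike))  # True
print(selector.select(candidates))
```

Pure top-2 would keep both rail proposals; bucketing by leading word is a deliberately crude diversity heuristic chosen to keep the example short.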
Key Components
Coordinator
Orchestrates the three-phase workflow against the stage protocols:
- Initialize → Explore → Synthesize
- Manages the thought tree and active beam
- Delegates generation, scoring, selection, and synthesis to injected components
AgentFactory
Creates agents from role templates:
- Standardized system prompts
- Tool configuration per role
- Model configuration per role
- Runtime agent instantiation
AdversarialEvaluator
Implements GAN-style evaluation:
- Generator proposes/refines thoughts
- Discriminator critiques with feedback
- Iterative refinement loop
- Structured evaluation results
ThoughtData Model
Validates thought structure:
- Required fields (id, parent_id, depth, content)
- Optional evaluation data
- GAN round tracking
- Evaluation history
Performance Considerations
Token Consumption:
- GAN evaluation: 2-6 LLM calls per thought (depending on rounds)
- Beam search: beam_width × max_depth iterations
- Typical problem: 50-200 thoughts, 200-800 LLM calls
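A back-of-envelope estimate can help when budgeting a run. The sketch below is an upper bound under assumptions that are not stated above: each expansion costs one LLM call and yields `branching` children, synthesis costs one call, and the beam never empties early. `estimate_llm_calls` and `branching` are hypothetical names.

```python
def estimate_llm_calls(
    max_depth: int,
    beam_width: int,
    branching: int,
    calls_per_thought: int,
) -> dict[str, int]:
    """Upper-bound estimate of thoughts generated and LLM calls made."""
    # One expansion for the root, plus one per beam slot at each depth.
    expansions = 1 + beam_width * max_depth
    thoughts = branching * expansions
    # Expansion calls + evaluation calls per thought + one synthesis call.
    llm_calls = expansions + thoughts * calls_per_thought + 1
    return {"thoughts": thoughts, "llm_calls": llm_calls}


estimate = estimate_llm_calls(
    max_depth=4, beam_width=3, branching=4, calls_per_thought=4
)
print(estimate)  # {'thoughts': 52, 'llm_calls': 222}
```

With the default depth and beam width and a mid-range cost of 4 calls per thought, the estimate lands inside the typical ranges quoted above.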
Optimization Strategies:
- Reduce max_gan_rounds to 1-2 for faster execution
- Raise score_threshold to prune harder; lower it to explore more paths
- Narrow the beam (beam_width) or use GreedySearch to cut fan-out
- Use a lighter model for the Generator and a stronger one for the Discriminator
- Swap in SinglePassEvaluator to skip the refinement loop entirely
Troubleshooting
Import Errors
# Ensure Python 3.11+
python --version
# Reinstall dependencies
rm -rf .venv
uv sync
ADK Version Mismatch
# Check installed version
uv pip show google-adk
# Should show 2.1.0 or higher
API Key Issues
# Test Google AI Studio
export GOOGLE_API_KEY=your-key
uv run python -c "from dialectica import create_engine; print('OK')"
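Since the library reads credentials from the environment, a quick check of which variables are visible to the process can narrow down key problems. This is a sketch using only the variable names listed under API Credentials; `credential_report` is a hypothetical helper, and it checks presence only, not whether a key is valid.

```python
import os
from collections.abc import Mapping


def credential_report(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report which providers have their credential variables set."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true"
    return {
        "google_ai_studio": bool(env.get("GOOGLE_API_KEY")),
        "vertex_ai": (
            use_vertex
            and bool(env.get("GOOGLE_CLOUD_PROJECT"))
            and bool(env.get("GOOGLE_CLOUD_LOCATION"))
        ),
        "openrouter": bool(env.get("OPENROUTER_API_KEY")),
        "openai": bool(env.get("OPENAI_API_KEY")),
    }


for provider, configured in credential_report().items():
    print(f"{provider}: {'set' if configured else 'missing'}")
```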
Contributing
Contributions welcome! Areas of interest:
- New stage implementations (Generator/Evaluator/Selector/Synthesizer)
- Alternative search/selection policies (e.g. diversity-preserving frontiers)
- Performance optimizations
- Documentation improvements
- Test coverage
License
References
- karpathy/autoresearch — propose → evaluate → keep-best loop
- Google ADK Documentation
- Tree of Thoughts Paper
Acknowledgments
Built with Google ADK, inspired by Tree of Thoughts research, karpathy/autoresearch's autonomous keep-best loop, and Claude Code's composable workflows.