
ddqn-router

A lightweight Python library that trains a Double DQN agent to route user queries to the optimal subset of specialized agents in a multi-agent system.

You define your agents, label a dataset with an LLM, train the router, and get a fast inference model that selects the right combination of agents for any input query — no LLM needed at inference time.


Why DDQN Routing?

Most multi-agent systems route queries using one of three approaches: hard-coded rules, a classifier, or an LLM call. Each has serious drawbacks at scale. DDQN routing addresses all of them.

Cost

LLM-based routing costs money on every request. Even a cheap model like gpt-4o-mini at $0.15/1M input tokens adds up: 1M routing decisions per month costs ~$150 in API calls alone (assuming ~1000 tokens per routing prompt). A DDQN router runs locally for $0 after training — the model is a ~300KB PyTorch file doing matrix multiplications on CPU.

For comparison, training the DDQN itself requires labeling a dataset (~500-2000 examples) with a single LLM pass. That one-time cost is typically $0.50-$5.00 total. The router then serves unlimited requests at zero marginal cost.
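The arithmetic behind the $150 figure, as a quick back-of-envelope check:

# cost of LLM-based routing under the assumptions above
requests_per_month = 1_000_000
tokens_per_request = 1_000          # ~1000 tokens per routing prompt
usd_per_million_tokens = 0.15       # gpt-4o-mini input pricing
monthly_usd = requests_per_month * tokens_per_request / 1_000_000 * usd_per_million_tokens
print(monthly_usd)  # 150.0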

Speed

| Method | Latency per query | Where it runs |
|---|---|---|
| LLM routing (GPT-4o-mini) | 300-800ms | Remote API |
| LLM routing (local Ollama) | 50-200ms | Local GPU |
| Classifier (TF-IDF + LogReg) | ~1ms | CPU |
| DDQN router | ~1ms | CPU |

DDQN inference is a sequence of small MLP forward passes (one per routing step, typically 2-4 steps). Each forward pass processes a vector of a few thousand floats through two hidden layers — this takes microseconds on any modern CPU. Total end-to-end latency including TF-IDF encoding is under 1ms.

This is 300-800x faster than an LLM API call, and it scales linearly with request volume without any rate limits, API keys, or network dependency.
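To make the microseconds claim concrete, here is a toy timing sketch of a network with the default shape from this README (5000 TF-IDF features plus a 10-agent mask, hidden layers [256, 128], 11 actions including STOP); this is an illustrative benchmark, not this library's code:

import time
import torch

# toy Q-network matching the default architecture described below
net = torch.nn.Sequential(
    torch.nn.Linear(5010, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 11),
)
x = torch.randn(1, 5010)
n = 1000
with torch.no_grad():
    net(x)  # warm-up
    t0 = time.perf_counter()
    for _ in range(n):
        net(x)
    elapsed = time.perf_counter() - t0
print(f"{elapsed / n * 1e3:.3f} ms per forward pass")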

Quality

In the research that this library is based on, DDQN routing was benchmarked against four alternatives on a multi-agent customer support system with 10 agents:

| Method | Jaccard | F1 | Success Rate |
|---|---|---|---|
| Random | 0.153 | 0.221 | 0.060 |
| Rule-based (keyword) | 0.287 | 0.369 | 0.120 |
| Supervised (TF-IDF + LogReg) | 0.512 | 0.621 | 0.340 |
| LLM (GPT-4o-mini) | 0.589 | 0.685 | 0.410 |
| DDQN (this library) | 0.631 | 0.732 | 0.470 |

Key observations from the research:

  • DDQN outperformed the LLM baseline by +4.2pp Jaccard and +4.7pp F1, while being 300x+ faster and free at inference time.
  • Action masking was critical — it prevents the model from re-selecting agents it already picked, reducing wasted exploration and improving convergence speed by ~40%.
  • The Jaccard-based reward with step cost (0.05 penalty per selection) produced the best balance between precision and recall. Without step cost, the model tends to over-select agents.
  • The model learns a genuine multi-step policy: it considers the agents it has already selected before deciding the next one, rather than making independent per-agent decisions like a classifier.
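For reference, the Jaccard and F1 columns above are set-overlap metrics between the predicted and ground-truth agent sets. A minimal sketch of the standard definitions (not this library's evaluator):

def jaccard(pred: set, true: set) -> float:
    # |intersection| / |union|; 1.0 when both sets are empty
    if not pred and not true:
        return 1.0
    return len(pred & true) / len(pred | true)

def f1(pred: set, true: set) -> float:
    # harmonic mean of set precision and recall
    if not pred or not true:
        return 0.0
    precision = len(pred & true) / len(pred)
    recall = len(pred & true) / len(true)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(jaccard({0, 2}, {0, 1, 2}))  # 0.666...
print(f1({0, 2}, {0, 1, 2}))       # 0.8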

Why Double DQN specifically?

Standard DQN suffers from Q-value overestimation — it uses the same network to both select and evaluate actions, which creates a positive bias. Double DQN (van Hasselt et al., 2016) fixes this by using the online network to select the best action, but the target network to evaluate its value. In routing tasks where the reward signal is sparse (only at episode end), this correction is particularly important because overestimated Q-values can cause the agent to stop too early or too late.
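In update-rule terms, the difference is one line. A minimal sketch of the Double DQN target (illustrative names, not this library's internals):

import torch

def ddqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Standard DQN would take target_net(next_state).max(dim=1): the same
    # network selects and evaluates. Double DQN decouples the two roles.
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)   # online selects
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)  # target evaluates
    return reward + gamma * (1.0 - done) * next_q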


When To Use ddqn-router

Good fit

  • You have a multi-agent system with 3+ specialized agents and need to route incoming queries to one or more of them.
  • You need subset routing — a single query may require multiple agents working together (e.g., a billing issue that also needs account verification).
  • You have enough data to label — 500+ representative queries is a practical minimum; 2000+ is ideal. The labeling is done once via LLM.
  • You want fast, free inference — after training, routing is a local forward pass with no API calls.
  • Your query distribution is relatively stable — the TF-IDF encoder works best when the vocabulary and topics don't change dramatically after training.
  • You want reproducibility and explainability: router.explain() shows exactly which agents the model considered and why, step by step.

Not a good fit

  • You have fewer than ~300 labeled examples. The DDQN needs enough data to learn the reward structure. With very small datasets, a simple keyword matcher or direct LLM call will work better.
  • You only need single-agent routing (each query goes to exactly one agent). A standard multi-class classifier is simpler and equally effective for this case.
  • Your agent set changes frequently (weekly or more). Each change requires retraining. If you add/remove agents often, an LLM-based router that reads agent descriptions dynamically may be more practical.
  • You need zero-shot generalization to completely new query types not represented in training data. DDQN generalizes within the distribution it was trained on; for truly novel inputs, an LLM has broader coverage.
  • Your routing depends on conversation history or user context, not just the current query text. This library routes based on single query text only (TF-IDF features). If you need multi-turn context, you'd need to extend the state representation.

Comparison summary

| Criterion | DDQN Router | LLM Router | Classifier |
|---|---|---|---|
| Inference cost | Free | $0.15+ / 1M queries | Free |
| Latency | ~1ms | 300-800ms | ~1ms |
| Subset routing (multi-agent) | Native | Via prompting | Via multi-label |
| Handles new agent types | Retrain needed | Zero-shot | Retrain needed |
| Explainability | Q-value table per step | Token log-probs (limited) | Feature weights |
| Min dataset size | ~500 examples | 0 (zero-shot) | ~200 examples |
| Quality on trained distribution | High | High | Medium |

Quickstart

Open in Colab

1. Install

pip install ddqn-router

Or install with the optional FastAPI server:

pip install ddqn-router[serve]

For development (from source):

git clone https://github.com/kirmoz1997/ddqn-router
cd ddqn-router
pip install -e ".[dev]"

2. Define your agents

Create a config.yaml:

agents:
  - id: 0
    name: "Billing Agent"
    description: "Handles billing, invoices, payments, subscriptions"
  - id: 1
    name: "Technical Agent"
    description: "Handles bugs, errors, API integration issues"
  - id: 2
    name: "Account Agent"
    description: "Handles account settings, passwords, permissions"

labeler:
  model: "gpt-4o-mini"

dataset:
  input: "./data/tasks.jsonl"

output_dir: "./artifacts/"

3. Label your data

Write your raw queries to a text file (one per line), then label them with an LLM:

export DDQN_ROUTER_API_KEY=sk-...

ddqn-router label \
  --config config.yaml \
  --input queries.txt \
  --output data/tasks.jsonl

The labeler calls any OpenAI-compatible API (OpenAI, DeepSeek, Ollama, etc.) and produces a JSONL dataset where each line maps a query to the agents required to handle it:

{"id": "ex_a1b2c3d4", "text": "...", "required_agents": [0, 2]}

You can also create this file manually or with your own labeling pipeline — just follow the same JSONL format.
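For example, a minimal script that writes such a dataset by hand (the texts and IDs here are illustrative; agent IDs must match your config.yaml):

import json

examples = [
    {"id": "ex_0001", "text": "my invoice was charged twice", "required_agents": [0, 2]},
    {"id": "ex_0002", "text": "the webhook returns a 500 error", "required_agents": [1]},
]
with open("data/tasks.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line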

4. Split and train

ddqn-router dataset split --input data/tasks.jsonl
ddqn-router train --config config.yaml

Training outputs are saved to ./artifacts/:

| File | Content |
|---|---|
| model.pt | Trained Q-network weights |
| encoder.joblib | Fitted TF-IDF encoder |
| config_used.json | Exact config snapshot for reproducibility |
| metrics_val_best.json | Best validation metrics |
| metrics_test.json | Final test set metrics |
| training_log.jsonl | Step-by-step training log |

5. Use the router

from ddqn_router import DDQNRouter

router = DDQNRouter.load("./artifacts/")

result = router.route("my invoice was charged twice")
print(result.agents)       # [0, 2]
print(result.agent_names)  # ["Billing Agent", "Account Agent"]
print(result.confidence)   # 0.87
print(result.steps)        # 3

# Batch routing
results = router.route_batch(["query one", "query two"])

# See what the model is thinking
router.explain("debug the webhook integration")

CLI Reference

ddqn-router label

Label raw queries with required agents using an LLM.

ddqn-router label --config CONFIG [OPTIONS]

| Flag | Description | Default |
|---|---|---|
| --config | Path to YAML config | (required) |
| --input | Path to raw texts file | from config |
| --output | Output tasks.jsonl path | ./data/tasks.jsonl |
| --model | LLM model string | gpt-4o-mini |
| --base-url | API base URL | https://api.openai.com/v1 |
| --api-key | API key (or DDQN_ROUTER_API_KEY env var) | |
| --min-agents | Min agents per example | 2 |
| --max-agents | Max agents per example | all |
| --prompt-template | Custom Jinja2 prompt file | built-in |
| --batch-size | Examples per API call | 1 |
| --cache | Cache file path | ./cache/label_cache.jsonl |
| --fallback-strategy | On LLM parse failure: skip / keyword / all-agents | keyword |

The labeler uses raw httpx — no OpenAI SDK required. Any provider that exposes POST /chat/completions works: OpenAI, Azure, DeepSeek, Anthropic via proxy, local Ollama, vLLM, etc.
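The request shape it targets is the plain chat-completions contract. A minimal sketch of an equivalent call with httpx (the prompt text here is illustrative; the library's actual prompt lives in its Jinja2 template):

import os
import httpx

resp = httpx.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DDQN_ROUTER_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "user", "content": "Which agents are required for: 'my invoice was charged twice'?"}
        ],
    },
    timeout=30.0,
)
print(resp.json()["choices"][0]["message"]["content"])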

ddqn-router dataset stats

ddqn-router dataset stats --input data/tasks.jsonl

Prints total examples, per-agent frequency, and set size distribution.

ddqn-router dataset split

ddqn-router dataset split --input data/tasks.jsonl \
  [--train 0.7] [--val 0.15] [--test 0.15] [--output-dir data/]

Stratified split by set size into train.jsonl, val.jsonl, test.jsonl.

ddqn-router train

ddqn-router train --config config.yaml [--output-dir ./artifacts/]

Trains the DDQN routing agent. Prints live progress with step count, epsilon, loss, and validation Jaccard.

ddqn-router serve

ddqn-router serve [--artifacts ./artifacts/] [--host 0.0.0.0] [--port 8000] [--cors '*']

Starts a FastAPI server for routing inference. Requires the serve extras:

pip install ddqn-router[serve]

| Flag | Description | Default |
|---|---|---|
| --artifacts | Path to trained model artifacts | ./artifacts/ |
| --host | Bind host | 0.0.0.0 |
| --port | Bind port | 8000 |
| --cors | Allowed CORS origins (comma-separated, or * for all) | disabled |

Endpoints:

| Method | Path | Description |
|---|---|---|
| POST | /route | Route a single query: {"query": "..."} |
| POST | /route/batch | Route multiple queries: {"queries": ["...", "..."]} |
| GET | /health | Liveness check |
| GET | /agents | List configured agents |

Example:

ddqn-router serve --artifacts ./artifacts/ --port 8000 --cors '*'

# In another terminal:
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"query": "my invoice was charged twice"}'

# Batch:
curl -X POST http://localhost:8000/route/batch \
  -H "Content-Type: application/json" \
  -d '{"queries": ["fix the API bug", "export data to CSV"]}'

You can also use create_app() directly for custom deployment (e.g., with Gunicorn):

from ddqn_router.serve.app import create_app

app = create_app("./artifacts/", cors_origins=["https://myapp.com"])
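Assuming that snippet is saved as wsgi.py (the module path here is illustrative), the resulting ASGI app runs under Gunicorn with the Uvicorn worker class:

gunicorn -k uvicorn.workers.UvicornWorker -w 2 wsgi:app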

Python API

from ddqn_router import DDQNRouter, RouteResult, RouterNotTrainedError

DDQNRouter.load(artifacts_path) -> DDQNRouter

Load a trained router. Raises RouterNotTrainedError with step-by-step setup instructions if artifacts are missing.

router.route(query) -> RouteResult

Route a single query. Returns:

RouteResult(
    agents=[0, 2],              # selected agent IDs
    agent_names=["Billing Agent", "Account Agent"],
    confidence=0.87,            # 0.0 to 1.0
    steps=3,                    # routing steps taken
)

router.route_batch(queries) -> list[RouteResult]

Route multiple queries at once.

router.explain(query) -> None

Print a step-by-step table showing Q-values for every agent at each routing step — useful for debugging and understanding model behavior.

router.agents -> list[dict]

Returns the list of configured agents with id, name, and description.


Configuration Reference

All parameters live in a single YAML file. Defaults come from the best research configuration and work well out of the box — you typically only need to define your agents.
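As an illustration, a config that overrides a few training defaults might look like this (field names follow the reference tables below):

agents:
  - id: 0
    name: "Billing Agent"
    description: "Handles billing, invoices, payments, subscriptions"
  # ... remaining agents ...

training:
  total_steps: 100000   # shorter run than the 200k default
  step_cost: 0.05
  hidden_layers: [256, 128]

output_dir: "./artifacts/"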

Agents

| Field | Type | Description |
|---|---|---|
| agents[].id | int | Unique agent ID (0-indexed) |
| agents[].name | str | Human-readable name |
| agents[].description | str | What this agent handles (used by labeler and TF-IDF) |

Labeler

| Field | Type | Default | Description |
|---|---|---|---|
| labeler.model | str | gpt-4o-mini | LLM model string |
| labeler.base_url | str | https://api.openai.com/v1 | API base URL |
| labeler.api_key | str | "" | API key (prefer DDQN_ROUTER_API_KEY env var) |
| labeler.input | str | "" | Path to raw texts |
| labeler.output | str | ./data/tasks.jsonl | Output labeled dataset |
| labeler.min_agents | int | 2 | Min agents per example |
| labeler.max_agents | int\|null | null | Max agents (null = no limit) |
| labeler.prompt_template | str\|null | null | Custom Jinja2 prompt path |
| labeler.prompt_version | str | v1 | Version tag for cache invalidation |
| labeler.batch_size | int | 1 | Examples per API call |
| labeler.cache | str | ./cache/label_cache.jsonl | Cache file path |
| labeler.fallback_strategy | str | keyword | Fallback on LLM parse failure |

Dataset

| Field | Type | Default | Description |
|---|---|---|---|
| dataset.input | str | ./data/tasks.jsonl | Path to labeled dataset |
| dataset.train_ratio | float | 0.7 | Train split ratio |
| dataset.val_ratio | float | 0.15 | Validation split ratio |
| dataset.test_ratio | float | 0.15 | Test split ratio |
| dataset.output_dir | str | ./data/ | Where to save split files |

Training

| Field | Type | Default | Description |
|---|---|---|---|
| training.total_steps | int | 200000 | Total training steps |
| training.batch_size | int | 64 | Replay sample batch size |
| training.learning_rate | float | 0.001 | Adam optimizer learning rate |
| training.gamma | float | 0.99 | Discount factor |
| training.epsilon_start | float | 1.0 | Initial exploration rate |
| training.epsilon_end | float | 0.05 | Final exploration rate |
| training.epsilon_decay_steps | int | 100000 | Steps over which epsilon decays |
| training.target_update_freq | int | 500 | Steps between target network syncs |
| training.replay_buffer_size | int | 50000 | Replay buffer capacity |
| training.min_replay_size | int | 1000 | Min buffer fill before training starts |
| training.reward_mode | str | jaccard | "jaccard" or "stochastic" |
| training.step_cost | float | 0.05 | Per-agent selection penalty |
| training.hidden_layers | list[int] | [256, 128] | Q-network hidden layer sizes |
| training.tfidf_max_features | int | 5000 | TF-IDF vocabulary limit |
| training.action_masking | bool | true | Mask already-selected agents |
| training.seed | int | 42 | Random seed for reproducibility |
| training.val_eval_freq | int | 5000 | Steps between validation evaluations |
| training.save_best | bool | true | Save best checkpoint by val Jaccard |
| training.max_steps_per_episode | int | 20 | Max routing steps per episode |

Output

| Field | Type | Default | Description |
|---|---|---|---|
| output_dir | str | ./artifacts/ | Where to save trained model artifacts |

How It Works

  1. State: Each query is encoded as a TF-IDF vector concatenated with a binary mask of already-selected agents.
  2. Actions: The agent can select any not-yet-selected agent, or choose STOP (action ID = N) to finish.
  3. Reward: Jaccard similarity between selected agents and the ground-truth set, minus a small step cost per selection.
  4. Action masking: Already-selected agents get Q-value = -inf, preventing redundant picks and speeding up training.
  5. Double DQN: Reduces Q-value overestimation — the online network selects actions, the target network evaluates them.

At inference, the trained model runs a greedy forward pass in ~1ms, selecting agents one by one until it triggers STOP.
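Putting steps 1-4 together, a minimal sketch of that greedy loop (variable names and shapes are illustrative, not this library's internals):

import numpy as np
import torch

def greedy_route(q_net, tfidf_vec, n_agents, max_steps=20):
    """Select agents one at a time until the STOP action (ID = n_agents) wins."""
    mask = np.zeros(n_agents, dtype=np.float32)   # 1.0 where an agent is already selected
    selected = []
    for _ in range(max_steps):
        # state = TF-IDF vector concatenated with the selection mask
        state = torch.from_numpy(
            np.concatenate([tfidf_vec, mask]).astype(np.float32)
        ).unsqueeze(0)
        with torch.no_grad():
            q_values = q_net(state).squeeze(0)    # one Q-value per agent, plus STOP
        for a in selected:
            q_values[a] = float("-inf")           # action masking: no redundant picks
        action = int(q_values.argmax())
        if action == n_agents:                    # STOP: the set is complete
            break
        selected.append(action)
        mask[action] = 1.0
    return selected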


Project Structure

ddqn_router/
├── __init__.py              # DDQNRouter, RouteResult, RouterNotTrainedError
├── cli.py                   # Typer CLI (label, dataset, train, serve)
├── config.py                # Pydantic config schema with all defaults
├── agents.py                # AgentRegistry
├── labeler/
│   ├── labeler.py           # LLMLabeler (httpx-based, any provider)
│   ├── prompt_template.j2   # Default recall-biased prompt
│   └── cache.py             # JSONL cache (SHA256-keyed)
├── dataset/
│   ├── dataset.py           # Load / validate / stats for tasks.jsonl
│   └── splitter.py          # Stratified train/val/test split
├── env/
│   └── routing_env.py       # Custom MDP environment (no gymnasium)
├── rl/
│   ├── q_network.py         # MLP Q-network (PyTorch)
│   ├── state_encoder.py     # TF-IDF encoder wrapper
│   ├── replay_buffer.py     # Uniform replay buffer
│   ├── ddqn_agent.py        # Double DQN training loop
│   └── reward.py            # Jaccard + stochastic reward
├── eval/
│   └── evaluator.py         # Metrics (Jaccard, P/R/F1, bucketed)
├── inference/
│   └── router.py            # DDQNRouter.load() + route() + explain()
└── serve/
    └── app.py               # Optional FastAPI server (pip install ddqn-router[serve])

Research Background

Based on "Multi-Agent Set Routing with Double DQN". All default hyperparameters in this library come directly from the best-performing experiment in the research (iteration 9: reward_mode=jaccard, step_cost=0.05, action_masking=true, gamma=0.99, hidden layers [256, 128], 200k training steps).


Publishing

Release process (maintainer)

  1. Bump version in pyproject.toml and ddqn_router/__init__.py
  2. Update CHANGELOG.md
  3. Commit: git commit -am "chore: bump version to X.Y.Z"
  4. Tag: git tag vX.Y.Z && git push origin main --tags
  5. GitHub Actions publishes to PyPI automatically via OIDC trusted publishing

Local build (for testing before release)

pip install build twine
python -m build
twine check dist/*
# Optional: upload to TestPyPI first
twine upload --repository testpypi dist/*
