MCP server that observes every prompt, scores quality in real time, and closes the loop with iterative refinement. Built on FastMCP, SQLite, and Bayesian priors — no extra LLM required.

These details have not been verified by PyPI

Project links

Project description

PromptLoop

A Bayesian MCP sidecar for your IDE agent — observe prompts, refine outputs, and stop agent loops using externally verified scores.

The underlying CLI and tool API ship as loopllm (the original name). PromptLoop is the project brand. Current release: v0.7.0.

What is PromptLoop?

PromptLoop sits between you and Cursor, VS Code Copilot, or any MCP client. It does not replace your agent harness — it adds three capabilities on top:

Prompt observer — score every prompt across 5 dimensions, route to elicitation or refinement, learn your preferences via online SGD.
Refinement loop — generate → evaluate → retry inside a single MCP tool call using sampling (loopllm_run_pipeline, loopllm_refine).
Conservative Dual-Verify agent loops — agents submit step artifacts; the server scores them through two independent channels and learns when to stop (loopllm_loop_start / loop_step / loop_end).

All three layers share one Bayesian learning core (AdaptivePriors + SQLite) — no training data, no PyTorch.

System at a glance

flowchart TB
  subgraph interfaces [Interfaces]
    MCP[MCP 28 tools]
    Ext[VS Code extension]
  end
  subgraph layers [Three layers]
    L1[Layer 1: Prompt observer intercept + SGD]
    L2[Layer 2: Refinement loop LoopedLLM]
    L3[Layer 3: CDV agent loops loop_start step end]
  end
  subgraph learn [Learning]
    Priors[AdaptivePriors SQLite v4]
  end
  interfaces --> layers
  layers --> Priors

Layer	Entry point	What it does
1 — Observer	`loopllm_intercept`	Score, route, log; Thompson Sampling for questions
2 — Refinement	`loopllm_run_pipeline`	Elicit → decompose → execute → verify via MCP sampling
3 — CDV loops	`loopllm_loop_step(step_output=...)`	External dual-verify scoring → guards → Bayesian stop

Quickstart

git clone https://github.com/azank1/loop-llm   # GitHub repo still named loop-llm
cd loop-llm
pip install -e ".[mcp]"
code .    # or open in Cursor

.vscode/mcp.json and .cursor/mcp.json are committed — the MCP server is picked up automatically. Verify on first load:

use loopllm_intercept with prompt: add retry logic to the download function

For iterative tasks, start a CDV loop:

use loopllm_loop_start with goal="make the failing test pass" task_type="bugfix"

VS Code Extension

Install the companion extension for a live quality scratchpad and prompt history dashboard directly in the sidebar.

Prompt Lab — live quality scratchpad

Prompt Lab

Scores on every keystroke (350 ms debounce). Grade badge, 5 dimension bars, issues + suggestions tags, Copy and Send to Chat.

History — learning curve + metrics

History

Learning curve sparkline, grade distribution, SGD learned weights per dimension. Updates after every loopllm_feedback call.

Build and install the extension from source:

cd vscode-loopllm
npm install
npx @vscode/vsce package          # produces loopllm-prompt-gauge-0.1.0.vsix
code --install-extension loopllm-prompt-gauge-0.1.0.vsix

Conservative Dual-Verify (CDV) — Layer 3

Most agent loops stop on a fixed max_iterations or let the agent self-grade when it's "done." Both waste tokens or optimize reported progress. v0.7 introduces Conservative Dual-Verify: agents submit step artifacts (test logs, diffs, summaries); the MCP server scores them through two independent channels and feeds the stricter score into Bayesian stop/continue logic.

channel_a = deterministic_evaluator.evaluate(step_output)   # regex, JSON, completeness
channel_b = critic_sample(step_output, goal, criteria)      # separate verifier call
final_score = min(channel_a, channel_b)                   # either channel can veto

loopllm_loop_start(
  goal="refactor module and make tests pass",
  task_type="bugfix",
  required_patterns=["tests passed"],
)
  → { suggested_budget: 3, quality_threshold: 0.8, evaluator_type: "composite" }

loopllm_loop_step(session_id, step_output="pytest: 3 failed, 12 passed")
  → {
      decision: "continue",
      score: 0.0,
      channel_a_score: 0.0,
      channel_b_score: 0.55,
      score_source: "conservative_dual_verify",
      deficiencies: ["Required pattern not found: tests passed"],
    }

loopllm_loop_end(session_id)   → learns optimal depth from verified trajectories

loopllm_loop_step returns stop when any guard fires: goal reached (verified score), plateau, low Bayesian ROI, budget exhausted, timeout, token cap, or repeated output.

See examples/agent_loop.py for the library demo and docs/demo/agent_loop_demo.md for CDV via MCP.

from loopllm import AdaptivePriors, AgentLoopController

controller = AgentLoopController(AdaptivePriors())
session = controller.start("fix flaky test", task_type="bugfix")
verdict = controller.step(session.session_id, score=0.9)   # library API (pre-scored)
controller.end(session.session_id)

Adaptive agent loop demo

CDV in Cursor — channel_a_score / channel_b_score in tool output

What a terminal run looks like (python examples/agent_loop.py):

=== Loop (task_type=bugfix) ===
Suggested budget: 3 step(s) | threshold 0.80 | confidence 0.00 (from 0 past loops)
  step  1 | 0.45 |#########           | -> CONTINUE: step 1/3, score 0.450 below 0.80
  step  2 | 0.85 |#################   | -> STOP: Goal reached: 0.850 >= 0.80 at step 2

Benchmark: adaptive vs fixed `max_iterations`

Reproducible simulation (benchmarks/adaptive_vs_fixed.py, seed=7, 300 test tasks, threshold 0.80):

Strategy	Mean steps	Mean final score	% reaching 0.80	Wasted steps	Efficiency (reach/step)
fixed (budget=2)	2.00	0.698	34.3%	0.00	17.2
fixed (budget=6)	6.00	0.939	94.0%	2.50	15.7
threshold (reactive)	3.56	0.852	100.0%	0.00	28.1
adaptive (loopllm)	3.56	0.852	99.7%	0.00	28.0

Adaptive uses ~41% fewer steps than a fixed 6-step budget while reaching the bar on 99.7% of tasks. Repro: python benchmarks/adaptive_vs_fixed.py.

Honest caveat: simulation with stated assumptions; measures decision efficiency given a quality signal, not absolute model quality.

Prompt pipeline — Layer 2

loopllm_run_pipeline is the main entry point for observe → elicit → refine → verify:

loopllm_run_pipeline("add retry logic to the download function")

What happens inside:

Score the prompt across 5 dimensions (< 1 ms, deterministic)
Elicit clarifying questions if quality < 0.6 — Thompson Sampling on Beta priors
Decompose into subtasks if complexity > 0.5
Execute each subtask: ctx.sample(prompt) → evaluate → retry if below threshold
Verify the assembled output via a second ctx.sample() call
Log result to SQLite; update scoring weights via online SGD

Everything runs inline via MCP Sampling — no extra chat turns, no polling.

How it works — Layer 1

You type a prompt
      ↓
loopllm_intercept          ← scores across 5 dimensions (~0ms, deterministic)
      ↓
route decision
  < 0.4  → elicitation     ← Thompson Sampling picks the highest-gain question
  0.4–0.6 → elicit + refine
  ≥ 0.6  → refine directly
      ↓
loopllm_refine (if needed)
  → ctx.sample(prompt)     ← MCP Sampling: calls host LLM mid-execution
  → evaluate output        ← deterministic evaluators (length, regex, JSON schema)
  → if score < threshold: ctx.sample(improved prompt)
      ↓
result logged to ~/.loopllm/store.db
      ↓
loopllm_feedback (optional rating 1–5) → SGD updates dimension weights

Scoring dimensions (each 0–1, composited by learned weights into grade A–F):

Dimension	What it catches
Specificity	Vague, generic requests
Constraint Clarity	Missing format, length, or rule requirements
Context Completeness	No background or goal stated
Ambiguity	Unclear references, pronouns without antecedents
Format Specification	No output format specified

MCP Sampling — generation tools call ctx.sample() to invoke the host agent's LLM inline. Falls back to agent_execute passthrough if the client doesn't declare sampling.

Learning math (SGD, Thompson Sampling, Bayesian priors)

Online Gradient Descent on scoring weights

Default weights: {specificity: 0.25, constraint_clarity: 0.20, context_completeness: 0.20, ambiguity: 0.20, format_spec: 0.15}. Each loopllm_feedback(rating) runs one SGD step; weights clip to $[0.05, 0.50]$ and renormalise. Persisted in learned_weights (schema v4).

Thompson Sampling for question ordering

Each question type maintains $\text{Beta}(\alpha, \beta)$; the pipeline draws $s_i \sim \text{Beta}(\alpha_i, \beta_i)$ and picks $\arg\max_i s_i$.

Beta-Binomial Bayesian priors

Per-(task_type, model) convergence priors drive adaptive exit in adaptive_exit.py via BetaPrior.prob_above(threshold).

Welford online variance

NormalPrior tracks running mean/variance with optional exponential decay ($\lambda = 0.95$).

Install as a package

pip install loopllm[mcp]

Add .vscode/mcp.json to your project:

{
  "servers": {
    "loopllm": {
      "type": "stdio",
      "command": "loopllm",
      "args": ["mcp-server", "--provider", "agent"]
    }
  }
}

Cursor users: .cursor/mcp.json uses "mcpServers" as the top-level key.

Tools (28)

Tool	What it does
`loopllm_run_pipeline`	Layer 2. Elicit → decompose → execute → verify in one call
`loopllm_intercept`	Layer 1. Score + route a prompt; logs to history
`loopllm_gauge`	Instant quality bars, no DB write
`loopllm_refine`	Score → sample → retry loop via MCP Sampling
`loopllm_plan_tasks`	Decompose a goal into ordered subtasks via MCP Sampling
`loopllm_verify_output`	Keyword pre-check + deep sample against quality criteria
`loopllm_elicitation_start/answer/finish`	Multi-turn clarifying question session
`loopllm_plan_register`	Create a confidence-gated plan saved to SQLite
`loopllm_plan_next`	Advance to next task; returns `needs_replan` if quality dropped
`loopllm_plan_update`	Record task scores; recalculates rolling confidence
`loopllm_plan_list`	Dashboard: all plans with gauges and task counts
`loopllm_plan_delete`	Remove a completed or abandoned plan
`loopllm_context_history`	Browse prompt history with sparklines
`loopllm_context_clear`	Wipe prompt history (scoped or all)
`loopllm_prompt_stats`	Prompting quality trend and learning curve
`loopllm_feedback`	Rate a response (1–5); triggers SGD weight update
`loopllm_suggest_config`	Bayesian-optimal loop config for a task type
`loopllm_loop_start`	Layer 3. Begin CDV agent loop; returns learned budget + verifier recipe
`loopllm_loop_step`	Submit step artifact for CDV; returns continue/stop + channel scores
`loopllm_loop_end`	Close loop and learn optimal depth from verified trajectories
`loopllm_loop_status`	Inspect an active agent-loop session
`loopllm_classify_task`	Label a prompt's task type
`loopllm_analyze_prompt`	Generate clarifying questions ranked by Thompson-sampled gain
`loopllm_list_tasks`	List tasks from the persistent store
`loopllm_show_task`	Detail view for a single task
`loopllm_report`	Learned weights, Bayesian priors, question effectiveness stats

Plans and learned weights persist to ~/.loopllm/store.db (schema v4).

Local models (no MCP)

pip install loopllm[serve]
loopllm serve --port 8765          # REST scoring middleware

from loopllm.local_loop import LocalModelLoop

loop = LocalModelLoop(
    base_url="http://localhost:11434",
    model="llama3.2",
    score_url="http://localhost:8765/score",
    quality_threshold=0.80,
    max_retries=3,
)
result = loop.run("Write a Python function to parse JSON safely.")
print(result.output, result.best_score, result.converged)

Contributing

git clone https://github.com/azank1/loop-llm
cd loop-llm
pip install -e ".[dev]"
python -m pytest tests/ -q          # 219 tests (215 pass, 4 skipped), ~2s

See CONTRIBUTING.md for branch naming (az/<type>/<short>) and checks.

Key files:

src/loopllm/mcp_server.py — 28 MCP tools + MCP Sampling helpers
src/loopllm/step_scorer.py — Conservative Dual-Verify scoring
src/loopllm/guards.py — composable agent-loop stop stack
src/loopllm/agent_loop.py — adaptive agent-loop controller
src/loopllm/evaluator_factory.py — build evaluators for CDV Channel A
src/loopllm/priors.py — Beta/Normal priors, Welford, Thompson Sampling
src/loopllm/store.py — SQLite persistence (schema v4)
src/loopllm/engine.py — core refinement loop (LoopedLLM)

PRs welcome. Add tests for new tools in tests/.

Environment variables

Variable	Default	Description
`LOOPLLM_PROVIDER`	`agent`	`agent`, `ollama`, or `openrouter`
`LOOPLLM_MODEL`	`agent`	Model identifier (ignored in agent mode)
`LOOPLLM_DB`	`~/.loopllm/store.db`	SQLite store path
`OLLAMA_HOST`	`http://localhost:11434`	Ollama base URL
`OPENROUTER_API_KEY`	—	OpenRouter API key

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.7.0

Jun 21, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

loopllm-0.7.0.tar.gz (497.7 kB view details)

Uploaded Jun 21, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

loopllm-0.7.0-py3-none-any.whl (95.5 kB view details)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file loopllm-0.7.0.tar.gz.

File metadata

Download URL: loopllm-0.7.0.tar.gz
Upload date: Jun 21, 2026
Size: 497.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for loopllm-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`f3e104711c9d55d69c349d9dac8ef14ed7b4889be068f0fe1de30305ad7e6397`
MD5	`028af553ec7bccb396e5b1f748719cd0`
BLAKE2b-256	`b5e49a91e0a2c1d88f1c17c35b92cffc282fee2e7117073f73fa3d08f271ea84`

See more details on using hashes here.

File details

Details for the file loopllm-0.7.0-py3-none-any.whl.

File metadata

Download URL: loopllm-0.7.0-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 95.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for loopllm-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f0fd511c96715a43f65e6b225774dd5fc9295118d4663c4205c1e92be5808748`
MD5	`ed8873b93bca9dd812ae589f72646fb9`
BLAKE2b-256	`bcea3e235790b8cf6ebe9343ee17678c302d71e97e113f56bcb1f59d73b97b7e`

See more details on using hashes here.

loopllm 0.7.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

PromptLoop

What is PromptLoop?

System at a glance

Quickstart

VS Code Extension

Conservative Dual-Verify (CDV) — Layer 3

Benchmark: adaptive vs fixed max_iterations

Prompt pipeline — Layer 2

How it works — Layer 1

Online Gradient Descent on scoring weights

Thompson Sampling for question ordering

Beta-Binomial Bayesian priors

Welford online variance

Install as a package

Tools (28)

Local models (no MCP)

Contributing

Environment variables

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Benchmark: adaptive vs fixed `max_iterations`