Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens.
simplicio-cli
Turn a one-line task into a verified code change — with any LLM.
"hide the Delete button for non-admins" → diff + test + applied + verified. Works with OpenRouter, OpenAI, Anthropic, GLM, DeepSeek, Ollama — one env var.
pip install simplicio-cli
Why it works — the numbers
Same model. Same task. Only the prompt changes. Measured, reproducible, deterministic.
| Model | Without simplicio | With simplicio | Gain |
|---|---|---|---|
| Gemma 3 12B IT | 34% | 92% | +58 pts |
| Llama 3.1 8B Instruct | 36% | 90% | +54 pts |
| Qwen 2.5 7B Instruct | 34% | 88% | +54 pts |
| Average across 3 models · 10 cases · 156 checks | 35% | 90% | +55 pts (+156%) |
Output-quality signals (rate across all 30 runs)
| Signal | Raw prompt | With simplicio |
|---|---|---|
| DIFF block present | 0% | 100% |
| Target file mentioned | 0% | 96% |
| TEST block present | 80% | 96% |
Cost — tokens & wall-clock (measured, not estimated)
Same provider, same models, same cases. Token counts pulled from the API
usage field; latency from time.perf_counter() around each call.
| Side | Tokens / run | Wall-clock / run | Total tokens (30 runs) | Total time |
|---|---|---|---|---|
| Raw prompt | 759 | 12.4s | 22,774 | 6m 13s |
| With simplicio | 770 | 9.9s | 23,127 | 4m 58s |
| Δ | +1% | −21% | +353 | −75s |
simplicio wraps the objective in a 6-layer contract — more input tokens up front, fewer completion tokens because the model stops guessing. Net effect across 30 runs: roughly the same token bill, 21% faster, 90% pass-rate.
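A minimal sketch of how the two measurements described above (token count from the API usage field, latency from `time.perf_counter()`) could be captured around one call. It assumes the response is already parsed into a dict with an OpenAI-compatible `usage.total_tokens` field; `timed_call` is a hypothetical helper name, not part of simplicio.

```python
import time
from typing import Any, Callable


def timed_call(call: Callable[[], dict[str, Any]]) -> tuple[dict[str, Any], float, int]:
    """Run one API call; return (response, elapsed seconds, total tokens).

    Assumption: `call` returns parsed JSON shaped like an OpenAI-compatible
    chat completion, i.e. it may carry a "usage" object with "total_tokens".
    """
    start = time.perf_counter()
    response = call()
    elapsed = time.perf_counter() - start
    usage = response.get("usage") or {}  # tolerate a missing or null usage field
    return response, elapsed, int(usage.get("total_tokens", 0))
```

Summing the third value over all runs gives a "Total tokens" figure; summing the second gives "Total time".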
A 7B-parameter open model wrapped in simplicio's 6-layer contract outperforms the same model with raw prompting by 42 to 64 points. Without changing the model. Without fine-tuning. Without extra tokens at runtime worth mentioning.
Full report: bench/results.md · bench/results.pdf · raw outputs under .simplicio/bench_runs/.
How it works
| Component | Role | What it does |
|---|---|---|
| mapper | WHERE | project structure + latest state |
| precedent | HOW-1 | the real snippet in THIS repo that already does it |
| skill-router | HOW-2 | the ONE mapper skill that matches (ranked, not all) |
| simplicio | BUILD | stacks the 6 layers into one prompt (cache-friendly) |
| test | JUDGE | contract written as testable states |
| verify | PROOF | ran it and checked whether it actually passed; loop-fix up to 3x |
The idea in one line: don't ask the model to guess — hand it the path. Each layer terminates one decision the model would otherwise hallucinate. Relevant > complete — inject the right context, never all of it.
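One way to picture "stacks the layers into one prompt (cache-friendly)" is the sketch below. It is an illustration, not the code in `prompt.py`: the `Layer` type, the `stable` flag, and the `## NAME` section format are all assumptions. The idea shown is that layers which rarely change between tasks go first, so the prompt prefix stays identical and provider-side prefix caching has a chance to hit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str     # hypothetical label, e.g. "WHERE" or "JUDGE"
    body: str     # text injected for this layer
    stable: bool  # True if the text rarely changes between tasks


def stack_layers(layers: list[Layer]) -> str:
    """Join layers into one prompt, stable layers first, empty layers dropped."""
    # sorted() is stable, so layers keep their relative order within each group.
    ordered = sorted(layers, key=lambda layer: not layer.stable)
    sections = [
        f"## {layer.name}\n{layer.body.strip()}"
        for layer in ordered
        if layer.body.strip()
    ]
    return "\n\n".join(sections)
```

Dropping empty layers follows the "relevant > complete" rule: a layer with nothing useful to say adds no tokens.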
Install
pip install simplicio-cli # from PyPI
# or
pip install -e . # from this repo
Configure — any LLM, nothing hardcoded
| Provider | SIMPLICIO_MODEL | SIMPLICIO_BASE_URL |
|---|---|---|
| OpenRouter | anthropic/claude-opus-4 | https://openrouter.ai/api/v1 |
| GLM (z.ai) | glm-4.6 | https://api.z.ai/api/paas/v4 |
| DeepSeek | deepseek-chat | https://api.deepseek.com |
| OpenAI | gpt-4.1 | https://api.openai.com/v1 |
| Local (Ollama) | llama3 | http://localhost:11434/v1 |
| Anthropic native | claude-opus-4-7 | (leave unset) |
If SIMPLICIO_BASE_URL is unset and the key is ANTHROPIC_API_KEY, it uses the
native Anthropic SDK. Otherwise it uses an OpenAI-compatible client pointed at
your base_url — so any OpenAI-like provider works without code changes.
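The selection rule above can be sketched as a small function. This is a hedged illustration rather than the code in `providers.py`: `choose_client` is a hypothetical name, it returns a plain dict instead of constructing an SDK client so that it runs without third-party packages, and raising when neither variable is set is an assumption about the fallback behavior.

```python
import os
from typing import Mapping, Optional


def choose_client(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick the client family from environment variables."""
    env = os.environ if env is None else env
    base_url = env.get("SIMPLICIO_BASE_URL", "").strip()
    model = env.get("SIMPLICIO_MODEL", "").strip()

    # No base URL plus an Anthropic key: use the native Anthropic SDK.
    if not base_url and env.get("ANTHROPIC_API_KEY"):
        return {"kind": "anthropic-native", "model": model, "base_url": None}
    # Any base URL: treat the endpoint as OpenAI-compatible.
    if base_url:
        return {
            "kind": "openai-compatible",
            "model": model,
            "base_url": base_url.rstrip("/"),
        }
    # Assumption: with neither setting there is nothing sensible to call.
    raise RuntimeError("Set SIMPLICIO_BASE_URL, or ANTHROPIC_API_KEY for the native SDK")
```

Passing a dict instead of reading `os.environ` makes the rule easy to check for each row of the provider table.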
simplicio smoke # prints provider config + one test call
Use
# index once (caches embeddings; re-run after big changes)
simplicio index --stack angular
# run a task
simplicio task "hide Delete button for non-admins" \
--stack angular \
--alvo src/app/screen/screen.component.html \
--criterios "- no admin perm: button absent from DOM
- with admin perm: button present" \
--restricoes "- don't touch save flow
- build passes"
Each task runs: precedent (from cache) → skill match → 6 layers → LLM generates
(diff + test + Playwright) → apply → run SIMPLICIO_TEST_CMD. If the tests pass,
the task is done; otherwise the error is sent back to the model, which fixes and
retries (up to 3x).
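The generate → test → fix loop can be sketched as follows. This is an assumption-laden outline, not the body of `pipeline.py`: `generate_until_green` and `shell_test` are hypothetical names, the callables stand in for the LLM call and for patch application, and "up to 3x" is read here as one initial attempt plus at most three fix attempts.

```python
import shlex
import subprocess
from typing import Callable, Optional

TestFn = Callable[[], tuple[bool, str]]


def shell_test(test_cmd: str) -> TestFn:
    """Wrap a shell test command as a callable returning (passed, output)."""
    def run() -> tuple[bool, str]:
        proc = subprocess.run(shlex.split(test_cmd), capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr
    return run


def generate_until_green(
    generate: Callable[[Optional[str]], str],  # previous error (or None) -> new change
    apply: Callable[[str], None],              # writes the change to disk
    test: TestFn,
    max_fixes: int = 3,
) -> tuple[bool, int]:
    """Return (passed, attempts made)."""
    error: Optional[str] = None
    for attempt in range(1, max_fixes + 2):
        apply(generate(error))
        passed, output = test()
        if passed:
            return True, attempt
        error = output[-4000:]  # keep the tail, where failures usually appear
    return False, max_fixes + 1
```

In real use the test callable would be `shell_test(os.environ["SIMPLICIO_TEST_CMD"])`; injecting it keeps the loop itself free of subprocess calls and easy to test.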
Cache — why it doesn't re-map every time
Embeddings are keyed by content hash, stored in .simplicio/. Unchanged
code block → vector reused. Change one file → only that block re-embeds.
| Run | Blocks embedded | Time |
|---|---|---|
| 1st (cold cache) | 3 | ~baseline |
| 2nd (no change) | 0 | ~instant |
| after editing 1 file | 1 | partial |
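A minimal sketch of a content-hash cache that reproduces the 3 / 0 / 1 pattern in the table. It is not the code in `cache.py`: the `EmbeddingCache` class, the single-JSON-file layout, and the file name used below are assumptions, and `embed` stands in for whatever embedding call the project actually makes.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Sequence


class EmbeddingCache:
    """Maps sha256(block text) -> vector, persisted as one JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.vectors: dict[str, list[float]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    @staticmethod
    def key(block: str) -> str:
        return hashlib.sha256(block.encode("utf-8")).hexdigest()

    def embed_all(
        self, blocks: Sequence[str], embed: Callable[[str], list[float]]
    ) -> int:
        """Ensure every block has a vector; return how many were newly embedded."""
        fresh = 0
        for block in blocks:
            k = self.key(block)
            if k not in self.vectors:  # unchanged content -> vector reused
                self.vectors[k] = embed(block)
                fresh += 1
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.vectors))
        return fresh
```

Because the key depends only on the block's text, renaming or moving a file without changing its content costs no re-embedding in this sketch.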
Benchmark — reproduce in 30 seconds
OPENROUTER_API_KEY=… \
BENCH_MODELS="qwen/qwen-2.5-7b-instruct,meta-llama/llama-3.1-8b-instruct,google/gemma-3-12b-it" \
python3 bench/run_offline.py
No project required, stdlib only, deterministic regex scoring — no LLM judges
the LLM. Each case runs twice on the same model: raw one-line objective vs
simplicio's 6-layer contract. Outputs scored on target-file mention, DIFF
block, TEST block, contract-state words. Full numbers in bench/results.md.
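To make "deterministic regex scoring" concrete, here is a small sketch of a scorer over the four signals named above. The patterns are illustrative guesses, not the ones in `bench/run_offline.py`: a unified-diff hunk header stands in for "DIFF block", and a `describe(` / `it(` / `test(` call or a `def test_` function stands in for "TEST block". `score_output` and `pass_rate` are hypothetical names.

```python
import re

# Assumed patterns; the real benchmark may use different ones.
DIFF_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.M)
TEST_RE = re.compile(r"\b(?:describe|it|test)\s*\(|^def test_\w+", re.M)


def score_output(output: str, target_file: str, state_words: list[str]) -> dict[str, bool]:
    """Return one pass/fail check per signal, using regex only."""
    checks = {
        "target_file": re.search(re.escape(target_file), output) is not None,
        "diff_block": DIFF_RE.search(output) is not None,
        "test_block": TEST_RE.search(output) is not None,
    }
    for word in state_words:
        pattern = rf"\b{re.escape(word)}\b"
        checks[f"state:{word}"] = re.search(pattern, output, re.I) is not None
    return checks


def pass_rate(checks: dict[str, bool]) -> float:
    """Fraction of checks that passed."""
    return sum(checks.values()) / len(checks)
```

The same output always yields the same score, which is what keeps a with-vs-without comparison reproducible without an LLM judge.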
Full harness (your real project, your real tests)
simplicio bench --cases bench/cases.json --stack angular
Runs each case two ways and runs your real test command (e.g. ng test --watch=false) on each output. Writes the true pass-rate to
bench/results.md.
Plug points (stubs marked in code)
| File | Replace with |
|---|---|
| prompt.py::_mapper | your real llm-project-mapper |
| pipeline.py::_aplicar_e_testar | extract diff → git apply → parse test result |
| skill_router.py | point SIMPLICIO_SKILLS_DIR at your mapper's skills |
Layout
simplicio/
  cli.py            # index | task | bench | smoke
  cache.py          # content-hash embedding cache
  precedent.py      # grep + semantic rank (uses cache)
  skill_router.py   # picks the ONE matching skill
  prompt.py         # stacks the 6 layers
  providers.py      # any OpenAI-compatible endpoint + Anthropic native
  pipeline.py       # generate → test → fix loop
  bench.py          # with-vs-without harness
  templates/simplicio_prompt.md
bench/
  run_offline.py    # stdlib-only multi-model benchmark
  cases.json        # your benchmark tasks
  cases_offline.json
  results.md        # filled by `simplicio bench` / `run_offline.py`
  charts/           # SVG: overall, delta, by_case, by_stack
License
MIT