
Portable task-to-code pipeline that works with any LLM. Turn a one-line task into a verified code change — diff + test + verify loop. +55 pts on a 156-check benchmark, 21% faster, ~same tokens.


simplicio-cli

Turn a one-line task into a verified code change — with any LLM.



"hide the Delete button for non-admins" → diff + test + applied + verified. Works with OpenRouter, OpenAI, Anthropic, GLM, DeepSeek, Ollama — one env var.

pip install simplicio-cli

Why it works — the numbers

Same model. Same task. Only the prompt changes. Measured, reproducible, deterministic.

| Model | Without simplicio | With simplicio | Gain |
|---|---|---|---|
| Gemma 3 12B IT | 34% | 92% | +58 pts |
| Llama 3.1 8B Instruct | 36% | 90% | +54 pts |
| Qwen 2.5 7B Instruct | 34% | 88% | +54 pts |
| Average across 3 models · 10 cases · 156 checks | 35% | 90% | +55 pts (+156%) |

Output-quality signals (rate across all 30 runs)

| Signal | Raw prompt | With simplicio |
|---|---|---|
| DIFF block present | 0% | 100% |
| Target file mentioned | 0% | 96% |
| TEST block present | 80% | 96% |

Cost — tokens & wall-clock (measured, not estimated)

Same provider, same models, same cases. Token counts pulled from the API usage field; latency from time.perf_counter() around each call.

| Side | Tokens / run | Wall-clock / run | Total tokens (30 runs) | Total time |
|---|---|---|---|---|
| Raw prompt | 759 | 12.4s | 22,774 | 6m 13s |
| With simplicio | 770 | 9.9s | 23,127 | 4m 58s |
| Δ | +1% | −21% | +353 | −75s |

simplicio wraps the objective in a 6-layer contract — more input tokens up front, fewer completion tokens because the model stops guessing. Net effect across 30 runs: roughly the same token bill, 21% faster, 90% pass-rate.

Per the table above, the 7B to 12B open models gain 54 to 58 points when wrapped in simplicio's 6-layer contract, compared with the same model under raw prompting. Without changing the model. Without fine-tuning. Without extra tokens at runtime worth mentioning (+1% per run).

Full report: bench/results.md · bench/results.pdf · raw outputs under .simplicio/bench_runs/.


How it works

mapper        WHERE   project structure + latest state
precedent     HOW-1   the real snippet in THIS repo that already does it
skill-router  HOW-2   the ONE mapper skill that matches (ranked, not all)
simplicio     BUILD   stacks the 6 layers into one prompt (cache-friendly)
test          JUDGE   contract written as testable states
verify        PROOF   ran it — did it actually pass? loop-fix up to 3x

The idea in one line: don't ask the model to guess — hand it the path. Each layer terminates one decision the model would otherwise hallucinate. Relevant > complete — inject the right context, never all of it.
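As a rough illustration of "ranked, not all", the sketch below picks the single best-matching skill by cosine similarity between a task embedding and each skill embedding. It assumes the semantic rank is embedding-based; the function names, the skill names, and the toy two-dimensional vectors are all hypothetical and not taken from skill_router.py.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_one(task_vec: Sequence[float],
             skills: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (name, score) of the ONE best skill instead of injecting all of them."""
    if not skills:
        raise ValueError("no skills to rank")
    name, vec = max(skills.items(), key=lambda kv: cosine(task_vec, kv[1]))
    return name, cosine(task_vec, vec)


if __name__ == "__main__":
    # Hypothetical skill names and toy vectors, for illustration only.
    skills = {"angular-permissions": [1.0, 0.0], "angular-forms": [0.0, 1.0]}
    print(pick_one([0.9, 0.1], skills))
```

Only the winner goes into the prompt; the other skills are ranked and then dropped, which is the "relevant > complete" trade-off in miniature.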


Install

pip install simplicio-cli           # from PyPI
# or
pip install -e .                    # from this repo

Configure — any LLM, nothing hardcoded

| Provider | SIMPLICIO_MODEL | SIMPLICIO_BASE_URL |
|---|---|---|
| OpenRouter | anthropic/claude-opus-4 | https://openrouter.ai/api/v1 |
| GLM (z.ai) | glm-4.6 | https://api.z.ai/api/paas/v4 |
| DeepSeek | deepseek-chat | https://api.deepseek.com |
| OpenAI | gpt-4.1 | https://api.openai.com/v1 |
| Local (Ollama) | llama3 | http://localhost:11434/v1 |
| Anthropic native | claude-opus-4-7 | (leave unset) |

If SIMPLICIO_BASE_URL is unset and the key is ANTHROPIC_API_KEY, it uses the native Anthropic SDK. Otherwise it uses an OpenAI-compatible client pointed at your base_url — so any OpenAI-like provider works without code changes.
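The selection rule above can be sketched in a few lines. This is a minimal illustration, not the real providers.py: `ProviderChoice` and `choose_provider` are hypothetical names, and the sketch only decides which kind of client to build without creating one.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderChoice:
    kind: str                 # "anthropic-native" or "openai-compatible"
    model: Optional[str]      # value of SIMPLICIO_MODEL, if set
    base_url: Optional[str]   # value of SIMPLICIO_BASE_URL, if set


def choose_provider(env: Mapping[str, str]) -> ProviderChoice:
    """Apply the rule from the text: no base URL + Anthropic key -> native SDK."""
    base_url = env.get("SIMPLICIO_BASE_URL") or None
    model = env.get("SIMPLICIO_MODEL") or None
    if base_url is None and env.get("ANTHROPIC_API_KEY"):
        return ProviderChoice("anthropic-native", model, None)
    # Everything else goes through an OpenAI-compatible client at base_url.
    return ProviderChoice("openai-compatible", model, base_url)


if __name__ == "__main__":
    print(choose_provider(os.environ))
```

Passing the environment as a mapping keeps the rule easy to check without touching real credentials.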

simplicio smoke      # prints provider config + one test call

Use

# index once (caches embeddings; re-run after big changes)
simplicio index --stack angular

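# run a task (flag names are Portuguese: --alvo = target file,
# --criterios = acceptance criteria, --restricoes = constraints)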
simplicio task "hide Delete button for non-admins" \
  --stack angular \
  --alvo src/app/screen/screen.component.html \
  --criterios "- no admin perm: button absent from DOM
- with admin perm: button present" \
  --restricoes "- don't touch save flow
- build passes"

Each task: precedent (from cache) → skill match → 6 layers → LLM generates (diff + test + Playwright) → apply → run SIMPLICIO_TEST_CMD → pass? done : send the error back → fix → retry (up to 3x).
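The generate, apply, test, fix loop at the end of that chain might look roughly like the sketch below. It is not the code in pipeline.py: `generate` and `apply` are hypothetical callables standing in for the LLM call and the diff application, and reading "up to 3x" as one first attempt plus at most three fix rounds is an assumption.

```python
import subprocess
from typing import Callable, Optional, Tuple

MAX_FIXES = 3  # assumption: "up to 3x" = 1 first attempt + at most 3 fix rounds

Runner = Callable[[str], Tuple[bool, str]]


def run_tests(test_cmd: str) -> Tuple[bool, str]:
    """Run the test command in a shell; return (passed, combined output)."""
    proc = subprocess.run(test_cmd, shell=True, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def verify_loop(
    generate: Callable[[Optional[str]], str],  # hypothetical: error feedback -> new diff
    apply: Callable[[str], None],              # hypothetical: applies the diff to the tree
    test_cmd: str,
    run: Runner = run_tests,
) -> Tuple[bool, int]:
    """Return (passed, fix_rounds_used)."""
    feedback: Optional[str] = None
    for fix_round in range(MAX_FIXES + 1):
        apply(generate(feedback))
        passed, output = run(test_cmd)
        if passed:
            return True, fix_round
        feedback = output  # send the error back to the model on the next round
    return False, MAX_FIXES
```

In real use, `test_cmd` would come from the SIMPLICIO_TEST_CMD environment variable; the `run` parameter exists only so the loop can be exercised without spawning a process.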


Cache — why it doesn't re-map every time

Embeddings are keyed by content hash, stored in .simplicio/. Unchanged code block → vector reused. Change one file → only that block re-embeds.

| Run | Blocks embedded | Time |
|---|---|---|
| 1st (cold cache) | 3 | ~baseline |
| 2nd (no change) | 0 | ~instant |
| after editing 1 file | 1 | partial |
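A content-hash cache of this kind can be sketched as follows. This is an illustration under assumptions, not cache.py itself: the `EmbeddingCache` class, the JSON file name, the choice of SHA-256, and the fake embedder are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


class EmbeddingCache:
    """Stores vectors keyed by the SHA-256 of the block's text."""

    def __init__(self, path: Path, embed: Callable[[str], List[float]]):
        self.path = path
        self.embed = embed
        self.embedded = 0  # how many blocks had to be (re-)embedded this run
        self.vectors: Dict[str, List[float]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get(self, block: str) -> List[float]:
        key = hashlib.sha256(block.encode("utf-8")).hexdigest()
        if key not in self.vectors:  # unchanged content means same key, so reuse
            self.vectors[key] = self.embed(block)
            self.embedded += 1
        return self.vectors[key]

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.vectors))


def demo() -> List[int]:
    """Cold run, unchanged run, run after editing one block."""
    def fake_embed(text: str) -> List[float]:
        return [float(len(text))]  # stand-in for a real embedding call

    counts = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".simplicio" / "embeddings.json"
        for blocks in (["a", "bb", "ccc"], ["a", "bb", "ccc"], ["a", "bb!", "ccc"]):
            cache = EmbeddingCache(path, fake_embed)
            for block in blocks:
                cache.get(block)
            cache.save()
            counts.append(cache.embedded)
    return counts
```

With three blocks, `demo()` embeds 3, then 0, then 1 block across the three runs, the same pattern as the table above.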

Benchmark — reproduce in 30 seconds

OPENROUTER_API_KEY= \
  BENCH_MODELS="qwen/qwen-2.5-7b-instruct,meta-llama/llama-3.1-8b-instruct,google/gemma-3-12b-it" \
  python3 bench/run_offline.py

No project required, stdlib only, deterministic regex scoring — no LLM judges the LLM. Each case runs twice on the same model: raw one-line objective vs simplicio's 6-layer contract. Outputs scored on target-file mention, DIFF block, TEST block, contract-state words. Full numbers in bench/results.md.
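To make "deterministic regex scoring" concrete, here is a minimal sketch of a scorer over the four signals named above. The regular expressions, function names, and sample inputs are assumptions made for illustration; the actual patterns in bench/run_offline.py may differ.

```python
import re
from typing import Dict, List


def score_output(output: str, target_file: str, state_words: List[str]) -> Dict[str, bool]:
    """Check one model output against regex-only signals (no LLM judge)."""
    checks = {
        # Target file path appears verbatim somewhere in the output.
        "target_file": re.search(re.escape(target_file), output) is not None,
        # Something that looks like a unified diff header or hunk.
        "diff_block": re.search(r"^(diff --git |--- |\+\+\+ |@@ )", output, re.MULTILINE) is not None,
        # Something that looks like a test call (hypothetical pattern).
        "test_block": re.search(r"\b(describe|it|test|expect)\s*\(", output) is not None,
    }
    for word in state_words:
        pattern = rf"\b{re.escape(word)}\b"
        checks[f"state:{word}"] = re.search(pattern, output, re.IGNORECASE) is not None
    return checks


def pass_rate(checks: Dict[str, bool]) -> float:
    """Fraction of checks that passed."""
    return sum(checks.values()) / len(checks)
```

Because every check is a plain regex match, scoring the same output twice always gives the same result, which is what makes the benchmark reproducible.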

Full harness (your real project, your real tests)

simplicio bench --cases bench/cases.json --stack angular

Runs each case two ways and runs your real test command (e.g. ng test --watch=false) on each output. Writes the true pass-rate to bench/results.md.


Plug points (stubs marked in code)

| File | Replace with |
|---|---|
| prompt.py::_mapper | your real llm-project-mapper |
| pipeline.py::_aplicar_e_testar | extract diff → git apply → parse test result |
| skill_router.py | point SIMPLICIO_SKILLS_DIR at your mapper's skills |

Layout

simplicio/
  cli.py          # index | task | bench | smoke
  cache.py        # content-hash embedding cache
  precedent.py    # grep + semantic rank (uses cache)
  skill_router.py # picks the ONE matching skill
  prompt.py       # stacks the 6 layers
  providers.py    # any OpenAI-compatible endpoint + Anthropic native
  pipeline.py     # generate → test → fix loop
  bench.py        # with-vs-without harness
  templates/simplicio_prompt.md
bench/
  run_offline.py  # stdlib-only multi-model benchmark
  cases.json      # your benchmark tasks
  cases_offline.json
  results.md      # filled by `simplicio bench` / `run_offline.py`
  charts/         # SVG: overall, delta, by_case, by_stack

License

MIT
