Cache-aware prompt structure optimizer + local cost/usage logger for LLM apps.

ContextOps

Cache-aware prompt structure optimizer + LLM-as-judge eval + local cost/usage logger.

Stop paying for the same tokens twice. ContextOps reorders your prompt sections so stable content (system prompt, tools) sits at the top — and variable content (query, history) sits at the bottom — maximizing cache hit rate on Anthropic / OpenAI / DeepSeek / any provider that does prefix caching.

No cloud, no SaaS, no SDK lock-in. Just pip install contextops-tool and go.

⚡ Quickstart

pip install contextops-tool

from contextops import optimize, Prompt

p = Prompt(
    query="What's the weather in Berlin?",
    history=[{"role": "user", "content": "Hi!"}],
    documents="Berlin weather API docs...",
    system="You are a helpful weather assistant.",
    tools='[{"name": "get_weather"}]',
    model="gpt-4o",
)

result = optimize(p)
print(result.diff())                            # history → documents → ... → query
print(f"Cache hit: {result.estimated_cache_hit_rate:.1%}")
print(f"Saves ~${result.estimated_cost_savings_usd:.4f} per 1k calls")

Output:

Section order: history → documents → ... → system → ... → query
Cache hit: 71.0%
Saves ~$0.1006 per 1k calls

That's it. Same prompt, same tokens, ~70% cache hit rate instead of ~5%.

🤔 Why?

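LLM providers (Anthropic, OpenAI, DeepSeek, Google) cache the prefix of your prompt. If the prefix is stable across calls, the cached tokens are billed at a fraction of the normal input price (about 10% on some providers; the exact discount varies by provider and model) instead of the full price.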

The trick: keep the prefix stable by putting variable content (query, history) at the end.
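To make the prefix idea concrete, here is a minimal sketch that measures how much of a call repeats the start of the previous call. It is an illustration only: `shared_prefix_fraction` is a hypothetical helper, whitespace splitting stands in for a real tokenizer, and real providers apply their own tokenizers and minimum cacheable prefix lengths.

```python
def shared_prefix_fraction(prev: list[str], curr: list[str]) -> float:
    """Fraction of `curr` that repeats the start of `prev`, token for token."""
    shared = 0
    for a, b in zip(prev, curr):
        if a != b:
            break  # the cacheable prefix ends at the first mismatch
        shared += 1
    return shared / len(curr) if curr else 0.0


SYSTEM = "You are a helpful weather assistant ."
QUERIES = ["Berlin weather today ?", "Oslo weather today ?"]

# Variable content first: consecutive calls diverge at the first token.
query_first = [f"{q} {SYSTEM}".split() for q in QUERIES]
# Stable content first: the whole system prompt is a shared prefix.
system_first = [f"{SYSTEM} {q}".split() for q in QUERIES]

print(f"{shared_prefix_fraction(*query_first):.2f}")   # 0.00
print(f"{shared_prefix_fraction(*system_first):.2f}")  # 0.64
```

Both variants send exactly the same tokens; only the order changes how much of the second call can be served from a prefix cache.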

ContextOps knows the canonical ordering by stability:

system → tools → role → context → documents → history → query
  ↑ stable                                                ↑ variable
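The ordering above can be sketched as a stable sort over named sections. This is not ContextOps's actual implementation: `CANONICAL_ORDER`, `RANK`, and `reorder` are hypothetical names, and placing unknown section names last is an assumption made for the example.

```python
# Canonical ordering taken from the diagram above, most stable first.
CANONICAL_ORDER = ["system", "tools", "role", "context", "documents", "history", "query"]
RANK = {name: i for i, name in enumerate(CANONICAL_ORDER)}


def reorder(sections: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (name, content) pairs by stability rank.

    sorted() is stable, so sections with the same rank keep their
    original relative order. Unknown names are treated as most
    variable and go last (an assumption of this sketch).
    """
    return sorted(sections, key=lambda s: RANK.get(s[0], len(CANONICAL_ORDER)))


before = [
    ("query", "What's the weather in Berlin?"),
    ("history", "user: Hi!"),
    ("documents", "Berlin weather API docs..."),
    ("system", "You are a helpful weather assistant."),
    ("tools", '[{"name": "get_weather"}]'),
]
after = reorder(before)
print(" -> ".join(name for name, _ in after))
# system -> tools -> documents -> history -> query
```

The content of each section is untouched; only the position changes, which is why the total token count stays the same.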

Estimated impact on a typical workload:

Setup	Cache hit rate	Effective $/1M input
Random order	~5%	$X (full price)
ContextOps optimized	~78%	~$0.3·X (10% on cached prefix)
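The figures in the table follow from a simple blended-price formula. The sketch below reproduces that arithmetic under two stated assumptions: the cache hit rate is the fraction of input tokens served from cache, and cached tokens cost 10% of the full price. `effective_price_factor` is a hypothetical helper, and the calculation ignores any extra charge a provider may apply for cache writes.

```python
def effective_price_factor(cache_hit_rate: float, cached_price_ratio: float = 0.10) -> float:
    """Blended input price as a multiple of the full price X."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached = 1.0 - cache_hit_rate               # billed at full price
    cached = cache_hit_rate * cached_price_ratio  # billed at the discounted price
    return uncached + cached


print(f"{effective_price_factor(0.05):.3f}")  # 0.955, effectively full price
print(f"{effective_price_factor(0.78):.3f}")  # 0.298, roughly 0.3 * X
```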

🧰 What's in the box

Feature	Description
Cache-aware reordering	Moves stable sections to the top, variable to the bottom. Same total tokens, much higher cache hit rate.
Token counting	tiktoken-based, model-aware (`gpt-4o`, `claude-`, `qwen`, fallback to `cl100k_base`).
Cost estimation	Per-model pricing baked in; estimates $/1k calls before vs after reorder.
LLM-as-judge eval	Built-in metrics: `faithfulness`, `relevance`, `completeness`, `conciseness`.
A/B testing	Run two prompts over a golden dataset, get structural + quality deltas.
Local SQLite logger	Every LLM call goes to `~/.contextops/calls.db`. Zero cloud.
Dataset loaders	`.json`, `.jsonl`, `.csv` golden QA datasets.
Rich CLI	`optimize / stats / recent / compare / eval / reset` with tables and progress bars.
LiteLLM auto-log (opt)	One line to auto-log every litellm call. `pip install "contextops-tool[integrations]"`
Bench harness	1000+ prompts through Ollama, LM Studio, or OpenRouter.
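As an illustration of the dataset-loader row in the table, the sketch below reads golden QA rows from the three listed formats using only the standard library. It does not show ContextOps's own `load_dataset`; `load_rows` and the `question` / `answer` field names are assumptions made for the example.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_rows(path: str) -> list[dict]:
    """Load a list of dict rows from a .json, .jsonl, or .csv file."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".jsonl":
        # One JSON object per non-empty line.
        with p.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".json":
        with p.open(encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError("expected a top-level JSON array")
        return data
    if suffix == ".csv":
        # The header row supplies the keys; all values are strings.
        with p.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported dataset format: {suffix}")


# Round-trip a one-row sample (hypothetical field names).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "golden.jsonl"
    sample.write_text('{"question": "2+2?", "answer": "4"}\n', encoding="utf-8")
    rows = load_rows(str(sample))

print(rows)
```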

📦 Install

# Core (optimizer + logger + eval + CLI)
pip install contextops-tool

# With LiteLLM auto-callback for real LLM logging
pip install "contextops-tool[integrations]"

# With dev tooling (pytest, ruff, mypy)
pip install "contextops-tool[dev]"

# Everything
pip install "contextops-tool[all]"

From source:

git clone https://github.com/QuickLeopard/contextops.git
cd contextops
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,integrations]"
pytest                                          # 35 tests
python -m contextops_bench smoke               # offline smoke

Requires Python 3.10+.

macOS / Linux gotcha: `python` not found

If you see zsh: command not found: python, use python3 instead. Or install Python 3.12 via Homebrew and add it to PATH:

brew install python@3.12
# Homebrew keeps the unversioned `python` symlink in libexec/bin
export PATH="$(brew --prefix python@3.12)/libexec/bin:$PATH"
python --version   # should print 3.12.x

If you want a fully automated setup, run the bootstrap script after cloning:

git clone https://github.com/QuickLeopard/contextops.git
cd contextops
./scripts/bootstrap.sh   # installs Python, creates venv, runs tests + smoke

No `python -m venv`?

Some Linux distros ship Python without venv:

# Debian / Ubuntu
sudo apt install python3.12-venv

# Fedora (venv ships with the main python3 package)
sudo dnf install python3

# Or use virtualenv instead
pip install virtualenv
virtualenv .venv

📖 Usage

1. Optimize a prompt (Python)

from contextops import optimize, Prompt

p = Prompt(
    query="What's the weather in Berlin?",
    documents="API docs...",
    system="You are a helpful weather assistant.",
    tools="[]",
    model="gpt-4o",
)

result = optimize(p)
print(result.diff())                  # before → after
print(result.optimized_sections)      # [(Section, content), ...]

2. Compare two prompts (Python)

from contextops.eval import compare

# bad_prompt and good_prompt are Prompt objects you have already built
report = compare(baseline=bad_prompt, optimized=good_prompt)
print(report["delta"])
# {"tokens": 0, "cache_hit_rate": 0.65, "cost_savings_per_1k_usd": 4.21}

3. A/B eval with LLM-as-judge

from contextops import evaluate_ab, load_dataset, Prompt, LiteLLMJudge

dataset = load_dataset("evals/sample_dataset.jsonl")

baseline = Prompt(system="...", query="", documents="{ctx}", model="gpt-4o-mini")
optimized = Prompt(system="...", documents="{ctx}", query="", model="gpt-4o-mini")

def my_llm(prompt_str: str) -> str:
    # call_my_llm is a placeholder for your own client call
    return call_my_llm(prompt_str)

report = evaluate_ab(
    baseline, optimized,
    run_fn=my_llm,
    dataset=dataset,
    metrics=["faithfulness", "relevance", "completeness"],
    judge=LiteLLMJudge(),
    on_render=lambda p, item: p.documents.replace("{ctx}", item.context),
)

print(report["structural"])   # tokens / cache / cost deltas
print(report["quality"])      # per-metric judge deltas

4. CLI

# Optimize a prompt inline
contextops optimize \
    --system "You are a weather assistant." \
    --query "What's the weather in Berlin?" \
    --documents "API docs..." \
    --model gpt-4o

# Load a prompt from a JSON file
contextops optimize --from-json my_prompt.json

# Side-by-side comparison
contextops compare baseline.json optimized.json

# A/B eval with offline echo judge
contextops eval \
    --baseline evals/baseline_prompt.json \
    --optimized evals/optimized_prompt.json \
    --dataset evals/sample_dataset.jsonl \
    --metrics relevance,completeness,faithfulness \
    --echo --run-fn echo \
    --output report.json

# Real LLM-as-judge
pip install "contextops-tool[integrations]"
contextops eval \
    --baseline evals/baseline_prompt.json \
    --dataset evals/sample_dataset.jsonl \
    --judge-model gpt-4o-mini \
    --metrics relevance,completeness,faithfulness,conciseness

# Local call stats
contextops stats
contextops recent --limit 50

# Reset the local database
contextops reset

5. Auto-log every LiteLLM call

from contextops.integrations import install_callback
install_callback()

import litellm
litellm.completion(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
# → automatically logged to ~/.contextops/calls.db
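For readers curious what local call logging can look like, here is a minimal sketch built on the standard-library `sqlite3` module. The table layout, column names, and helper functions are assumptions made for illustration; the real schema of `~/.contextops/calls.db` is not documented here, and the cost values are made-up sample numbers.

```python
import sqlite3
import time

# Hypothetical schema, not the actual ContextOps one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts REAL NOT NULL,
    model TEXT NOT NULL,
    input_tokens INTEGER NOT NULL,
    cached_tokens INTEGER NOT NULL,
    cost_usd REAL NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_call(conn: sqlite3.Connection, model: str, input_tokens: int,
             cached_tokens: int, cost_usd: float) -> None:
    """Insert one call record; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO calls (ts, model, input_tokens, cached_tokens, cost_usd) "
            "VALUES (?, ?, ?, ?, ?)",
            (time.time(), model, input_tokens, cached_tokens, cost_usd),
        )


def cache_hit_rate(conn: sqlite3.Connection) -> float:
    """Cached input tokens divided by all input tokens across logged calls."""
    total, cached = conn.execute(
        "SELECT COALESCE(SUM(input_tokens), 0), COALESCE(SUM(cached_tokens), 0) FROM calls"
    ).fetchone()
    return cached / total if total else 0.0


conn = open_db()  # in-memory for the example; pass a file path to persist
log_call(conn, "gpt-4o", 1000, 0, 0.0025)    # cold call, nothing cached
log_call(conn, "gpt-4o", 1000, 800, 0.0015)  # warm call, 800 tokens cached
rate = cache_hit_rate(conn)
print(f"{rate:.0%}")  # 40%
```

Keeping the log in a single SQLite file means usage data never leaves the machine and can be queried with any SQLite client.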

🆚 Comparison

Tool	What it does	Where ContextOps is different
DSPy	Auto-rewrites prompt text using a dataset	We reorder sections — no dataset, no model rewrite
RAGAS / DeepEval	Evaluate answer quality via LLM-judge	We measure structure + cost, complementary not competing
Langfuse	Cloud LLM observability	We stay local-first: SQLite, no signup
prompt-cache / token-optimizer	Cache responses, compress tokens	We focus on provider cache (Anthropic / OpenAI), not response cache
vaibkumr/prompt-optimizer	Compress text (LLMLingua-style)	We reorder, never change tokens or text

🧪 Bench harness

1000+ prompts through Ollama, LM Studio, or OpenRouter:

# Smoke (10 prompts, <30s, no LLM, for CI)
python -m contextops_bench smoke

# Local (100 prompts via Ollama)
python -m contextops_bench local --provider ollama --model llama3.1:8b --n 100

# Cloud (1000 prompts via OpenRouter, 3 models, parallel)
export OPENROUTER_API_KEY=sk-or-v1-...
python -m contextops_bench cloud --provider openrouter \
    --model openai/gpt-4o-mini,anthropic/claude-3.5-haiku,meta-llama/llama-3.1-8b-instruct \
    --n 1000 --parallel 4

Each run writes:

bench/results/<label>.csv — every observation (prompt_id, model, tokens, cache hit, cost, latency, error, section order)
bench/results/<label>.summary.json — aggregated stats with optimized vs baseline deltas

See docs/ACCEPTANCE.md for the formal pass criteria.

🗺️ Roadmap

✅ v0.1 — reorder, token count, SQLite logger, CLI
✅ v0.2 — LLM-as-judge eval + A/B testing + dataset loaders
✅ v0.2+bench — bench harness for Ollama / LM Studio / OpenRouter + acceptance criteria
🔜 v0.3 — RAG curator (multi-signal retrieval + strict threshold)
🔜 v1.0 — Access-aware context + audit trail (on-prem / enterprise)

📚 Documentation

docs/ACCEPTANCE.md — formal pass/fail criteria
CHANGELOG.md — version history
CONTRIBUTING.md — how to contribute
SECURITY.md — how to report vulnerabilities
evals/ — sample datasets and prompts

🤝 Contributing

PRs welcome. See CONTRIBUTING.md for workflow, conventions, and release process.

Good first contributions:

New metric in contextops/judge.py (e.g. safety, format_compliance)
New provider in contextops_bench/clients.py (e.g. vllm, tgi)
Better pricing tables for non-USD regions
Translations of docs/ and README.md

📜 License

MIT.

✨ Credits

Built with:

pydantic for prompt modeling
tiktoken for token counting
click + rich for the CLI
litellm for the optional auto-callback

Project details


This version

0.3.0, released Jul 3, 2026

Release files

Both files were uploaded on Jul 3, 2026 via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

Source distribution: contextops_tool-0.3.0.tar.gz (50.6 kB, tag: Source)

Algorithm	Hash digest
SHA256	`3929d81855eb7748aaf2580768d08d06914e4d86d6a4fff281dfab02a6466434`
MD5	`a89c962e35fc4ed5c6f6e3731ffd6bbc`
BLAKE2b-256	`0304ff7cfa9b161239ca5508fd01ae8591b2a6420636dd27c8285bd1127e7d66`

Built distribution: contextops_tool-0.3.0-py3-none-any.whl (39.3 kB, tag: Python 3)

Algorithm	Hash digest
SHA256	`1ac2231dd191a7bbc0ef895ae1093b58859780ae516d810b3a3d8acf78eeea33`
MD5	`28d22cd3edf733fcab9274aa536fc26d`
BLAKE2b-256	`fbecb1a1c264da0743b3b67f2933a07cf240e0fdb7f628261bad04376a838361`
