AI Infrastructure OS — same engine, four facades (CLI, REST, web, MCP). Reduce 80-95% of AI spend by routing tasks to the right model.
OmniAgent
You are overspending on AI.
OmniAgent routes every AI task to the most efficient model automatically. Local first. Cloud only when it pays off. 80–97% savings on your AI bill.
Try the AI Cost Calculator · See the 60-second demo · Star on GitHub
The math most devs don't realize
"I just need AI to review my code, write docstrings, and rename things." — every developer with a $500/month Cursor + Claude bill
Here's what you actually need (benchmarked on real hardware, Jun 2026):
| Task | All-Claude reality | OmniAgent | Savings |
|---|---|---|---|
| Review a function for bugs | Claude · $0.30 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Write a Google-style docstring | Claude · $0.28 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Rename a variable | Claude · $0.15 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Explain TCP vs UDP | Claude · $0.10 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Classify a bug ticket | Claude · $0.08 | qwen2.5-coder:7b (local) · $0.00 | 100% |
Fleet benchmark · MSI desktop (GTX 1650, 4GB VRAM, 8 threads) · 5 tasks · 506 tokens: $0.00 total cloud spend.
OmniAgent uses Claude when Claude is the right tool. It just doesn't use Claude when Qwen can do the same job for a fraction of the cost.
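The savings column is plain arithmetic. Here is a minimal sketch that reproduces it from the per-task prices in the table; the `TASKS` mapping and `savings_percent` helper are illustrative names, not part of OmniAgent.

```python
# Per-task prices copied from the table above (USD).
# Each entry: task -> (all-Claude cost, OmniAgent local cost).
TASKS = {
    "review function": (0.30, 0.00),
    "write docstring": (0.28, 0.00),
    "rename variable": (0.15, 0.00),
    "explain TCP vs UDP": (0.10, 0.00),
    "classify ticket": (0.08, 0.00),
}


def savings_percent(baseline: float, actual: float) -> float:
    """Percentage saved relative to the baseline cost."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - actual) / baseline * 100


baseline_total = sum(claude for claude, _ in TASKS.values())
actual_total = sum(local for _, local in TASKS.values())

print(f"All-Claude total: ${baseline_total:.2f}")
print(f"OmniAgent total:  ${actual_total:.2f}")
print(f"Savings:          {savings_percent(baseline_total, actual_total):.0f}%")
```

Swap in your own per-task prices to see what the same five tasks would cost on your setup.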
The 60-second demo
Real benchmark run on MSI desktop (GTX 1650, 4GB VRAM, 8 threads):
→ msi-node: qwen2.5-coder:7b | $0.00 | 30,123ms (review function)
→ msi-node: qwen2.5-coder:7b | $0.00 | 50,475ms (write docstring)
→ msi-node: qwen2.5-coder:7b | $0.00 | 10,181ms (rename variable)
→ msi-node: qwen2.5-coder:7b | $0.00 | 15,321ms (explain TCP/UDP)
→ msi-node: qwen2.5-coder:7b | $0.00 | 8,649ms (classify ticket)
Total: 5 tasks · 506 tokens · $0.00 cloud spend · avg 22.95s/task
Every task ran on local GPU. Zero cloud cost. That's what "AI Infrastructure OS" means.
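The totals line can be checked against the per-task lines. This is a small sketch that parses the log shown above and recomputes the task count, cloud spend, and average latency; the regex and the `summarize` helper are assumptions made for this example, not an OmniAgent API.

```python
import re

# Benchmark lines copied from the run above.
LOG = """\
→ msi-node: qwen2.5-coder:7b | $0.00 | 30,123ms (review function)
→ msi-node: qwen2.5-coder:7b | $0.00 | 50,475ms (write docstring)
→ msi-node: qwen2.5-coder:7b | $0.00 | 10,181ms (rename variable)
→ msi-node: qwen2.5-coder:7b | $0.00 | 15,321ms (explain TCP/UDP)
→ msi-node: qwen2.5-coder:7b | $0.00 | 8,649ms (classify ticket)
"""

# Matches "| $<cost> | <latency>ms" in one log line.
LINE_RE = re.compile(r"\|\s*\$(?P<cost>[\d.]+)\s*\|\s*(?P<ms>[\d,]+)ms")


def summarize(log: str) -> dict:
    """Return task count, total cloud spend, and average latency in seconds."""
    costs, latencies = [], []
    for line in log.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that are not task results
        costs.append(float(match["cost"]))
        latencies.append(int(match["ms"].replace(",", "")))
    if not latencies:
        raise ValueError("no task lines found in log")
    return {
        "tasks": len(latencies),
        "cloud_spend": sum(costs),
        "avg_seconds": sum(latencies) / len(latencies) / 1000,
    }


summary = summarize(LOG)
print(
    f"{summary['tasks']} tasks · ${summary['cloud_spend']:.2f} cloud spend · "
    f"avg {summary['avg_seconds']:.2f}s/task"
)
```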
Weekly AI Intelligence — omniagent post-mortem
"You don't need a budget. You need to see what you spent."
v0.2.0 adds the killer first-run experience: a persistent cost ledger + a Weekly AI Intelligence report.
Every omniagent agent-route and omniagent fleet route call is now logged to ~/.omniagent/postmortem/ledger.db. Then run:
omniagent post-mortem # last 7 days
omniagent post-mortem --period month # last 30 days
omniagent post-mortem --period all # all time
omniagent post-mortem --json | jq # pipe to tools
omniagent post-mortem -o weekly.md # save the report
omniagent post-mortem --demo # inject sample data and see the report
Sample output (with --demo):
# 🧠 Weekly AI Intelligence
_Generated 2026-06-03 · Period: last 7 days_
## 💰 Top-line numbers
| Metric | Value |
|---|---:|
| Tasks run | 10 |
| Tokens (in + out) | 23,770 |
| **Total cost** | **$0.3046** |
| ↳ Local | $0.0000 |
| ↳ Cloud | $0.3046 |
| All-Claude-Sonnet equivalent | $0.1777 |
| **Savings vs all-Claude** | **-71.4%** ⚠️ |
> 💡 You spent $0.3046 on cloud models.
> $0.2894 of that (≈95%) probably could have been local.
## ⚡ Top optimization opportunities
**Total potential savings: $0.0991**
### 1. claude-sonnet-4 → qwen2.5-coder:7b
- Calls: 6 · Tokens: 12,570
- Actual cost: $0.0936 · Could have been: $0.0000
- Savings: **$0.0936** · Risk: low
### 2. gpt-4o → qwen2.5-coder:7b
- Calls: 1 · Tokens: 1,000
- Actual cost: $0.0055 · Could have been: $0.0000
- Savings: **$0.0055** · Risk: low
### 3. claude-opus-4-5 → no alt
- Calls: 1 · Tokens: 5,300
- Actual cost: $0.2055 · Could have been: $0.2055
- Savings: $0.0000 · Risk: high (frontier reasoning)
The "risk" field is honest. Frontier models (Opus, o1) get risk=high and local_alternative=null — you really did need that model. Trivial and simple tasks get risk=low and a concrete local alternative. No misleading savings claims.
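To make the risk rule concrete, here is a hedged sketch that recomputes two numbers from the sample report: the total potential savings and the savings versus the all-Claude baseline. The `ModelUsage` dataclass and both helpers are hypothetical names for this example, and the sketch assumes a local alternative costs $0.00, as the report does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelUsage:
    model: str
    calls: int
    cost: float                       # actual spend in USD
    local_alternative: Optional[str]  # None means "no alt"
    risk: str                         # "low" or "high"


# Aggregates copied from the sample report above.
USAGE = [
    ModelUsage("claude-sonnet-4", 6, 0.0936, "qwen2.5-coder:7b", "low"),
    ModelUsage("gpt-4o", 1, 0.0055, "qwen2.5-coder:7b", "low"),
    ModelUsage("claude-opus-4-5", 1, 0.2055, None, "high"),
]


def potential_savings(usage: list) -> float:
    """Count only low-risk rows that have a concrete local alternative."""
    return sum(
        u.cost for u in usage
        if u.local_alternative is not None and u.risk == "low"
    )


def savings_vs_baseline(actual: float, baseline: float) -> float:
    """Negative values mean you spent more than the baseline would have."""
    return (baseline - actual) / baseline * 100


total_cloud = sum(u.cost for u in USAGE)
print(f"Total cloud cost:      ${total_cloud:.4f}")
print(f"Potential savings:     ${potential_savings(USAGE):.4f}")
print(f"Savings vs all-Claude: {savings_vs_baseline(total_cloud, 0.1777):.1f}%")
```

The Opus row adds nothing to the potential savings because it has no local alternative, which is the behavior the risk field describes.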
Same data is available from the web: GET /api/postmortem?period=week.
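A minimal client sketch for that endpoint, using only the standard library. It assumes the API is served by `omniagent web` on http://localhost:8765, that the response body is JSON, and that `month` and `all` are accepted like the CLI's `--period` values. None of this is confirmed here beyond the `period=week` URL, so treat the helper names and defaults as illustrative.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8765"  # assumed: same address as the dashboard


def postmortem_url(period: str = "week", base_url: str = BASE_URL) -> str:
    """Build the report URL for a given period."""
    query = urllib.parse.urlencode({"period": period})
    return f"{base_url}/api/postmortem?{query}"


def fetch_postmortem(period: str = "week", timeout: float = 10.0) -> dict:
    """GET the report and decode it, assuming a JSON response body."""
    with urllib.request.urlopen(postmortem_url(period), timeout=timeout) as resp:
        return json.load(resp)


print(postmortem_url("week"))

# With `omniagent web` running, fetch and pretty-print the report:
# report = fetch_postmortem("week")
# print(json.dumps(report, indent=2))
```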
Why OmniAgent exists
The AI industry is in an efficiency crisis:
- 73% of prompts sent to frontier models could be handled by smaller local models
- Developers burn $500–$1000/month on Cursor + Claude + GPT with no visibility into what each line costs
- Agents hallucinate APIs, break production code, leak secrets, forget to commit — and you find out at 2 AM
- Massive energy waste: trivial prompts burn frontier-model inference cycles that a small local model could replace
- Lock-in: one IDE, one provider, one pricing tier
- No coordination between local hardware, cloud APIs, VPS nodes, and the billions of idle GPUs sitting in garages and offices worldwide
The models will keep changing. The hardware will keep evolving. The only permanent problem is: how do you orchestrate all this intelligence efficiently, securely, and cheaply?
That's what OmniAgent solves.
What it is (and what it isn't)
OmniAgent is not a model. Not an agent. Not a chatbot.
OmniAgent is the operating system that coordinates the entire AI ecosystem — models, agents, hardware, costs, and security — so you stop wasting compute, money, and trust.
Think of it as:
- Linux doesn't create every app, but everything runs on it.
- Kubernetes doesn't build every container, but it orchestrates them all.
- Steam doesn't develop every game, but it hosts them.
OmniAgent doesn't compete with OpenAI, Anthropic, DeepSeek, or your favorite open-source model. It makes all of them work together intelligently.
The 4 façades: one engine, four ways to use it
| Façade | Audience | What you get |
|---|---|---|
| CLI (omniagent route "task") | Developers, power users | Full control, scriptable, fits in any pipeline |
| Web app (omniagent web) | Everyone, especially non-devs | 5-tab dashboard on http://localhost:8765 — visualize routing, hardware, optimize |
| YAML agents (*.yaml in ~/.omniagent/agents/) | Agent authors, teams | Declarative, shareable, version-controlled — see docs/agents.md |
| MCP tools (via any MCP client) | Tool integrators | 6 tools: route, classify, decide, audit, deploy, optimize |
Same Python engine. Four ways to use it. You pick the one that fits your workflow.
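The "one engine, several façades" idea can be sketched as one core function with thin adapters around it. Everything below is a hypothetical illustration: `route`, `cli_facade`, `http_facade`, and `tool_facade` are made-up names, and the placeholder decision stands in for the real engine.

```python
import argparse
import json


def route(task: str) -> dict:
    """Stand-in for the shared engine; the real routing logic is not shown here."""
    return {"task": task, "model": "qwen2.5-coder:7b", "location": "local"}


def cli_facade(argv: list[str]) -> str:
    """Command-line adapter: argv in, human-readable line out."""
    parser = argparse.ArgumentParser(prog="omniagent-sketch")
    parser.add_argument("task")
    args = parser.parse_args(argv)
    decision = route(args.task)
    return f"{decision['model']} ({decision['location']})"


def http_facade(body: str) -> str:
    """Web-style adapter: JSON string in, JSON string out."""
    payload = json.loads(body)
    return json.dumps(route(payload["task"]))


def tool_facade(arguments: dict) -> dict:
    """Tool-call adapter (for example from an MCP client): dict in, dict out."""
    return route(arguments["task"])


print(cli_facade(["rename a variable"]))
print(http_facade('{"task": "rename a variable"}'))
print(tool_facade({"task": "rename a variable"}))
```

Each adapter only translates input and output formats, so a change to the routing logic lands in one place.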
The 90/9/1 design
- 90% of users never touch the CLI. They open http://localhost:8765, type a task, see the routing, hit Run it ▶.
- 9% of users open the Optimize tab, see what they're overspending on, and one-click install a cheaper agent.
- 1% of users write their own YAML agents, publish them, share them.
The dashboard is the product. The YAML is the protocol. The CLI is the power tool.
How it works (under the hood)
- Task arrives — text in the CLI, the web, or via MCP
- TaskClassifier — 10 categories, 5 complexities, detects vision / function-calling
- AgentRegistry — finds the right agent (project > user > builtin, YAML-defined)
- SmartRouter — picks the right model given the agent's constraints + your budget
- AdaptiveRouter — combines all of the above into a single RoutingDecision
- LLM call — local first, cloud only if budget + quality demand it
- CostTracker — logs the spend, feeds back into the next routing decision
- Guardian++ — pre / during / post audit on every action (secret scan, command sandbox, commit verification)
414 unit tests + 13 integration tests validate every step.
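The first five steps above can be sketched as a toy pipeline. This is a simplified illustration under stated assumptions, not OmniAgent's implementation: the keyword classifier, the `allow_cloud` agent field, and the cloud price are invented for the example. Only the project > user > builtin precedence and the local-first rule come from the list, and cost tracking and Guardian++ auditing are left out.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    category: str
    complexity: str
    agent_source: str
    model: str
    estimated_cost: float


def classify(task: str) -> tuple[str, str]:
    """Hypothetical keyword classifier (the real one has 10 categories, 5 complexities)."""
    text = task.lower()
    category = "code" if any(w in text for w in ("code", "function", "bug")) else "general"
    complexity = "complex" if any(w in text for w in ("design", "architecture")) else "simple"
    return category, complexity


def find_agent(category: str, registries: dict) -> tuple[str, dict]:
    """Look up an agent in precedence order: project > user > builtin."""
    for source in ("project", "user", "builtin"):
        agent = registries.get(source, {}).get(category)
        if agent is not None:
            return source, agent
    raise LookupError(f"no agent for category {category!r}")


LOCAL_MODEL = ("qwen2.5-coder:7b", 0.00)
CLOUD_MODEL = ("claude-sonnet-4", 0.05)  # hypothetical per-call price


def pick_model(complexity: str, agent: dict, budget: float) -> tuple[str, float]:
    """Local first; cloud only for complex tasks the agent and budget allow."""
    name, cost = CLOUD_MODEL
    if complexity == "complex" and agent.get("allow_cloud", False) and cost <= budget:
        return name, cost
    return LOCAL_MODEL


def decide(task: str, registries: dict, budget: float) -> RoutingDecision:
    """Combine classification, agent lookup, and model choice into one decision."""
    category, complexity = classify(task)
    source, agent = find_agent(category, registries)
    model, cost = pick_model(complexity, agent, budget)
    return RoutingDecision(category, complexity, source, model, cost)


# Hypothetical registries: the user-level "code" agent overrides the builtin one.
REGISTRIES = {
    "builtin": {"code": {"allow_cloud": True}, "general": {"allow_cloud": True}},
    "user": {"code": {"allow_cloud": False}},
}

print(decide("review this code for security", REGISTRIES, budget=0.10))
print(decide("design a cache", REGISTRIES, budget=0.10))
```

In this toy setup the first task resolves to the user-level agent and stays local, while the second falls through to the builtin agent and goes to the cloud model only because the budget covers it.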
Quickstart (60 seconds)
git clone https://github.com/landrover1984/omniagent
cd omniagent
pip install -e .
omniagent web
# open http://localhost:8765
Or use the CLI directly:
omniagent agent-route "review this code for security" --budget 0.10
omniagent agent-list # see all available agents
omniagent agent-install ./my-agent.yaml # add your own
omniagent optimize # find cheaper routes
omniagent post-mortem # weekly AI intelligence
omniagent agent-decide "design a cache" # see the routing (no LLM call)
Zero API keys needed to start. Local models via Ollama work out of the box.
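If you want to confirm that a local model responds before pointing OmniAgent at it, you can call Ollama's HTTP API directly. The sketch below uses Ollama's default endpoint and its `/api/generate` route with streaming turned off. It assumes Ollama is running locally and that `qwen2.5-coder:7b` has already been pulled; it does not show how OmniAgent itself talks to Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "qwen2.5-coder:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5-coder:7b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.load(resp)["response"]


# With Ollama running:
# print(generate("Explain TCP vs UDP in two sentences."))
```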
What ships today
| Layer | Status | Tests |
|---|---|---|
| Agent Protocol (YAML agents) | Shipped | 18 |
| Task Classifier (10 categories) | Shipped | 20 |
| AdaptiveRouter (the brain) | Shipped | 8 |
| 5-tab Web UI + Post-Mortem API | Shipped | 16 endpoints |
| Cost Optimizer (the killer feature) | Shipped | 3 |
| Post-Mortem (Weekly AI Intelligence) | Shipped v0.2.0 | 47 |
| Anti-Hallucination Audit (Guardian++) | Shipped | 23 |
| Hybrid Deploy (local / VPS / AWS) | Shipped | 28 |
| MCP Server (6 tools) | Shipped | 18 |
| Private Fleet (multi-node) | Shipped v0.1.4 | 10 |
| CLI commands | 25+ | 70+ |
| Total | | 414 passing, 2 skipped |
Roadmap
| Phase | Theme | Status |
|---|---|---|
| v0.1.x | AI Infrastructure OS — routing, cost, optimize, local-first | Shipped |
| v0.2.0 | Weekly AI Intelligence — persistent cost ledger, post-mortem reports, savings opportunities | Shipped |
| v0.2.x | Agent Generator — code → custom YAML agents | Next |
| v0.3.x | AI Firewall — privacy, PII detection, compliance mode | Planned |
| v0.4.x | Visual Dashboard — real-time cost graphs, agent analytics, team view | Planned |
| v0.5.x | Distributed Compute — idle GPU federation, opt-in mesh | Deferred |
| v0.6.x | Marketplace + Incentives — community YAMLs, reputation, rewards | Deferred |
We are not building another "AI wrapper". We are building the coordination layer that the entire AI ecosystem needs.
Distributed compute and marketplace are real, but they're not the wedge. The wedge is: stop overspending on AI. Get that right first.
License
MIT — 100% open source, forever. No paid tier, no "enterprise edition", no bait-and-switch.
The models will change. The hardware will change. The coordination layer is permanent.
Star on GitHub · Try the Cost Calculator · Write your first agent