
AI Infrastructure OS — same engine, four facades (CLI, REST, web, MCP). Reduce 80-95% of AI spend by routing tasks to the right model.


OmniAgent

You are overspending on AI.

OmniAgent routes every AI task to the most efficient model automatically. Local first. Cloud only when it pays off. 80–97% savings on your AI bill.

Try the AI Cost Calculator · See the 60-second demo · Star on GitHub

MIT License · Python 3.11+ · Zero telemetry · Local first


The math most devs don't realize

"I just need AI to review my code, write docstrings, and rename things." — every developer with a $500/month Cursor + Claude bill

Here's what you actually need (benchmarked on real hardware, Jun 2026):

| Task | All-Claude reality | OmniAgent | Savings |
|---|---|---|---:|
| Review a function for bugs | Claude · $0.30 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Write a Google-style docstring | Claude · $0.28 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Rename a variable | Claude · $0.15 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Explain TCP vs UDP | Claude · $0.10 | qwen2.5-coder:7b (local) · $0.00 | 100% |
| Classify a bug ticket | Claude · $0.08 | qwen2.5-coder:7b (local) · $0.00 | 100% |

Fleet benchmark · MSI desktop (GTX 1650, 4GB VRAM, 8 threads) · 5 tasks · 506 tokens: $0.00 total cloud spend.

OmniAgent uses Claude when Claude is the right tool. It just doesn't use Claude when Qwen can do the same job for a fraction of the cost.
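The Savings column is plain arithmetic: savings = (baseline − actual) / baseline. A minimal Python sketch using the per-task prices from the table (these are the table's figures, not live API prices; the dictionary keys are just labels):

```python
# Per-task costs copied from the table above (USD).
ALL_CLAUDE = {
    "review function": 0.30,
    "write docstring": 0.28,
    "rename variable": 0.15,
    "explain TCP vs UDP": 0.10,
    "classify ticket": 0.08,
}
# Every task ran on qwen2.5-coder:7b locally, so no cloud spend.
LOCAL = {task: 0.00 for task in ALL_CLAUDE}


def savings_pct(baseline: float, actual: float) -> float:
    """Percentage saved relative to the baseline cost."""
    if baseline == 0:
        return 0.0
    return 100.0 * (baseline - actual) / baseline


baseline_total = sum(ALL_CLAUDE.values())
actual_total = sum(LOCAL.values())
print(f"All-Claude: ${baseline_total:.2f}  OmniAgent: ${actual_total:.2f}  "
      f"saved: {savings_pct(baseline_total, actual_total):.0f}%")
```

The same formula gives a negative number when the actual spend exceeds the baseline, which is how the post-mortem report can show a negative "Savings vs all-Claude".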


The 60-second demo

Real benchmark run on MSI desktop (GTX 1650, 4GB VRAM, 8 threads):

→ msi-node: qwen2.5-coder:7b | $0.00 | 30,123ms  (review function)
→ msi-node: qwen2.5-coder:7b | $0.00 | 50,475ms  (write docstring)
→ msi-node: qwen2.5-coder:7b | $0.00 | 10,181ms  (rename variable)
→ msi-node: qwen2.5-coder:7b | $0.00 | 15,321ms  (explain TCP/UDP)
→ msi-node: qwen2.5-coder:7b | $0.00 |  8,649ms  (classify ticket)

Total: 5 tasks · 506 tokens · $0.00 cloud spend · avg 22.95s/task

Every task ran on the local GPU. Zero cloud cost. That's what "AI Infrastructure OS" means.
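The summary line is just an aggregate of the five result rows. This small Python sketch parses the lines exactly as printed above (the line format is inferred from this output, not from a documented log specification) and reproduces the 22.95 s average:

```python
import re

# The five result lines from the demo run above, copied verbatim.
DEMO_LOG = """\
→ msi-node: qwen2.5-coder:7b | $0.00 | 30,123ms  (review function)
→ msi-node: qwen2.5-coder:7b | $0.00 | 50,475ms  (write docstring)
→ msi-node: qwen2.5-coder:7b | $0.00 | 10,181ms  (rename variable)
→ msi-node: qwen2.5-coder:7b | $0.00 | 15,321ms  (explain TCP/UDP)
→ msi-node: qwen2.5-coder:7b | $0.00 |  8,649ms  (classify ticket)
"""

# Shape of each line: "→ node: model | $cost | latency ms  (task)"
LINE_RE = re.compile(
    r"→ (?P<node>[\w-]+): (?P<model>\S+) \| \$(?P<cost>[\d.]+) \|\s+"
    r"(?P<ms>[\d,]+)ms\s+\((?P<task>[^)]+)\)"
)


def summarize(log: str) -> dict:
    rows = [m.groupdict() for m in LINE_RE.finditer(log)]
    latencies_ms = [int(r["ms"].replace(",", "")) for r in rows]
    return {
        "tasks": len(rows),
        "cloud_cost": sum(float(r["cost"]) for r in rows),
        "avg_seconds": sum(latencies_ms) / len(latencies_ms) / 1000,
    }


stats = summarize(DEMO_LOG)
print(f"{stats['tasks']} tasks · ${stats['cloud_cost']:.2f} cloud spend · "
      f"avg {stats['avg_seconds']:.2f}s/task")
```

The token count (506) does not appear in the per-task lines, so it cannot be recovered from them.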


Weekly AI Intelligence — omniagent post-mortem

"You don't need a budget. You need to see what you spent."

v0.2.0 adds the killer first-run experience: a persistent cost ledger + a Weekly AI Intelligence report.

Every omniagent agent-route and omniagent fleet route call is now logged to ~/.omniagent/postmortem/ledger.db. Then run:

omniagent post-mortem                  # last 7 days
omniagent post-mortem --period month   # last 30 days
omniagent post-mortem --period all     # all time
omniagent post-mortem --json | jq      # pipe to tools
omniagent post-mortem -o weekly.md     # save the report
omniagent post-mortem --demo           # inject sample data and see the report
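At its core the post-mortem is an aggregation over a log of routed calls. The sketch below assumes the ledger is a SQLite file with a hypothetical `calls` table (the real layout of ledger.db is not documented here) and shows how a `--period` style query could total local and cloud spend. It uses an in-memory database so it runs standalone.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical table layout; the real ledger.db schema may differ.
SCHEMA = """
CREATE TABLE calls (
    ts       TEXT    NOT NULL,  -- ISO-8601 UTC timestamp, second precision
    model    TEXT    NOT NULL,
    is_local INTEGER NOT NULL,  -- 1 = local model, 0 = cloud API
    tokens   INTEGER NOT NULL,  -- input + output tokens
    cost_usd REAL    NOT NULL
)
"""

PERIOD_DAYS = {"week": 7, "month": 30, "all": None}  # mirrors --period


def iso(dt: datetime) -> str:
    # One fixed format everywhere, so string comparison orders timestamps.
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def totals(conn: sqlite3.Connection, period: str = "week") -> dict:
    days = PERIOD_DAYS[period]
    where, params = "", ()
    if days is not None:
        where = "WHERE ts >= ?"
        params = (iso(datetime.now(timezone.utc) - timedelta(days=days)),)
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(tokens), 0),"
        " COALESCE(SUM(CASE WHEN is_local = 1 THEN cost_usd END), 0),"
        " COALESCE(SUM(CASE WHEN is_local = 0 THEN cost_usd END), 0)"
        f" FROM calls {where}",
        params,
    ).fetchone()
    return {"tasks": row[0], "tokens": row[1], "local": row[2], "cloud": row[3]}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = datetime.now(timezone.utc)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
    [
        (iso(now), "qwen2.5-coder:7b", 1, 506, 0.0),
        (iso(now), "claude-sonnet-4", 0, 2_000, 0.015),
        (iso(now - timedelta(days=20)), "gpt-4o", 0, 1_000, 0.0055),
    ],
)
print(totals(conn, "week"))   # the 20-day-old gpt-4o call is excluded
print(totals(conn, "month"))  # ...and included here
```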

Sample output (with --demo):

# 🧠 Weekly AI Intelligence

_Generated 2026-06-03 · Period: last 7 days_

## 💰 Top-line numbers

| Metric | Value |
|---|---:|
| Tasks run | 10 |
| Tokens (in + out) | 23,770 |
| **Total cost** | **$0.3046** |
| ↳ Local | $0.0000 |
| ↳ Cloud | $0.3046 |
| All-Claude-Sonnet equivalent | $0.1777 |
| **Savings vs all-Claude** | **-71.4%** ⚠️ |

> 💡 You spent $0.3046 on cloud models.
> $0.2894 of that (≈95%) probably could have been local.

## ⚡ Top optimization opportunities

**Total potential savings: $0.0991**

### 1. claude-sonnet-4 → qwen2.5-coder:7b
- Calls: 6 · Tokens: 12,570
- Actual cost: $0.0936 · Could have been: $0.0000
- Savings: **$0.0936** · Risk: low

### 2. gpt-4o → qwen2.5-coder:7b
- Calls: 1 · Tokens: 1,000
- Actual cost: $0.0055 · Could have been: $0.0000
- Savings: **$0.0055** · Risk: low

### 3. claude-opus-4-5 → no alt
- Calls: 1 · Tokens: 5,300
- Actual cost: $0.2055 · Could have been: $0.2055
- Savings: $0.0000 · Risk: high (frontier reasoning)

The "risk" field is honest. Frontier models (Opus, o1) get risk=high and local_alternative=null — you really did need that model. Trivial and simple tasks get risk=low and a concrete local alternative. No misleading savings claims.
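The figures in the sample report follow from a few lines of arithmetic. The sketch below reproduces them using a deliberately simple, hypothetical rule (a hard-coded set of frontier model names gets risk=high and no local alternative; everything else gets the local alternative). This is an illustration, not OmniAgent's actual classifier, which the text says decides by task complexity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule: these models have no local alternative.
FRONTIER = {"claude-opus-4-5", "o1"}
LOCAL_ALT = "qwen2.5-coder:7b"


@dataclass
class Usage:
    model: str
    calls: int
    tokens: int
    cost: float  # actual cloud spend in USD


@dataclass
class Opportunity:
    model: str
    local_alternative: Optional[str]
    actual: float
    could_have_been: float
    risk: str

    @property
    def savings(self) -> float:
        return self.actual - self.could_have_been


def opportunity(u: Usage) -> Opportunity:
    if u.model in FRONTIER:
        # You really needed this model: no alternative, no claimed savings.
        return Opportunity(u.model, None, u.cost, u.cost, "high")
    return Opportunity(u.model, LOCAL_ALT, u.cost, 0.0, "low")


def savings_vs_baseline(baseline: float, actual: float) -> float:
    return 100.0 * (baseline - actual) / baseline


# Numbers taken from the --demo sample report above.
usage = [
    Usage("claude-sonnet-4", 6, 12_570, 0.0936),
    Usage("gpt-4o", 1, 1_000, 0.0055),
    Usage("claude-opus-4-5", 1, 5_300, 0.2055),
]
opps = sorted((opportunity(u) for u in usage), key=lambda o: o.savings, reverse=True)
total = sum(o.savings for o in opps)
print(f"Total potential savings: ${total:.4f}")
print(f"Savings vs all-Claude: {savings_vs_baseline(0.1777, 0.3046):.1f}%")
```

Sorting by savings puts the largest opportunity first, which matches the order of the report's list.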

Same data is available from the web: GET /api/postmortem?period=week.
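A minimal client sketch for that endpoint, using only the standard library. It assumes the web app is running on its default address (http://localhost:8765), that the endpoint returns JSON, and that it accepts the same `month` and `all` periods as the CLI; only `period=week` is shown in the text above.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8765"  # default address of `omniagent web`
PERIODS = ("week", "month", "all")  # assumed to mirror the CLI's --period


def postmortem_url(period: str = "week", base: str = BASE_URL) -> str:
    if period not in PERIODS:
        raise ValueError(f"period must be one of {PERIODS}")
    return f"{base}/api/postmortem?{urlencode({'period': period})}"


def fetch_postmortem(period: str = "week", timeout: float = 10.0) -> dict:
    """GET the report as JSON; requires `omniagent web` to be running."""
    with urllib.request.urlopen(postmortem_url(period), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(postmortem_url("month"))
```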


Why OmniAgent exists

The AI industry is in an efficiency crisis:

  • 73% of prompts sent to frontier models could be handled by smaller local models
  • Developers burn $500–$1000/month on Cursor + Claude + GPT with no visibility into what each line costs
  • Agents hallucinate APIs, break production code, leak secrets, forget to commit — and you find out at 2 AM
  • Massive energy waste: every trivial task sent to a frontier model burns far more compute than the task needs
  • Lock-in: one IDE, one provider, one pricing tier
  • No coordination between local hardware, cloud APIs, VPS nodes, and the idle GPUs sitting in garages and offices worldwide

The models will keep changing. The hardware will keep evolving. The only permanent problem is: how do you orchestrate all this intelligence efficiently, securely, and cheaply?

That's what OmniAgent solves.


What it is (and what it isn't)

OmniAgent is not a model. Not an agent. Not a chatbot.

OmniAgent is the operating system that coordinates the entire AI ecosystem — models, agents, hardware, costs, and security — so you stop wasting compute, money, and trust.

Think of it as:

  • Linux doesn't create every app, but everything runs on it.
  • Kubernetes doesn't build every container, but it orchestrates them all.
  • Steam doesn't develop every game, but it hosts them.

OmniAgent doesn't compete with OpenAI, Anthropic, DeepSeek, or your favorite open-source model. It makes all of them work together intelligently.


The 4 façades: one engine, four ways to use it

| Façade | Audience | What you get |
|---|---|---|
| CLI (omniagent route "task") | Developers, power users | Full control, scriptable, fits in any pipeline |
| Web app (omniagent web) | Everyone, especially non-devs | 5-tab dashboard on http://localhost:8765: visualize routing, hardware, optimize |
| YAML agents (*.yaml in ~/.omniagent/agents/) | Agent authors, teams | Declarative, shareable, version-controlled; see docs/agents.md |
| MCP tools (via any MCP client) | Tool integrators | 6 tools: route, classify, decide, audit, deploy, optimize |

Same Python engine. Four ways to use it. You pick the one that fits your workflow.
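The pattern is one engine object behind several thin adapters. A minimal, hypothetical sketch: `Engine`, `cli_main`, and `rest_handler` are illustrative names, `RoutingDecision` borrows OmniAgent's term but its fields here are invented, and the routing policy is a placeholder.

```python
import argparse
import json
from dataclasses import asdict, dataclass


@dataclass
class RoutingDecision:
    task: str
    model: str
    est_cost_usd: float


class Engine:
    """Hypothetical single engine shared by every facade."""

    def route(self, task: str) -> RoutingDecision:
        # Placeholder policy: everything goes to the local model.
        return RoutingDecision(task, "qwen2.5-coder:7b", 0.0)


ENGINE = Engine()


def cli_main(argv: list[str]) -> str:
    """CLI facade: `route "task"` -> one human-readable line."""
    parser = argparse.ArgumentParser(prog="omniagent")
    parser.add_argument("command", choices=["route"])
    parser.add_argument("task")
    args = parser.parse_args(argv)
    d = ENGINE.route(args.task)
    return f"{d.model} | ${d.est_cost_usd:.2f}"


def rest_handler(body: str) -> str:
    """Web/MCP-style facade: JSON in, JSON out, same engine underneath."""
    task = json.loads(body)["task"]
    return json.dumps(asdict(ENGINE.route(task)))


print(cli_main(["route", "rename a variable"]))
print(rest_handler('{"task": "rename a variable"}'))
```

Because every adapter calls the same `route` method, a routing change shows up identically in the terminal, the dashboard, and MCP clients.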


The 90/9/1 design

  • 90% of users never touch the CLI. They open http://localhost:8765, type a task, see the routing, hit Run it ▶.
  • 9% of users open the Optimize tab, see what they're overspending on, and one-click install a cheaper agent.
  • 1% of users write their own YAML agents, publish them, share them.

The dashboard is the product. The YAML is the protocol. The CLI is the power tool.


How it works (under the hood)

  1. Task arrives — text in the CLI, the web, or via MCP
  2. TaskClassifier — 10 categories, 5 complexities, detects vision / function-calling
  3. AgentRegistry — finds the right agent (project > user > builtin, YAML-defined)
  4. SmartRouter — picks the right model given the agent's constraints + your budget
  5. AdaptiveRouter — combines all of the above into a single RoutingDecision
  6. LLM call — local first, cloud only if budget + quality demand it
  7. CostTracker — logs the spend, feeds back into the next routing decision
  8. Guardian++ — pre / during / post audit on every action (secret scan, command sandbox, commit verification)
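Steps 2 to 5 can be sketched in a few dozen lines. Everything here is a hypothetical stand-in: a keyword classifier instead of the real 10-category TaskClassifier, dictionaries standing in for the project, user, and builtin agent directories, and illustrative per-token prices. What it does show faithfully is the ordering described above: project agents beat user agents beat builtins, and a local model wins whenever it is capable enough and fits the budget.

```python
from dataclasses import dataclass

# Step 2 (hypothetical): keyword classification into a category.
CATEGORIES = {"review": "code_review", "docstring": "documentation",
              "rename": "refactor", "design": "architecture"}


def classify(task: str) -> tuple[str, int]:
    text = task.lower()
    category = next((c for kw, c in CATEGORIES.items() if kw in text), "general")
    complexity = 4 if category == "architecture" else 1  # scale 1..5
    return category, complexity


# Step 3: first match wins, searching project > user > builtin.
def find_agent(category: str, layers: list[dict[str, str]]) -> str:
    for layer in layers:
        if category in layer:
            return layer[category]
    return "default-agent"


@dataclass(frozen=True)
class Model:
    name: str
    local: bool
    max_complexity: int      # hardest task level it handles well
    usd_per_1k_tokens: float


# Step 4: among capable models, try local ones first, then cheapest cloud,
# and take the first whose estimated cost fits the budget.
def pick_model(complexity: int, est_tokens: int, budget: float,
               models: list[Model]) -> Model:
    capable = [m for m in models if m.max_complexity >= complexity]
    for m in sorted(capable, key=lambda m: (not m.local, m.usd_per_1k_tokens)):
        if m.usd_per_1k_tokens * est_tokens / 1000 <= budget:
            return m
    raise RuntimeError("no model fits the budget")


@dataclass
class RoutingDecision:  # Step 5: everything combined
    agent: str
    model: str
    est_cost: float


def route(task, layers, models, budget=0.10, est_tokens=1_000):
    category, complexity = classify(task)
    model = pick_model(complexity, est_tokens, budget, models)
    return RoutingDecision(find_agent(category, layers), model.name,
                           model.usd_per_1k_tokens * est_tokens / 1000)


MODELS = [Model("qwen2.5-coder:7b", True, 2, 0.0),
          Model("claude-sonnet-4", False, 5, 0.015)]  # illustrative prices
LAYERS = [{"code_review": "my-project-reviewer"},   # project
          {},                                       # user
          {"code_review": "builtin-reviewer", "architecture": "builtin-architect"}]
print(route("review this code for security", LAYERS, MODELS))
print(route("design a cache", LAYERS, MODELS))
```

The cost tracking in step 7 would feed actual spend back into these estimates; it is omitted here.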

414 unit tests + 13 integration tests validate every step.


Quickstart (60 seconds)

git clone https://github.com/landrover1984/omniagent
cd omniagent
pip install -e .
omniagent web
# open http://localhost:8765

Or use the CLI directly:

omniagent agent-route "review this code for security" --budget 0.10
omniagent agent-list                     # see all available agents
omniagent agent-install ./my-agent.yaml  # add your own
omniagent optimize                       # find cheaper routes
omniagent post-mortem                    # weekly AI intelligence
omniagent agent-decide "design a cache"  # see the routing (no LLM call)

Zero API keys needed to start. Local models via Ollama work out of the box.
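For reference, this is what a "local" call looks like underneath. The sketch talks to Ollama directly, not through OmniAgent: a non-streaming POST to Ollama's `/api/generate` endpoint on its default port 11434. It assumes Ollama is installed and running and that the model has been pulled (`ollama pull qwen2.5-coder:7b`). The shape of the returned dictionary is an illustrative choice, not OmniAgent's format.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port


def build_request(model: str, prompt: str) -> urllib.request.Request:
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Run one non-streaming completion on the local Ollama server."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return {
        "text": body["response"],
        "tokens": body.get("prompt_eval_count", 0) + body.get("eval_count", 0),
        "latency_ms": body.get("total_duration", 0) / 1e6,  # ns -> ms
        "cost_usd": 0.0,  # local inference: no per-token API charge
    }
```

Usage, with Ollama running: `generate("qwen2.5-coder:7b", "Explain TCP vs UDP in two sentences.")["text"]`.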


What ships today

| Layer | Status | Tests |
|---|---|---|
| Agent Protocol (YAML agents) | Shipped | 18 |
| Task Classifier (10 categories) | Shipped | 20 |
| AdaptiveRouter (the brain) | Shipped | 8 |
| 5-tab Web UI + Post-Mortem API | Shipped | 16 endpoints |
| Cost Optimizer (the killer feature) | Shipped | 3 |
| Post-Mortem (Weekly AI Intelligence) | Shipped v0.2.0 | 47 |
| Anti-Hallucination Audit (Guardian++) | Shipped | 23 |
| Hybrid Deploy (local / VPS / AWS) | Shipped | 28 |
| MCP Server (6 tools) | Shipped | 18 |
| Private Fleet (multi-node) | Shipped v0.1.4 | 10 |
| CLI commands | 25+ | 70+ |
| **Total** | | 414 passing, 2 skipped |

Roadmap

| Phase | Theme | Status |
|---|---|---|
| v0.1.x | AI Infrastructure OS: routing, cost, optimize, local-first | Shipped |
| v0.2.0 | Weekly AI Intelligence: persistent cost ledger, post-mortem reports, savings opportunities | Shipped |
| v0.2.x | Agent Generator: code → custom YAML agents | Next |
| v0.3.x | AI Firewall: privacy, PII detection, compliance mode | Planned |
| v0.4.x | Visual Dashboard: real-time cost graphs, agent analytics, team view | Planned |
| v0.5.x | Distributed Compute: idle GPU federation, opt-in mesh | Deferred |
| v0.6.x | Marketplace + Incentives: community YAMLs, reputation, rewards | Deferred |

We are not building another "AI wrapper". We are building the coordination layer that the entire AI ecosystem needs.

Distributed compute and marketplace are real, but they're not the wedge. The wedge is: stop overspending on AI. Get that right first.


License

MIT — 100% open source, forever. No paid tier, no "enterprise edition", no bait-and-switch.


The models will change. The hardware will change. The coordination layer is permanent.

Star on GitHub · Try the Cost Calculator · Write your first agent



Download files


Source Distribution

omniagent_fleet-0.2.0.tar.gz (136.0 kB, source)

Built Distribution

omniagent_fleet-0.2.0-py3-none-any.whl (141.6 kB, Python 3)

File details

Details for the file omniagent_fleet-0.2.0.tar.gz.

File metadata

  • Download URL: omniagent_fleet-0.2.0.tar.gz
  • Size: 136.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for omniagent_fleet-0.2.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fcd867fc9cdb32938b6d33c9772ec331be9aba4fee8f602d0d7551a567c7ae1 |
| MD5 | df545d8bfed8e368dafb67142ee57f75 |
| BLAKE2b-256 | d77aaff1f42056dd0b3465c33353eb3f4310c91267776e643a6a1d024da2a5a3 |

File details

Details for the file omniagent_fleet-0.2.0-py3-none-any.whl.

File hashes

Hashes for omniagent_fleet-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f1fa0dab3153e91f29deeba4c5e24607f241dad978d662473f47ac0eb6796e1 |
| MD5 | 27d26799471b3edc3dfec02a39901042 |
| BLAKE2b-256 | eaf2f125eb7d11cf056fea0dcdadcb4574195046a1703f41931f9c10e9980282 |
