Skip to main content

One endpoint, every free search API. Search broker for AI agents with automatic fallback, RRF ranking, and budget enforcement. The LiteLLM of web search.

Project description

Argus

Python 3.11+ PyPI Version PyPI Downloads License: MIT CI MCP Registry Docker

Search infrastructure for AI agents. Zero-config start, 14 providers, computed answers via WolframAlpha, and a 9-step content extraction chain — so your agent gets actual text, not just URLs.

Features at a glance:

  • 14 providers, one API — free-first tier routing, budget-exhausted providers skipped automatically
  • Zero-key startpip install argus-search gives you DuckDuckGo + Yahoo immediately, no accounts needed
  • SearXNG self-host = 70+ engines — Google, Bing, Yahoo, Startpage, Ecosia, Qwant and more via one Docker container
  • WolframAlpha integration — computed answers for math, unit conversions, and factual queries (2,000 free calls/month)
  • 9-step content extraction — returns full page text with quality gates, not just links
  • Multi-turn sessions — pass session_id for conversational context across searches
  • 4 search modes — discovery, research, recovery, grounding
  • Dead URL recovery/recover-url with Wayback Machine and archive fallbacks
  • 4 integration paths — HTTP API, CLI, MCP server, Python SDK

Built for AI agent builders, RAG pipelines, and ops teams who need reliable search without stitching APIs together.

Contents

Quickstart

Mode 1: Local CLI (zero config)

pip install argus-search && argus search -q "python web frameworks"

That's it. DuckDuckGo handles the search — no accounts, no keys, no containers. You get unlimited free search from your laptop right now. Add API keys whenever you want more providers, or don't.

argus extract -u "https://example.com/article"       # extract clean text from any URL

Works on any machine with Python 3.11+ — laptop, Mac Mini, Raspberry Pi, cloud VM. Nothing to host.

For MCP (Claude Code, Cursor, VS Code):

pipx install argus-search[mcp] && argus mcp serve

Then add to your MCP config:

{"mcpServers": {"argus": {"command": "argus", "args": ["mcp", "serve"]}}}

Or install from the MCP Registry:

{
  "mcpServers": {
    "argus": {
      "registryType": "pypi",
      "identifier": "argus-search",
      "runtimeHint": "uvx"
    }
  }
}

One command to install, one JSON block to connect. No server to run, no keys to configure.

Mode 2: Full Stack Server

Got a Raspberry Pi running Pi-hole? A Mac Mini on your desk? An old laptop? That's enough to run the full stack — SearXNG (your own private search engine) plus local JS-rendering content extraction.

docker compose up -d    # SearXNG + Argus
What you have What you get
Any machine with Python 3.11+ DuckDuckGo + API providers (no server)
Raspberry Pi 4 / old laptop (4GB+) Everything — SearXNG, all providers, Crawl4AI
Mac Mini M1+ (8GB+) Full stack with headroom
Free cloud VM (1GB) SearXNG + search providers (skip Crawl4AI)

SearXNG takes 512MB of RAM and gives you a private Google-style search engine that nobody can rate-limit, block, or charge for. It runs alongside Pi-hole on hardware millions of people already own.

Providers

Provider Credit type Free capacity Setup
DuckDuckGo Free (scraped) Unlimited None
Yahoo Free (scraped) Unlimited None — fragile, auto-skipped if broken
SearXNG Free (self-hosted) Unlimited — 70+ engines¹ Docker
GitHub Free (API) Unlimited None (token for higher rate limit)
WolframAlpha Free (API key) 2,000 queries/month free key
Brave Search Monthly recurring 2,000 queries/month dashboard
Tavily Monthly recurring 1,000 queries/month signup
Exa Monthly recurring 1,000 queries/month signup
Linkup Monthly recurring 1,000 queries/month signup
Serper One-time signup 2,500 credits signup
Parallel AI One-time signup 4,000 credits signup
You.com One-time signup $20 credit platform
Valyu One-time signup $10 credit platform

¹ SearXNG aggregates Google, Bing, Yahoo, Startpage, Ecosia, Qwant, Wikipedia, and 60+ more — all behind a single self-hosted endpoint. Run docker compose up -d on any machine with 512MB of free RAM.

7,000+ free queries/month from recurring free-tier providers alone (WolframAlpha 2k + Brave 2k + Tavily 1k + Exa 1k + Linkup 1k). DuckDuckGo, Yahoo, SearXNG, and GitHub have no monthly cap. Routing priority: Tier 0 (free: SearXNG, DuckDuckGo, Yahoo, GitHub, WolframAlpha) → Tier 1 (monthly: Brave, Tavily, Exa, Linkup) → Tier 2 (one-time: Serper, Parallel, You.com, Valyu, SearchAPI). Budget-exhausted providers are skipped automatically.

HTTP API

All endpoints prefixed with /api. OpenAPI docs at http://localhost:8000/docs.

# Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "python web frameworks", "mode": "discovery", "max_results": 5}'

# Multi-turn search (conversational refinement)
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "what about async?", "session_id": "my-session"}'

# Extract content from a working URL
curl -X POST http://localhost:8000/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Recover a dead or moved URL
curl -X POST http://localhost:8000/api/recover-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/old-page", "title": "Example Article"}'

# Health & budgets
curl http://localhost:8000/api/health/detail
curl http://localhost:8000/api/budgets

Search modes

Mode Use for Example
discovery Related pages, canonical sources "Find the official docs for X"
research Broad exploratory retrieval "Latest approaches to Y?"
recovery Finding moved/dead content "This URL is 404"
grounding Fact-checking with live sources "Verify this claim about Z"

Tier-based routing always applies first. Within each tier, the mode selects provider order.

Response format

{
  "query": "python web frameworks",
  "mode": "discovery",
  "results": [
    {"url": "https://fastapi.tiangolo.com", "title": "FastAPI", "snippet": "Modern Python web framework", "score": 0.942}
  ],
  "total_results": 1,
  "cached": false,
  "traces": [
    {"provider": "duckduckgo", "status": "success", "results_count": 5, "latency_ms": 312}
  ]
}

Each result includes url, title, snippet, domain, provider, and score. The traces array shows which providers were called and their outcomes.

Budgets

{
  "budgets": {
    "brave": {"remaining": 1847, "monthly_usage": 153, "usage_count": 153, "exhausted": false},
    "duckduckgo": {"remaining": 0, "monthly_usage": 0, "usage_count": 42, "exhausted": false}
  },
  "token_balances": {"jina": 9833638}
}

Each provider tracks usage per calendar month. When a provider hits its budget, Argus skips it and moves to the next tier. Free providers (SearXNG, DuckDuckGo, GitHub) have no limit. Set ARGUS_*_MONTHLY_BUDGET_USD to enforce custom limits per provider.

Integration

CLI

argus search -q "python web framework"              # zero-config, uses DuckDuckGo
argus search -q "python web framework" --mode research -n 20
argus search -q "fastapi" --session my-session       # multi-turn context
argus extract -u "https://example.com/article"       # extract clean text
argus extract -u "https://example.com/article" -d nytimes.com  # auth extraction
argus recover-url -u "https://dead.link" -t "Title"
argus health                                         # provider status
argus budgets                                        # budget + token balances
argus set-balance -s jina -b 9833638                 # track token balance
argus test-provider -p brave                         # smoke-test a provider
argus serve                                          # start API server
argus mcp serve                                      # start MCP server

All commands support --json for structured output.

How sessions work

Pass session_id to any search call. Argus stores each query and extracted URL in a SQLite-backed session. Reusing the same session_id gives the broker context from prior queries — follow-up searches are automatically refined using earlier conversation context. Sessions persist across restarts. Omit session_id for stateless, one-shot searches.

MCP

Add to your MCP client config:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve"]
    }
  }
}

Works with Claude Code, Cursor, VS Code, and any MCP-compatible client.

Option B — Self-hosted server (homelab / always-on machine)

Run Argus once on a server, connect every client to it over the network. No local install needed on client machines.

On the server (docker compose up -d starts both):

argus mcp serve --transport sse --host 0.0.0.0 --port 8001

On each client machine, add to ~/.claude/claude_desktop_config.json (or equivalent):

{
  "mcpServers": {
    "argus": {
      "url": "http://<your-server>:8271/sse"
    }
  }
}

With Tailscale, <your-server> is your machine's Tailscale hostname (e.g. homelab-ts). One server, every machine on your network gets search.

Available tools: search_web, extract_content, recover_url, expand_links, search_health, search_budgets, test_provider, cookie_health, valyu_answer

Python

from argus.broker.router import create_broker
from argus.models import SearchQuery, SearchMode
from argus.extraction import extract_url

broker = create_broker()

response = await broker.search(
    SearchQuery(query="python web frameworks", mode=SearchMode.DISCOVERY, max_results=10)
)
for r in response.results:
    print(f"{r.title}: {r.url} (score: {r.score:.3f})")

content = await extract_url(response.results[0].url)
print(content.title)
print(content.text)

Content Extraction

Argus tries up to nine methods to extract content from any URL: first local (trafilatura, Crawl4AI, Playwright), then external APIs (Jina, Valyu Contents, Firecrawl, You.com, Wayback, archive.is). Each attempt is quality-checked for garbage output. See docs/providers.md for the full extractor comparison.

Extract gets the full text of a working URL. Recover-URL finds alternatives when a URL is dead, paywalled, or radically changed by querying archival sources (Wayback, archive.is) and running a question-guided extraction loop.

Architecture

Caller (CLI/HTTP/MCP/Python) → SearchBroker → tier-sorted providers → RRF ranking → response
                                     ↕ SessionStore (optional)
                            Extractor (on demand) → 9-step fallback chain with quality gates
Module Responsibility
argus/broker/ Tier-based routing, ranking, dedup, caching, health, budgets
argus/providers/ Provider adapters (one per search API)
argus/extraction/ 9-step URL extraction fallback chain with quality gates
argus/sessions/ Multi-turn session store and query refinement
argus/api/ FastAPI HTTP endpoints
argus/cli/ Click CLI commands
argus/mcp/ MCP server for LLM integration
argus/persistence/ PostgreSQL query/result storage

Add new providers or extractors with a single adapter file. See CONTRIBUTING.md for the interface.

Configuration

All config via environment variables. See .env.example for the full list. Missing keys degrade gracefully — providers are skipped, not errors.

Variable Default Description
ARGUS_SEARXNG_BASE_URL http://127.0.0.1:8080 SearXNG endpoint
ARGUS_BRAVE_API_KEY Brave Search API key
ARGUS_SERPER_API_KEY Serper API key
ARGUS_TAVILY_API_KEY Tavily API key
ARGUS_EXA_API_KEY Exa API key
ARGUS_LINKUP_API_KEY Linkup API key
ARGUS_PARALLEL_API_KEY Parallel AI API key
ARGUS_YOU_API_KEY You.com API key
ARGUS_VALYU_API_KEY Valyu API key (search, contents, answer)
ARGUS_FIRECRAWL_API_KEY Firecrawl API key (content extraction)
ARGUS_GITHUB_API_KEY GitHub token (higher rate limit)
ARGUS_*_MONTHLY_BUDGET_USD 0 (unlimited) Query-count budget per provider
ARGUS_CRAWL4AI_ENABLED false Enable Crawl4AI extraction step
ARGUS_YOU_CONTENTS_ENABLED false Enable You.com Contents API extraction
ARGUS_CACHE_TTL_HOURS 168 Result cache TTL

FAQ

How is this different from calling Tavily/Serper directly? Argus calls them for you — plus 9 other providers. You get one ranked, deduplicated result set instead of managing multiple API keys and stitching results together. Free providers are tried first, so you only burn credits when needed.

Can I run only one provider? Yes. Set only the API key for the provider you want. All others are silently skipped. For zero-config, just install and go — DuckDuckGo handles everything with no keys.

Do I need Docker? No. pip install argus-search works immediately on any machine with Python 3.11+. Docker is only needed for SearXNG (self-hosted search) or Crawl4AI (local JS rendering).

License

MIT — see CHANGELOG.md for release history.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

argus_search-1.4.0.tar.gz (90.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

argus_search-1.4.0-py3-none-any.whl (102.8 kB view details)

Uploaded Python 3

File details

Details for the file argus_search-1.4.0.tar.gz.

File metadata

  • Download URL: argus_search-1.4.0.tar.gz
  • Upload date:
  • Size: 90.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for argus_search-1.4.0.tar.gz
Algorithm Hash digest
SHA256 52ef6e38f07ed2a17b82c06952ed6b25f9a2ead69f9b0607914a80b03e1f4964
MD5 65d5547b4f522e7fe19b1809e25f3f19
BLAKE2b-256 57d823ebd70d750008e906961e06e8ebe2a3035a1f96bc26dfd1df78e94de754

See more details on using hashes here.

File details

Details for the file argus_search-1.4.0-py3-none-any.whl.

File metadata

  • Download URL: argus_search-1.4.0-py3-none-any.whl
  • Upload date:
  • Size: 102.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for argus_search-1.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 542d45d2a5a1b267b42611bd7203de7c0c9d79c4dbf01b3f85ddd319e584ab9e
MD5 d4b7ccc183da8de65fa9cce95d5ee375
BLAKE2b-256 5dbdd7a36e1da3964879a42133bacd4ec4e2a252ca282ec282a3c003d8afbed9

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page