
Search broker with content extraction and multi-turn sessions — routes queries across multiple providers with fallback, ranking, health tracking, and budget enforcement


Argus

Python 3.11+ | License: MIT | MCP Server | Zero external DB deps

One endpoint, five search providers. Argus routes queries across SearXNG, Brave, Serper, Tavily, and Exa with automatic fallback, RRF result ranking, health tracking, and budget enforcement. Extract clean text from any URL. Remember prior queries for smarter follow-ups. Zero external database dependencies — SQLite only.

Connect via HTTP, CLI, MCP, or Python import.

Why Argus

Without Argus, every agent that needs web search has to wire up individual provider APIs, handle keys and rate limits for each one, write its own fallback logic, deduplicate results from multiple sources, and build its own content extraction pipeline. Each project reimplements the same glue.

Argus replaces that with one endpoint. You add it to your agent once — the same way you'd add a database client or an LLM API wrapper — and it handles the rest:

  • No provider lock-in — swap Brave for Serper or add Exa without changing your agent code. Missing keys degrade gracefully; providers are skipped, not errors.
  • Automatic fallback — if a provider is down, slow, or over budget, Argus routes to the next best one. Your agent doesn't need retry logic or circuit breakers.
  • Better results than any single provider — Reciprocal Rank Fusion merges results from multiple sources. A URL that appears in both Brave and Serper ranks higher than one that only appears in one.
  • Content extraction built in — found a useful link? Pass the URL to Argus and get clean article text back. Paywall domains get authenticated extraction first (Playwright via remote service), then trafilatura (local, free), then Jina Reader fallback. Cached in memory and SQLite so the same URL is never fetched twice.
  • Multi-turn memory — Argus remembers prior queries in a session. Follow-up searches like "fastapi" after "python web frameworks" get context-enriched automatically.
  • Budget-aware by default: each hosted provider has a free tier (Brave: 2k/mo, Serper: 2.5k/mo, Tavily: 1k/mo, Exa: 1k/mo). Argus tracks usage per provider and automatically rotates away from one when its quota is hit. Combined, that's about 6,500 free searches per month (plus unlimited self-hosted SearXNG), enough for most personal and development use.
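
The ranking bullet above can be illustrated with a short sketch of Reciprocal Rank Fusion. This is not Argus's actual implementation: the function name `rrf_merge` is hypothetical, and the constant `k = 60` is an assumed, conventional default rather than a value taken from the project.

```python
from collections import defaultdict


def rrf_merge(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-provider URL rankings with Reciprocal Rank Fusion.

    Each URL scores sum(1 / (k + rank)) over the providers that returned it,
    with rank starting at 1. k=60 is an assumption, not an Argus setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in rankings.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by URL for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


merged = rrf_merge({
    "brave": ["https://a.example", "https://b.example"],
    "serper": ["https://b.example", "https://c.example"],
})
for url, score in merged:
    print(f"{score:.4f}  {url}")
```

Here `https://b.example` comes out on top because it appears in both lists, even though it is not ranked first by Brave.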

Think of it as the LiteLLM of web search — one API, multiple providers, unified interface.

What Argus Is (and Isn't)

Core — search routing, result normalization, RRF ranking, provider health tracking, budget enforcement, deduplication.

Attached services — content extraction (auth extraction + trafilatura + Jina), multi-turn sessions, MCP server interface.

Not — a web crawler, a full document store, a general agent framework, an answer synthesis engine, or a multi-tenant SaaS. If you need those things, Argus integrates with systems that do them.

What It Does

You pass Argus a search query. It routes to providers in cheap-first order, stops early once a provider has already produced enough useful results, and only falls through to the next provider when failure, weak output, cooldown, or budget limits justify it. Results are ranked, deduplicated, and returned as one clean list.
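
The routing behavior described above might look roughly like the following sketch. It is an illustration under assumptions, not the broker's real code: `Provider`, `route`, the `available` flag, and the `enough` threshold are all hypothetical names, and "enough useful results" is reduced here to a simple count.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    search: Callable[[str], list[str]]  # returns result URLs; may raise
    available: bool = True              # False when cooling down or over budget


def route(query: str, chain: list[Provider], enough: int = 3) -> tuple[list[str], list[str]]:
    """Try providers in order; stop once `enough` unique results are collected."""
    results: list[str] = []
    tried: list[str] = []
    for provider in chain:
        if not provider.available:
            continue  # skipped, not an error
        tried.append(provider.name)
        try:
            found = provider.search(query)
        except Exception:
            continue  # provider failure: fall through to the next one
        for url in found:
            if url not in results:  # deduplicate across providers
                results.append(url)
        if len(results) >= enough:
            break  # early stop: later providers are not called
    return results, tried


def failing(query: str) -> list[str]:
    raise TimeoutError("provider timed out")


chain = [
    Provider("searxng", failing),
    Provider("brave", lambda q: ["https://a.example", "https://b.example", "https://c.example"]),
    Provider("serper", lambda q: ["https://d.example"]),
]
results, tried = route("fastapi tutorial", chain)
print(tried)    # ['searxng', 'brave']
print(results)
```

Serper is never called in this run: the first provider fails, the second returns enough results, and the loop stops there.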

Content extraction: Pass a URL and get clean article text back. For paywall domains, authenticated extraction runs first (Playwright on a remote service via Tailscale). Trafilatura (local, fast) is tried next, and Jina Reader is the fallback if needed. Results are cached in memory and in SQLite (168h TTL), so the cache survives restarts.
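
A minimal sketch of that "cache first, then each extractor in order" flow is shown below. The extractors are stand-in lambdas rather than real trafilatura or Jina calls, the cache is memory-only (the SQLite layer is omitted), and `ExtractionCache` and `extract` are hypothetical names, not the `argus.extraction` API.

```python
import time
from typing import Callable, Optional

Extractor = Callable[[str], Optional[str]]  # returns text, or None on failure


class ExtractionCache:
    """In-memory TTL cache; the SQLite persistence layer is left out of this sketch."""

    def __init__(self, ttl_hours: float = 168.0) -> None:
        self.ttl_seconds = ttl_hours * 3600
        self._items: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._items.get(url)
        if entry is None:
            return None
        stored_at, text = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            del self._items[url]  # expired
            return None
        return text

    def put(self, url: str, text: str) -> None:
        self._items[url] = (time.monotonic(), text)


def extract(url: str, extractors: list[tuple[str, Extractor]], cache: ExtractionCache) -> tuple[str, str]:
    """Return (source, text), trying the cache first and then each extractor in order."""
    cached = cache.get(url)
    if cached is not None:
        return "cache", cached
    for name, extractor in extractors:
        text = extractor(url)
        if text:
            cache.put(url, text)
            return name, text
    raise RuntimeError(f"no extractor produced text for {url}")


# Stand-ins for the real extractors
extractors = [
    ("trafilatura", lambda url: None),            # pretend local extraction failed
    ("jina", lambda url: "Clean article text."),
]
cache = ExtractionCache()
first = extract("https://example.com/article", extractors, cache)
second = extract("https://example.com/article", extractors, cache)
print(first)   # ('jina', 'Clean article text.')
print(second)  # ('cache', 'Clean article text.')
```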

Multi-turn sessions — Pass a session_id and Argus remembers what you've asked before. Follow-up queries get context-enriched automatically. Sessions persist to SQLite across restarts.
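
How Argus enriches a follow-up query is not specified here, so the following is only one plausible shape for it: keep a short per-session history and attach the previous query as context. `SessionStore`, `enrich`, and the "query (previous query)" format are all assumptions made for illustration, and this version keeps history in memory rather than SQLite.

```python
from collections import defaultdict, deque


class SessionStore:
    """Keeps the most recent queries per session (in memory only)."""

    def __init__(self, max_history: int = 5) -> None:
        self._history: dict[str, deque[str]] = defaultdict(lambda: deque(maxlen=max_history))

    def enrich(self, session_id: str, query: str) -> str:
        """Return the query with the previous query attached as context, then record it."""
        history = self._history[session_id]
        enriched = f"{query} ({history[-1]})" if history else query
        history.append(query)
        return enriched


store = SessionStore()
print(store.enrich("my-session", "python web frameworks"))  # python web frameworks
print(store.enrich("my-session", "fastapi"))                # fastapi (python web frameworks)
```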

Token balance tracking — Track API credits (Jina, etc.) in SQLite. Balances auto-decrement on extraction. Set via CLI, view via API.
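
As a rough picture of balance tracking in SQLite, the sketch below stores one row per service and decrements it on use. The table name, column names, and the `set_balance` / `decrement` helpers are hypothetical, and the per-extraction cost of 1200 is an arbitrary example value; Argus's real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (service TEXT PRIMARY KEY, balance INTEGER NOT NULL)")


def set_balance(service: str, balance: int) -> None:
    """Insert or overwrite the stored balance for a service."""
    with conn:
        conn.execute(
            "INSERT INTO balances (service, balance) VALUES (?, ?) "
            "ON CONFLICT(service) DO UPDATE SET balance = excluded.balance",
            (service, balance),
        )


def decrement(service: str, cost: int) -> int:
    """Subtract `cost` (never going below zero) and return the new balance."""
    with conn:
        conn.execute(
            "UPDATE balances SET balance = MAX(balance - ?, 0) WHERE service = ?",
            (cost, service),
        )
    row = conn.execute("SELECT balance FROM balances WHERE service = ?", (service,)).fetchone()
    if row is None:
        raise KeyError(service)
    return row[0]


set_balance("jina", 9833638)
print(decrement("jina", 1200))  # 9832438
```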

API key auth — Set ARGUS_API_KEY to require authentication on all endpoints (health exempt).

Quick Start

# Install from PyPI
pip install 'argus-search[mcp]'

# Or from source
git clone https://github.com/Khamel83/argus.git && cd argus
python -m venv .venv && source .venv/bin/activate
cp .env.example .env
# Edit .env — at minimum, set one provider API key
pip install -e ".[mcp]"

argus serve

# Verify
curl http://localhost:8000/api/health
# {"status":"ok"}

curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "fastapi tutorial", "mode": "discovery"}'

Next Steps

Provider Setup

All you need is API keys for whichever providers you want. SearXNG is free and runs locally.

Provider | Free tier | Get a key
SearXNG | Unlimited (self-hosted) | No key needed (runs locally)
Brave Search | 2,000 queries/month | dashboard
Serper | 2,500 queries/month | signup
Tavily | 1,000 queries/month | signup
Exa | 1,000 queries/month | signup

Free tier limits and verification dates: docs/providers.md.

Set keys in .env:

ARGUS_BRAVE_API_KEY=BSA...
ARGUS_SERPER_API_KEY=abc...
ARGUS_TAVILY_API_KEY=tvly-...
ARGUS_EXA_API_KEY=...

Unset or blank keys are silently skipped. You can run Argus with just SearXNG and no paid keys at all.

See docs/providers.md for SearXNG tuning details.

Integration

HTTP API

All endpoints are prefixed with /api. OpenAPI docs are served at /docs. Set ARGUS_API_KEY to require auth (the health endpoint is exempt).

# Search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "python web frameworks", "mode": "discovery", "max_results": 5}'

# Multi-turn search
curl -X POST http://localhost:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{"query": "python web frameworks", "session_id": "my-session"}'

# Extract content from a URL
curl -X POST http://localhost:8000/api/extract \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

# Recover a dead URL
curl -X POST http://localhost:8000/api/recover-url \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/old-page", "title": "Example Article"}'

# Expand a query with related links
curl -X POST http://localhost:8000/api/expand \
  -H "Content-Type: application/json" \
  -d '{"query": "fastapi", "context": "python web framework"}'

# Health & budgets
curl http://localhost:8000/api/health
curl http://localhost:8000/api/budgets
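
For callers that prefer Python over curl, the search request above can be built with the standard library alone. This sketch assumes the server is on localhost:8000 as in the examples and that the response body is a JSON object; the helper names are hypothetical and the response schema is not shown here.

```python
import json
import urllib.request


def build_search_request(query: str, mode: str = "discovery", max_results: int = 5,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request as the curl search example."""
    body = json.dumps({"query": query, "mode": mode, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_search_request(query, **kwargs), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_search_request("python web frameworks")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/api/search
```

Calling `search("python web frameworks")` sends the request; it is left out of the example run so the snippet works without a server.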

CLI

argus search -q "python web framework"
argus search -q "python web framework" --mode research -n 20
argus search -q "fastapi" --session my-session
argus extract -u "https://example.com/article"
argus recover-url -u "https://dead.link" -t "Title"
argus health
argus budgets
argus set-balance -s jina -b 9833638
argus test-provider -p brave
# Provider admin
argus provider disable brave --reason "over budget"
argus provider enable brave
argus provider reset-health brave
# Session management
argus session delete my-session
argus serve
argus mcp serve

MCP

Claude Code — add to your MCP settings:

{
  "mcpServers": {
    "argus": {
      "command": "argus",
      "args": ["mcp", "serve"]
    }
  }
}

Available tools: search_web, extract_content, recover_url, expand_links, search_health, search_budgets, test_provider

Python

import asyncio

from argus.broker.router import create_broker
from argus.models import SearchQuery, SearchMode
from argus.extraction import extract_url


async def main() -> None:
    broker = create_broker()

    # Run a discovery-mode search through the broker
    response = await broker.search(
        SearchQuery(query="python web frameworks", mode=SearchMode.DISCOVERY)
    )
    for r in response.results:
        print(f"{r.title}: {r.url} (score: {r.score:.3f})")

    # Extract clean text from the top result, if there is one
    if response.results:
        content = await extract_url(response.results[0].url)
        print(content.text)


asyncio.run(main())

Search Modes

Mode | When to use | Provider chain
discovery | Related pages, canonical sources | searxng → brave → exa → tavily → serper
recovery | Dead/moved URL recovery | searxng → brave → serper → tavily → exa
grounding | Fact-checking with few sources | brave → serper → searxng
research | Broad exploratory retrieval | tavily → exa → brave → serper

See docs/search-modes.md for detailed guidance on choosing a mode.
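
Written as data, the provider chains for each mode look like this. The orderings are copied from the Search Modes table; the `PROVIDER_CHAINS` mapping and the `chain_for` helper are hypothetical and only show how a per-mode order combines with the set of providers that actually have keys.

```python
PROVIDER_CHAINS: dict[str, list[str]] = {
    "discovery": ["searxng", "brave", "exa", "tavily", "serper"],
    "recovery": ["searxng", "brave", "serper", "tavily", "exa"],
    "grounding": ["brave", "serper", "searxng"],
    "research": ["tavily", "exa", "brave", "serper"],
}


def chain_for(mode: str, enabled: set[str]) -> list[str]:
    """Return the mode's provider order, dropping providers that are not enabled."""
    if mode not in PROVIDER_CHAINS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name in PROVIDER_CHAINS[mode] if name in enabled]


print(chain_for("research", {"searxng", "brave", "exa"}))  # ['exa', 'brave']
```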

Architecture

Caller (CLI / HTTP / MCP / Python)
  → SearchBroker
    → routing policy (per mode)
    → provider executor (cheap-first, bounded fallback)
    → result pipeline (cache → dedupe → RRF ranking → response)
    → persistence gateway (SQLite, non-fatal)
  → SessionStore (optional, per-request)
  → ContentExtractor (on demand)
    → auth extraction (paywall domains, remote Playwright service)
    → trafilatura (primary) → Jina Reader (fallback)
    → cache: memory → SQLite

Module | Responsibility
argus/core/ | Generic TTLCache, SlidingWindowLimiter
argus/broker/ | Routing, ranking, dedup, health, budgets
argus/providers/ | Provider adapters (SearXNG, Brave, Serper, Tavily, Exa)
argus/extraction/ | URL content extraction (auth extraction, trafilatura, Jina)
argus/sessions/ | Multi-turn session store
argus/api/ | FastAPI HTTP endpoints + auth + rate limiting
argus/cli/ | Click CLI commands
argus/mcp/ | MCP server for LLM integration
argus/persistence/ | SQLite search history

Configuration

All config via environment variables. See .env.example.

Variable | Default | Description
ARGUS_DB_PATH | argus.db | Unified SQLite database (search, budgets, sessions, extraction cache)
ARGUS_API_KEY | (none) | Require API key on all endpoints (health exempt)
ARGUS_SEARXNG_BASE_URL | http://127.0.0.1:8080 | SearXNG endpoint
ARGUS_BRAVE_API_KEY | (none) | Brave Search API key
ARGUS_SERPER_API_KEY | (none) | Serper API key
ARGUS_TAVILY_API_KEY | (none) | Tavily API key
ARGUS_EXA_API_KEY | (none) | Exa API key
ARGUS_CACHE_TTL_HOURS | 168 | Result cache TTL (hours)
ARGUS_JINA_API_KEY | (none) | Jina Reader key (optional)
ARGUS_REMOTE_EXTRACT_URL | (none) | Remote auth extraction service URL (enables Playwright-based extraction for paywall domains)
ARGUS_REMOTE_EXTRACT_KEY | (none) | API key for the remote extraction service
ARGUS_REMOTE_EXTRACT_TIMEOUT | 35 | Timeout (seconds) for remote extraction requests
ARGUS_EXTRACTION_CACHE_TTL_HOURS | 168 | Extraction cache TTL (hours)
ARGUS_RATE_LIMIT | 60 | Requests per window per client IP
ARGUS_RATE_LIMIT_WINDOW | 60 | Rate limit window (seconds)
ARGUS_CORS_ORIGINS | * | Allowed CORS origins (comma-separated, * allows all)

Non-Goals

Argus deliberately does not:

  • Crawl or spider — it queries search APIs, not raw URLs. If you need a crawler, use one and feed the URLs to argus extract.
  • Drive a browser locally — Playwright runs on a separate machine (Mac Mini) accessed via Tailscale, not embedded in the Argus process. If you need local browser automation, that's outside scope.
  • Store documents — results are ranked and returned. There is no document index or vector store built in.
  • Synthesize answers — Argus returns search results and extracted text. Answer generation is left to the LLM calling Argus.
  • Run multi-tenant — no user accounts, no per-user quotas, no auth beyond a single ARGUS_API_KEY. Designed for private/single-user deployments.
  • Replace provider SDKs — the broker is a routing layer, not a complete API wrapper. Provider-specific features (image search, news search, etc.) are out of scope.
  • Keep per-request audit logs: health tracking records provider success/failure, but individual request traces are not persisted. The ARGUS_LOG_FULL_RESULTS flag dumps everything at debug level if needed, but there is no structured audit trail. Designed for personal use where debugging is ad hoc.
  • Handle high concurrency — SQLite with WAL mode handles moderate load fine, but there is no connection pooling. Under heavy concurrent access (many simultaneous MCP clients), you may see database is locked errors. Designed for single-user or small-team deployments.
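
On the last point, a common general-purpose way to reduce "database is locked" errors with SQLite is to combine WAL mode with a busy timeout, so writers wait briefly instead of failing immediately. The sketch below shows those two settings with the standard `sqlite3` module. Whether Argus sets a busy timeout is not stated here, and `open_db` and the 5000 ms value are illustrative assumptions.

```python
import os
import sqlite3
import tempfile


def open_db(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode with a busy timeout."""
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(os.path.join(tmp, "argus.db"))
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    conn.close()

print(mode, timeout)  # wal 5000
```

This only softens contention; it does not add connection pooling, so the single-user or small-team guidance above still applies.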

License

MIT
