Search broker with content extraction and multi-turn sessions — routes queries across 9 providers with tier-based credit-aware routing, fallback, ranking, health tracking, and budget enforcement
Project description
Argus
Stop wiring search APIs into every project. Argus is one endpoint that talks to 9 search providers — with tier-based credit-aware routing, automatic fallback, result ranking, health tracking, and budget enforcement. Connect via HTTP, CLI, MCP, or Python import. Add a provider key, it works. Remove it, it degrades gracefully.
Search → Extract → Answer. Argus doesn't just find URLs — it can fetch and extract clean text from any page using an 8-step fallback chain, and it remembers your prior queries so follow-up searches get smarter.
What It Does
You pass Argus a search query. It routes to providers in tier order — free/unlimited first (SearXNG), then monthly recurring credits (Brave, Tavily, Linkup, Exa), then one-time credits (Serper, Parallel, You.com) — stopping early when enough useful results are found. Budget-exhausted providers are skipped automatically. Results are ranked, deduplicated, and returned as one clean list.
Tier-based credit routing — Providers are sorted by credit type: Tier 0 (free, unlimited) → Tier 1 (monthly recurring) → Tier 3 (one-time credits). Query-type routing is preserved within each tier — e.g., in research mode, Tavily and Exa still go before Brave within the monthly tier. Budget enforcement tracks query counts per provider on a 30-day rolling window. When credits run out, the provider is skipped until they refresh.
Content extraction — 8-step fallback chain with quality gates: trafilatura → Crawl4AI → Playwright → Jina Reader → You.com Contents → Wayback Machine → archive.is. Each result is checked for paywall stubs, soft 404s, and minimum quality before moving on. SSRF protection blocks private IPs. Results are cached in memory (168h TTL). Authenticated extraction via cookies is supported for paywall domains.
Multi-turn sessions — Pass a session_id with your searches and Argus remembers what you've asked before. Follow-up queries get context-enriched automatically. Sessions persist to SQLite across restarts.
Token balance tracking — Track remaining API credits in a local SQLite database. Balances auto-decrement as you extract content. Set balances via CLI, view via API or argus budgets.
Quick Start
Docker (recommended)
# 1. Create .env with your provider keys
cp .env.example .env
# Edit .env — at minimum, set provider API keys
# 2. Start Argus + Postgres + SearXNG
docker compose up -d
# 3. Verify
curl http://localhost:8000/api/health
curl -X POST http://localhost:8000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "fastapi tutorial", "mode": "discovery"}'
Local install
# From PyPI
pip install "argus-search[mcp]"
# With Crawl4AI (self-hosted JS rendering extractor)
pip install "argus-search[mcp,crawl4ai]"
# Or from source
git clone https://github.com/Khamel83/argus.git && cd argus
python -m venv .venv && source .venv/bin/activate
cp .env.example .env
pip install -e ".[mcp]"
argus serve
Note: The PyPI package is
argus-search(the nameargusis taken). The CLI command is stillargus.
Providers
9 providers across 3 credit tiers. SearXNG is free and unlimited — everything else has generous free tiers.
Search Providers
| Provider | Tier | Free tier | API |
|---|---|---|---|
| SearXNG | 0 (free) | Unlimited (self-hosted) | No key needed |
| Brave Search | 1 (monthly) | 2,000 queries/month | dashboard |
| Tavily | 1 (monthly) | 1,000 queries/month | signup |
| Exa | 1 (monthly) | 1,000 queries/month | signup |
| Linkup | 1 (monthly) | 1,000 standard queries/month | signup |
| Serper | 3 (one-time) | 2,500 credits (signup) | signup |
| Parallel AI | 3 (one-time) | 16,000 credits (signup) | signup |
| You.com | 3 (one-time) | $100 credit on signup | platform |
| SearchAPI | 3 (one-time) | Placeholder | Not yet configured |
Content Extractors
| Extractor | Type | Cost | Notes |
|---|---|---|---|
| trafilatura | Local | Free | Primary, fast, no API |
| Crawl4AI | Local | Free | JS rendering, needs crawl4ai package |
| Playwright | Local | Free | Headless browser fallback |
| Jina Reader | API | Token-based | External fallback |
| You.com Contents | API | $1/1k pages | Uses You.com search key |
| Wayback Machine | External | Free | Dead page recovery |
| archive.is | External | Free | Dead page recovery |
Set keys in .env:
ARGUS_BRAVE_API_KEY=BSA...
ARGUS_SERPER_API_KEY=abc...
ARGUS_TAVILY_API_KEY=tvly-...
ARGUS_EXA_API_KEY=...
ARGUS_LINKUP_API_KEY=...
ARGUS_PARALLEL_API_KEY=...
ARGUS_YOU_API_KEY=...
Unset or blank keys are silently skipped. You can run Argus with just SearXNG and no paid keys at all.
SearXNG Setup
The included docker-compose.yml starts SearXNG automatically. If running separately:
docker run -d --name searxng -p 8080:8080 searxng/searxng:latest
curl http://localhost:8080/search?q=test\&format=json
How Routing Works
Providers are selected by two factors: credit tier (primary) and query type (secondary).
┌──────────────────────────────────────────────┐
│ Tier 0: FREE (SearXNG) │ ← always first
├──────────────────────────────────────────────┤
│ Tier 1: MONTHLY RECURRING │
│ Brave · Tavily · Exa · Linkup │ ← mode-specific order within tier
├──────────────────────────────────────────────┤
│ Tier 3: ONE-TIME CREDITS │
│ Serper · Parallel · You.com · SearchAPI │ ← last resort, budget-enforced
└──────────────────────────────────────────────┘
When a provider's monthly budget is exhausted, it's skipped until the 30-day rolling window resets. Budgets are query-count based, set per provider in .env.
Integration
HTTP API
All endpoints prefixed with /api. OpenAPI docs at http://localhost:8000/docs.
# Search
curl -X POST http://localhost:8000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "python web frameworks", "mode": "discovery", "max_results": 5}'
# Multi-turn search (session context)
curl -X POST http://localhost:8000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "python web frameworks", "session_id": "my-session"}'
# Extract content from a URL
curl -X POST http://localhost:8000/api/extract \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/article"}'
# Recover a dead URL
curl -X POST http://localhost:8000/api/recover-url \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/old-page", "title": "Example Article"}'
# Expand a query with related links
curl -X POST http://localhost:8000/api/expand \
-H "Content-Type: application/json" \
-d '{"query": "fastapi", "context": "python web framework"}'
# Health & budgets
curl http://localhost:8000/api/health/detail
curl http://localhost:8000/api/budgets
CLI
argus search -q "python web framework"
argus search -q "python web framework" --mode research -n 20
argus search -q "fastapi" --session my-session # multi-turn context
argus extract -u "https://example.com/article" # extract clean text
argus extract -u "https://example.com/article" -d nytimes.com # authenticated extraction
argus cookies import # import browser cookies
argus cookies health # check cookie freshness
argus recover-url -u "https://dead.link" -t "Title"
argus health # provider status
argus budgets # budget status + token balances
argus set-balance -s jina -b 9833638 # track token balance
argus test-provider -p brave # smoke-test a provider
argus serve # start API server
argus mcp serve # start MCP server
All commands support --json for structured output.
MCP
Add to your MCP client config:
{
"mcpServers": {
"argus": {
"command": "argus",
"args": ["mcp", "serve"]
}
}
}
Works with Claude Code, Cursor, VS Code, and any MCP-compatible client. For remote access via SSE:
{
"mcpServers": {
"argus": {
"command": "argus",
"args": ["mcp", "serve", "--transport", "sse", "--host", "127.0.0.1", "--port", "8001"]
}
}
}
Available tools: search_web, extract_content, recover_url, expand_links, search_health, search_budgets, test_provider, cookie_health
Python
from argus.broker.router import create_broker
from argus.models import SearchQuery, SearchMode
from argus.extraction import extract_url
broker = create_broker()
# Search
response = await broker.search(
SearchQuery(query="python web frameworks", mode=SearchMode.DISCOVERY, max_results=10)
)
for r in response.results:
print(f"{r.title}: {r.url} (score: {r.score:.3f})")
# Extract content from a result
content = await extract_url(response.results[0].url)
print(content.title)
print(content.text)
Search Modes
| Mode | When to use | Provider order (within tiers) |
|---|---|---|
discovery |
Find related pages, canonical sources | Brave → Exa → Tavily → Linkup → Serper → Parallel → You |
recovery |
Dead/moved URL recovery | Brave → Serper → Tavily → Exa → Linkup → Parallel → You |
grounding |
Few live sources for fact-checking | Brave → Serper → SearXNG → Linkup → Parallel → You |
research |
Broad exploratory retrieval | Tavily → Exa → Brave → SearXNG → Linkup → Serper → Parallel → You |
SearXNG (Tier 0) always leads regardless of mode. Within each tier, mode-specific ordering is preserved.
Architecture
Caller (CLI / HTTP / MCP / Python)
→ SearchBroker
→ routing policy (tier-sorted, mode-specific within tiers)
→ provider executor (budget check → health check → search → early stop)
→ result pipeline (cache → dedupe → RRF ranking → response)
→ SessionStore (optional, per-request)
→ query refinement from prior context
→ Extractor (on demand)
→ SSRF → cache → rate limit → auth → QG →
trafilatura → QG → crawl4ai → QG → playwright → QG →
jina → QG → you_contents → QG → wayback → QG →
archive.is → QG → return best
| Module | Responsibility |
|---|---|
argus/broker/ |
Tier-based routing, ranking, dedup, caching, health, budgets |
argus/providers/ |
9 provider adapters (one per search API) |
argus/extraction/ |
8-step URL extraction fallback chain with quality gates |
argus/sessions/ |
Multi-turn session store and query refinement |
argus/api/ |
FastAPI HTTP endpoints |
argus/cli/ |
Click CLI commands |
argus/mcp/ |
MCP server for LLM integration |
argus/persistence/ |
PostgreSQL query/result storage |
Configuration
All config via environment variables. See .env.example for the full list.
| Variable | Default | Description |
|---|---|---|
ARGUS_DB_URL |
— | PostgreSQL connection string |
ARGUS_SEARXNG_BASE_URL |
http://127.0.0.1:8080 |
SearXNG endpoint |
ARGUS_BRAVE_API_KEY |
— | Brave Search API key |
ARGUS_SERPER_API_KEY |
— | Serper API key |
ARGUS_TAVILY_API_KEY |
— | Tavily API key |
ARGUS_EXA_API_KEY |
— | Exa API key |
ARGUS_LINKUP_API_KEY |
— | Linkup API key |
ARGUS_PARALLEL_API_KEY |
— | Parallel AI API key |
ARGUS_YOU_API_KEY |
— | You.com API key |
ARGUS_*_MONTHLY_BUDGET_USD |
0 (unlimited) | Query-count budget per provider |
ARGUS_CRAWL4AI_ENABLED |
false | Enable Crawl4AI extraction step |
ARGUS_YOU_CONTENTS_ENABLED |
false | Enable You.com Contents API extraction |
ARGUS_CACHE_TTL_HOURS |
168 | Result cache TTL |
ARGUS_JINA_API_KEY |
— | Jina Reader key (optional) |
ARGUS_EXTRACTION_TIMEOUT_SECONDS |
10 | URL fetch timeout for extraction |
ARGUS_EXTRACTION_CACHE_TTL_HOURS |
168 | Extraction cache TTL |
ARGUS_RATE_LIMIT |
60 | Requests per window per client IP |
License
MIT
Publishing
The PyPI package is argus-search (the name argus is taken).
Release checklist
- Bump
versioninpyproject.toml - Commit and push to
main - Build:
python3 -m build - Publish:
PYPI_API_TOKEN=$(secrets get PYPI_API_TOKEN) python3 -m twine upload dist/* - Create GitHub release:
gh release create v<version> --title "v<version>"
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file argus_search-1.3.0.tar.gz.
File metadata
- Download URL: argus_search-1.3.0.tar.gz
- Upload date:
- Size: 77.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da86f895244bec4cae6aa241cfa86e0a3430693e55fde58873edbc935c6a379b
|
|
| MD5 |
1850c015de9f2db8376ff7ebe5272f6a
|
|
| BLAKE2b-256 |
06b1966b28e9e96f14fd56c8213362b73812dc29dde739e5d8615834cee578e6
|
File details
Details for the file argus_search-1.3.0-py3-none-any.whl.
File metadata
- Download URL: argus_search-1.3.0-py3-none-any.whl
- Upload date:
- Size: 84.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
48dc4c80afb9d7b347aff949f6952063cfc7bd98e26dd1fd494e1430ea694c49
|
|
| MD5 |
5d7929d543505b7bb310ef48e5fd6e18
|
|
| BLAKE2b-256 |
63f0dcc17e90df0986eaff814bceb14b4e3e8a7eeccdd19fcc06263320cff565
|