Give any AI agent the ability to search, crawl, and extract the web.

Agent Search


Agent Search is a CLI and Python library that gives AI agents reliable web access. One CLI entry point, search, covers web search, website crawling, structured data extraction, and page-change monitoring, with every request routed through a 4-layer proxy chain that automatically handles IP rotation, CAPTCHA detection, and rate limiting.

pip install agentsearchcli
search query "latest NVIDIA earnings" --format json

Why Agent Search?

Most AI agents can't reliably access the web. Search APIs are expensive, direct requests get blocked, and scraping requires infrastructure. Agent Search solves this:

  • Multi-engine search — Aggregates results from Google, DuckDuckGo, Bing, and Wikipedia. Deduplicates and ranks by relevance.
  • 4-layer proxy chain — Automatic failover: MacBook relay -> NordVPN SOCKS5 -> AWS API Gateway IP rotation -> direct. When one route is blocked or unavailable, the next one takes over.
  • Headless browsing — Playwright with stealth mode for JavaScript-rendered pages.
  • Structured extraction — Pull data from any page using CSS selectors, XPath, or LLM-powered extraction.
  • Change monitoring — Watch any URL for content changes with configurable intervals.
  • Community proxy pool — Earn credits by sharing bandwidth. Spend credits to use the network.
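Merging results from several engines means deduplicating near-identical URLs and combining several rankings into one. This page doesn't say how multi_search scores results, so the sketch below is illustrative only: it assumes reciprocal rank fusion (RRF), a standard technique for merging ranked lists, and deduplicates on a normalized URL. The function names and the result shape (dicts with "url" and "title" keys) are hypothetical, not the library's API.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms deduplicate together."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string, which can change page content.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def merge_results(engine_results: dict[str, list[dict]], k: int = 60) -> list[dict]:
    """Hypothetical merge step: dedupe by normalized URL, rank by RRF score.

    engine_results maps an engine name to its ranked hits (best first).
    Each URL scores sum(1 / (k + rank)) over the engines that returned it.
    """
    scores: dict[str, float] = defaultdict(float)
    merged: dict[str, dict] = {}
    for engine, hits in engine_results.items():
        seen_here = set()
        for rank, hit in enumerate(hits, start=1):
            key = normalize_url(hit["url"])
            if key in seen_here:
                continue  # count each URL at most once per engine
            seen_here.add(key)
            scores[key] += 1.0 / (k + rank)
            # Keep the first engine's copy of the hit and record who returned it.
            entry = merged.setdefault(key, {**hit, "engines": []})
            entry["engines"].append(engine)
    ordered = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [merged[key] | {"score": scores[key]} for key in ordered]
```

A URL returned by several engines accumulates score from each, so agreement between engines pushes it up; k dampens the gap between adjacent ranks.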

Quick Start

# Install
pip install agentsearchcli

# First run — creates account and gets API key
search

# Search the web
search "Python asyncio documentation"

# Output as JSON (for agents)
search query "React hooks tutorial" --format json

# Use headless browser for JS-heavy sites
search query "site:twitter.com AI news" --browser

# Crawl a docs site
search crawl https://docs.python.org --depth 3 --max-pages 100

# Extract structured data
search extract https://shop.com/products --schema schema.json --format json

# Monitor a page for changes (check every 30 min)
search monitor https://example.com/pricing --interval 1800
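An agent that shells out to the CLI can wrap the JSON output mode in a small helper. This is a sketch, assuming the search executable is on PATH and that --format json writes a single JSON document to stdout. The result schema isn't documented here, so the helper returns whatever json.loads produces. Both function names are hypothetical.

```python
import json
import subprocess
from typing import Any


def build_search_argv(query: str, browser: bool = False) -> list[str]:
    """Build argv for a JSON-formatted query, using flags from the examples above."""
    argv = ["search", "query", query, "--format", "json"]
    if browser:
        argv.append("--browser")  # headless browser for JS-heavy sites
    return argv


def run_search(query: str, browser: bool = False, timeout: float = 120.0) -> Any:
    """Run the CLI and parse stdout as JSON (assumes stdout holds only JSON)."""
    completed = subprocess.run(
        build_search_argv(query, browser),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing argv as a list rather than a shell string avoids quoting problems when the query itself contains quotes or site: operators.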

Installation

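# Core (requests-based, no browser)
pip install agentsearchcli

# With headless browser support (quotes stop shells such as zsh from expanding the brackets)
pip install "agentsearchcli[browser]"
# Playwright downloads its browser binaries separately
python -m playwright install chromium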

# From source
git clone https://github.com/r0botsorg/agent-search-cli.git
cd agent-search-cli
pip install -e ".[dev]"

Requirements: Python 3.9+ and an internet connection. Everything else is optional.


Architecture

┌─────────────────────────────────────────────────────┐
│                    CLI / Library                      │
│    search query | crawl | extract | monitor          │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│               Multi-Engine Search                    │
│    Google + DuckDuckGo + Bing + Wikipedia             │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│               4-Layer Proxy Chain                    │
│                                                      │
│    1. MacBook Relay     (residential IP)             │
│    2. NordVPN SOCKS5    (residential IP)             │
│    3. AWS API Gateway   (rotating datacenter IPs)    │
│    4. Direct            (fallback)                   │
│                                                      │
│    Auto-failover · CAPTCHA detection · Rate limiting │
└──────────────────────┬──────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────┐
│               Content Processing                     │
│                                                      │
│    HTML → Markdown · CSS/XPath extraction            │
│    LLM extraction · Change detection                 │
│    Playwright stealth · Session management           │
└─────────────────────────────────────────────────────┘
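The failover order in the proxy-chain layer can be illustrated with a small, self-contained sketch. It is not the ProxyChain implementation: each layer is modeled as a hypothetical callable that fetches a URL and returns (status, body), and a layer is skipped when it raises, returns HTTP 403/429, or serves a page containing a few common CAPTCHA phrases. The status codes and marker list are assumptions chosen for illustration.

```python
from typing import Callable

# Hypothetical layer interface: take a URL, return (status_code, body); may raise.
Fetcher = Callable[[str], tuple[int, str]]

# Assumed block signals; real CAPTCHA/anti-bot detection is far more involved.
BLOCK_STATUSES = {403, 429}
CAPTCHA_MARKERS = ("captcha", "are you a robot", "unusual traffic")


def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: treat 403/429 or a CAPTCHA-looking page as a block."""
    lowered = body.lower()
    return status in BLOCK_STATUSES or any(m in lowered for m in CAPTCHA_MARKERS)


def fetch_with_failover(url: str, layers: list[tuple[str, Fetcher]]) -> tuple[str, int, str]:
    """Try each (name, fetcher) in order and return the first unblocked response."""
    errors: list[str] = []
    for name, fetch in layers:
        try:
            status, body = fetch(url)
        except Exception as exc:  # e.g. relay offline, proxy auth failure, timeout
            errors.append(f"{name}: {exc!r}")
            continue  # fail over to the next layer
        if looks_blocked(status, body):
            errors.append(f"{name}: blocked (HTTP {status})")
            continue
        return name, status, body
    raise RuntimeError("all proxy layers failed: " + "; ".join(errors))
```

In a real chain the list would hold the MacBook relay, NordVPN SOCKS5, AWS API Gateway, and direct routes in the order shown in the diagram, so the direct connection is only used when every proxied route has failed.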

Modes

Mode   Cost   Proxies                       Best For
Lite   Free   Self-managed (your proxies)   Developers with existing infrastructure
Pro    Paid   Fully managed                 Teams who want zero setup
Pool   Free   Community-powered             Everyone: share bandwidth, earn credits

CLI Reference

Global Options

Option              Description
--version           Show version and exit
--verbose / -v      Enable debug logging
--config PATH       Path to custom config file
--skip-onboarding   Skip the first-run setup wizard

Search

search "your query"                              # quick search
search query "your query" --format json          # JSON output
search query "your query" --browser              # JS rendering
search query "your query" --extract "h1, .price" # CSS extraction
search query "your query" --pro                  # hosted mode
search query "your query" -o results.json        # save to file

Crawl

search crawl https://docs.example.com --depth 3 --max-pages 100
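What --depth and --max-pages bound can be pictured as a breadth-first traversal. The sketch below assumes depth counts link hops from the start URL (the start page is depth 0) and that the crawl stops once max_pages pages have been visited; both are assumptions about the CLI's semantics. get_links is a hypothetical callable standing in for "fetch the page and extract its same-site links".

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    depth: int = 3,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl bounded by link depth and total page count."""
    visited: list[str] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        visited.append(url)  # "visit" = fetch and process the page
        if level == depth:
            continue  # don't follow links past the depth limit
        for link in get_links(url):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                queue.append((link, level + 1))
    return visited
```

Breadth-first order means that when max_pages cuts the crawl short, the pages closest to the start URL are the ones that were kept.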

Extract

search extract https://shop.com/product --schema schema.json --format json

Monitor

search monitor https://example.com/pricing --interval 1800
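The change_detector module is described under Core Modules as monitoring content via SHA-256 snapshots. A minimal sketch of that idea, assuming the snapshot is a hash of whitespace-normalized page content kept in memory (the real module may normalize differently and persist snapshots elsewhere): the first check records a baseline, and later checks report whether the digest changed. SnapshotStore is a hypothetical name.

```python
import hashlib


class SnapshotStore:
    """In-memory map of URL -> SHA-256 digest of the last content seen."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    @staticmethod
    def digest(content: str) -> str:
        # Collapse whitespace so pure reformatting does not count as a change.
        normalized = " ".join(content.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def check(self, url: str, content: str) -> bool:
        """Return True if content differs from the stored snapshot.

        The first call for a URL only records a baseline and returns False.
        """
        new = self.digest(content)
        old = self._digests.get(url)
        self._digests[url] = new
        return old is not None and old != new
```

A monitor loop would fetch the page and call check on each poll, for example every 1800 seconds to match the --interval 1800 example above. Storing only the digest keeps memory flat regardless of page size, at the cost of not knowing what changed.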

Proxy Pool

search pool join       # contribute bandwidth, earn credits
search pool leave      # stop participating
search pool status     # your node status
search pool stats      # global network stats
search pool credits    # your balance

Auth

search auth login      # authenticate for Pro mode
search auth logout     # remove stored credentials
search auth status     # check auth state

Command Tree

search [QUERY]
├── query QUERY [--pro] [-f markdown|html|json] [-o PATH] [--extract CSS] [--browser]
├── crawl URL [--pro] [--depth N] [--max-pages N]
├── extract URL [--pro] [--schema PATH] [-f markdown|json]
├── monitor URL [--pro] [--interval N]
├── onboard
├── auth
│   ├── login
│   ├── logout
│   └── status
└── pool
    ├── join
    ├── leave
    ├── status
    ├── stats
    └── credits

13 commands in total, counting each subcommand of auth and pool individually.


Python Library

Use Agent Search as a library in your own code:

import asyncio

from agent_search.core.proxy_chain import ProxyChain
from agent_search.core.multi_search import MultiEngineSearch
from agent_search.core.html_to_markdown import HTMLToMarkdown
from agent_search.core.data_extraction import DataExtractor
from agent_search.core.change_detector import ChangeDetector

url = "https://example.com"

# Proxy-aware HTTP requests with automatic failover
proxy = ProxyChain()
response = proxy.get(url)
proxies = proxy.get_best_proxies_dict()  # for use with requests

# async_get is a coroutine, so await it inside an async function
async def fetch_api_data():
    return await proxy.async_get("https://api.example.com/data")

api_data = asyncio.run(fetch_api_data())

# Multi-engine search with dedup + ranking
engine = MultiEngineSearch()
results = engine.search("latest AI research", max_results=10)

# HTML to clean Markdown (html is any HTML string you already have)
html = "<h1>Example</h1><p>Some <b>bold</b> text.</p>"
converter = HTMLToMarkdown()
markdown = converter.convert(html, base_url=url)

# Structured data extraction
extractor = DataExtractor()
data = extractor.extract(url, selectors=["h1", ".price", ".description"])

# Change monitoring
detector = ChangeDetector()
changed = detector.check(url)  # returns True if content changed

Core Modules

Module               Description
proxy_chain          4-layer proxy with automatic failover
multi_search         Multi-engine search aggregation with dedup + ranking
html_to_markdown     Clean HTML-to-Markdown conversion
data_extraction      CSS, XPath, and LLM-powered structured extraction
playwright_browser   Headless Chrome with stealth mode
batch_processor      Async batch URL processing with concurrency control
change_detector      Content change monitoring via SHA-256 snapshots
captcha_detector     CAPTCHA and anti-bot block detection
rate_limiter         Thread-safe rate limiting with adaptive backoff
retry_handler        Exponential backoff with circuit breaker pattern
sitemap_crawler      URL discovery via sitemap.xml and robots.txt
aws_ip_rotator       AWS API Gateway IP rotation (new IP per request)
nordvpn_proxy        NordVPN SOCKS5 residential proxy support
session_manager      Persistent session and cookie storage
user_agents          27 real browser User-Agent strings with rotation
llm_extractor        LLM-powered intelligent data extraction
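retry_handler is listed as exponential backoff with a circuit breaker. The sketch below shows that general pattern, not the module's code: retry delays double from a base value up to a cap (with optional full jitter), and after a run of consecutive failures the breaker opens and short-circuits calls until a cooldown has elapsed, after which one trial call is let through. Class names, thresholds, and defaults are illustrative assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being short-circuited."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        # Half-open: once the cooldown has passed, let a trial call through.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # (re)open the circuit


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  jitter: bool = True) -> float:
    """Delay for 0-based attempt: base * 2**attempt, capped; full jitter if enabled."""
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker, max_attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    last_exc: Optional[Exception] = None
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except Exception as exc:
            breaker.record_failure()
            last_exc = exc
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))  # no pointless sleep after the last try
            continue
        breaker.record_success()
        return result
    if last_exc is None:
        raise ValueError("max_attempts must be at least 1")
    raise last_exc
```

Sharing one breaker per upstream (for example, per proxy layer) means a dead route stops costing retries for every request while it cools down.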

Configuration

Configuration is stored in ~/.config/agent-search/config.json, which the onboarding wizard creates on first run.

Environment Variables

Variable                Description
AGENT_SEARCH_ENDPOINT   Search endpoint URL (default: http://localhost:15000)
AGENT_SEARCH_API_KEY    Pro mode API key
NORDVPN_SERVICE_USER    NordVPN SOCKS5 username
NORDVPN_SERVICE_PASS    NordVPN SOCKS5 password
AWS_API_GATEWAY_ID      AWS API Gateway ID for IP rotation
AWS_REGION              AWS region (default: us-east-1)
MACBOOK_PROXY_URL       MacBook relay proxy URL
MACBOOK_API_KEY         MacBook relay auth key
OPENAI_API_KEY          OpenAI API key for LLM-powered extraction
BING_SEARCH_API_KEY     Bing Search API key (optional engine)
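Code that reads these settings itself might gather them into one object. This is a sketch, not the package's loader: it reads only the variables in the table, applies the two documented defaults, and leaves the rest as None. How these variables interact with config.json isn't specified here, so the sketch ignores the file. AgentSearchSettings, load_settings, and the configured_layers rule are hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentSearchSettings:
    """Hypothetical container for the environment variables listed above."""
    endpoint: str
    aws_region: str
    api_key: Optional[str] = None
    nordvpn_user: Optional[str] = None
    nordvpn_pass: Optional[str] = None
    aws_gateway_id: Optional[str] = None
    macbook_proxy_url: Optional[str] = None
    macbook_api_key: Optional[str] = None
    openai_api_key: Optional[str] = None
    bing_api_key: Optional[str] = None

    def configured_layers(self) -> list[str]:
        """Assumption: a proxy layer is usable once its settings are present."""
        layers = []
        if self.macbook_proxy_url:
            layers.append("macbook")
        if self.nordvpn_user and self.nordvpn_pass:
            layers.append("nordvpn")
        if self.aws_gateway_id:
            layers.append("aws")
        layers.append("direct")  # the fallback needs no configuration
        return layers


def load_settings(env: Mapping[str, str] = os.environ) -> AgentSearchSettings:
    """Read the documented variables, applying the two documented defaults."""
    return AgentSearchSettings(
        endpoint=env.get("AGENT_SEARCH_ENDPOINT", "http://localhost:15000"),
        aws_region=env.get("AWS_REGION", "us-east-1"),
        api_key=env.get("AGENT_SEARCH_API_KEY"),
        nordvpn_user=env.get("NORDVPN_SERVICE_USER"),
        nordvpn_pass=env.get("NORDVPN_SERVICE_PASS"),
        aws_gateway_id=env.get("AWS_API_GATEWAY_ID"),
        macbook_proxy_url=env.get("MACBOOK_PROXY_URL"),
        macbook_api_key=env.get("MACBOOK_API_KEY"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        bing_api_key=env.get("BING_SEARCH_API_KEY"),
    )
```

Accepting any mapping instead of reading os.environ directly keeps the loader easy to test with a plain dict.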

Project Structure

agent-search-cli/
├── pyproject.toml                # Package config + entry points
├── src/agent_search/
│   ├── cli/                      # CLI layer (Click)
│   │   ├── main.py               # Command routing
│   │   ├── onboarding.py         # First-run setup wizard
│   │   └── commands/
│   │       ├── query.py          # Web search
│   │       ├── crawl.py          # Website crawling
│   │       ├── extract.py        # Data extraction
│   │       ├── monitor.py        # Change monitoring
│   │       ├── auth.py           # Authentication
│   │       └── pool.py           # Proxy pool management
│   ├── core/                     # Core library (usable independently)
│   │   ├── proxy_chain.py        # 4-layer proxy failover
│   │   ├── multi_search.py       # Multi-engine search
│   │   ├── html_to_markdown.py   # HTML → Markdown
│   │   ├── data_extraction.py    # Structured extraction
│   │   ├── playwright_browser.py # Headless browser
│   │   ├── batch_processor.py    # Async batch processing
│   │   ├── change_detector.py    # Change monitoring
│   │   ├── captcha_detector.py   # Anti-bot detection
│   │   ├── rate_limiter.py       # Rate limiting
│   │   ├── retry_handler.py      # Retry + circuit breaker
│   │   ├── sitemap_crawler.py    # Sitemap discovery
│   │   ├── aws_ip_rotator.py     # AWS IP rotation
│   │   ├── nordvpn_proxy.py      # NordVPN SOCKS5
│   │   ├── session_manager.py    # Session persistence
│   │   ├── llm_extractor.py      # LLM extraction
│   │   └── user_agents.py        # UA rotation
│   ├── pool/                     # Proxy pool network
│   └── utils/
│       ├── logger.py
│       └── version.py
└── tests/
    ├── test_*.py
    └── unit/

Development

git clone https://github.com/r0botsorg/agent-search-cli.git
cd agent-search-cli
pip install -e ".[dev]"
python -m pytest tests/ -v

About Qwerty

Agent Search is built by Qwerty (qwert.ai) — an AI-powered search platform designed specifically for agents and autonomous systems.

Traditional search wasn't built for the agent era. It was built for humans typing queries into search boxes. Qwerty is different: an agent-first search infrastructure built from the ground up for the software that's replacing manual workflows.

The Platform

Agent Search CLI is the open-source core of the Qwerty platform. The full stack includes:

Component          Description
Agent Search CLI   Open-source CLI and Python library (this repo)
Qwerty API         Hosted search API at api.qwert.ai: managed proxy infrastructure, no setup required
Proxy Pool         Community-powered proxy network: share bandwidth, earn credits

Pricing

Plan         Price     Requests    What You Get
Lite         Free      1,000/mo    Basic search, API access, community support
Pro          $49/mo    50,000/mo   Managed proxies, semantic search, priority support, analytics
Enterprise   $999/mo   Unlimited   Dedicated infrastructure, SLA, SSO, custom integrations

Start free at qwert.ai or self-host the entire stack with the open-source repos.

Contact


License

MIT License. See LICENSE for details.


Built by Qwerty


Download files

Source Distribution

agentsearchcli-1.0.0.tar.gz (86.6 kB)

Built Distribution

agentsearchcli-1.0.0-py3-none-any.whl (87.0 kB)

File details

Details for the file agentsearchcli-1.0.0.tar.gz.

File metadata

  • Download URL: agentsearchcli-1.0.0.tar.gz
  • Upload date:
  • Size: 86.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentsearchcli-1.0.0.tar.gz
Algorithm Hash digest
SHA256 ca8669571ebe1bb98610dc6e1724bf2b60514231111d141df1e3fdc890867491
MD5 0a9beed1d62b720293a7c3d0dc06c1c0
BLAKE2b-256 8599b053287ef510617a7e013d74726fd43e28156a88b8a5e3486cfc93c547f7

File details

Details for the file agentsearchcli-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: agentsearchcli-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 87.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentsearchcli-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0da5ff9d6258e0dc8d78fe97d78544841edeec3baf4a8fced3b5c1bd1fca40c8
MD5 e7eb18531e59b8c9bd2fdd802ce1d636
BLAKE2b-256 571b29a6f1ea48694cf4434daa247dd1a644d972be9315dc6a5a7644fe6bd75f
