
Unified CLI over Tavily, Firecrawl, and SerpAPI web-search/content SDKs


web-search-tool

A unified Typer CLI over the Tavily, Firecrawl, and SerpAPI web-search/content SDKs — one command surface and one normalized output schema for humans, scripts, and agents.

  • One schema, every provider. search and every vertical return results in the same normalized shape (title, url, snippet, score), plus documented vertical-specific fields, regardless of which provider served the request.
  • A resilient fetch cascade. fetch walks markdownify → tavily → firecrawl, falling through on thin content, hard HTTP errors, and unsupported sites.
  • Agent-friendly. Deterministic exit codes, JSON when piped, an explicit escalation signal on total fetch failure, and byte-bounded/chunked output.

Install

cd web-search-tool
uv sync --extra dev
uv run web-search-tool --version

Configuration

Settings resolve with the precedence flag > env > config file > default.

  • Tool settings use the SEARCH_TOOL_ env prefix (e.g. SEARCH_TOOL_DEFAULT_PROVIDER=tavily, SEARCH_TOOL_TIMEOUT=30).
  • Provider keys use their conventional names: TAVILY_API_KEY, FIRECRAWL_API_KEY, SERPAPI_API_KEY.
  • Set a key to SKIP to opt a provider out of launch validation. Requesting a vertical that only that provider serves (e.g. scholar/jobs with SERPAPI_API_KEY=SKIP) fails with a specific SKIP-conflict error.
  • An optional TOML config lives at ~/.config/web-search-tool/config.toml (override with WEB_SEARCH_TOOL_CONFIG or the --config flag). See config.example.toml for every tunable.
# Minimal: keys in the environment
export SERPAPI_API_KEY=...        # required for the SerpAPI verticals
export TAVILY_API_KEY=...         # search, research, one cascade converter
export FIRECRAWL_API_KEY=SKIP     # opt out if you don't have a Firecrawl key
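A minimal sketch of the SKIP rule, assuming a hypothetical mapping from each vertical to the providers that can serve it (search via any of the three, scholar and jobs via SerpAPI only). VERTICAL_PROVIDERS, check_vertical, and SkipConflictError are illustrative names, not the tool's actual code; missing keys are a separate launch-validation error and are not handled here.

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical mapping; the real tool defines its own vertical/provider table.
VERTICAL_PROVIDERS: dict[str, set[str]] = {
    "search": {"tavily", "firecrawl", "serpapi"},
    "scholar": {"serpapi"},
    "jobs": {"serpapi"},
}

KEY_VARS = {
    "tavily": "TAVILY_API_KEY",
    "firecrawl": "FIRECRAWL_API_KEY",
    "serpapi": "SERPAPI_API_KEY",
}


class SkipConflictError(Exception):
    """Raised when every provider for a vertical has its key set to SKIP."""


def active_providers(env: Mapping[str, str]) -> set[str]:
    # A provider is opted out only when its key is exactly "SKIP".
    return {p for p, var in KEY_VARS.items() if env.get(var) != "SKIP"}


def check_vertical(vertical: str, env: Mapping[str, str]) -> set[str]:
    """Return the providers that may serve `vertical`, or raise on a SKIP conflict."""
    usable = VERTICAL_PROVIDERS[vertical] & active_providers(env)
    if not usable:
        needed = ", ".join(sorted(VERTICAL_PROVIDERS[vertical]))
        raise SkipConflictError(
            f"vertical {vertical!r} is only served by {needed}, "
            "and every one of those keys is set to SKIP"
        )
    return usable
```

With SERPAPI_API_KEY=SKIP, generic search still has Tavily and Firecrawl available, while scholar fails fast instead of silently returning nothing.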
Setting             Env var                       Default   Purpose
Default provider    SEARCH_TOOL_DEFAULT_PROVIDER  serpapi   Provider for generic search/resolve
Timeout             SEARCH_TOOL_TIMEOUT           20        Per-request timeout (seconds)
Default format      SEARCH_TOOL_DEFAULT_FORMAT    (auto)    table or json; auto-detects TTY when unset
Tavily key          TAVILY_API_KEY                (none)    Tavily search + deep research
Firecrawl key       FIRECRAWL_API_KEY             (none)    Firecrawl search + scrape converter
SerpAPI key         SERPAPI_API_KEY               (none)    All seven SerpAPI verticals
Sentry DSN          SENTRY_DSN                    (off)     Enables error tracking + tracing when set
Sentry environment  SENTRY_ENVIRONMENT            local     Deployment tag on Sentry events
Trace sample rate   SENTRY_TRACES_SAMPLE_RATE     1.0       Transaction sampling, 0.0–1.0

The Sentry settings also live under a [monitoring] table in the config file and follow the same precedence. See Monitoring.
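The flag > env > config file > default order can be sketched as a single lookup function. This is an illustration, not the tool's implementation: resolve_setting is a hypothetical name, it only covers SEARCH_TOOL_-prefixed settings (the Sentry and provider-key variables use their own names), it assumes a flat config dict, and it does no type coercion, so env values come back as strings.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any

PREFIX = "SEARCH_TOOL_"


def resolve_setting(
    name: str,
    flag_value: Any,
    config: Mapping[str, Any],
    default: Any,
    env: Mapping[str, str] | None = None,
) -> Any:
    """Resolve one tool setting with precedence flag > env > config file > default."""
    env = os.environ if env is None else env
    if flag_value is not None:  # 1. an explicit CLI flag always wins
        return flag_value
    env_var = PREFIX + name.upper()  # e.g. "timeout" -> SEARCH_TOOL_TIMEOUT
    if env_var in env:  # 2. environment variable (raw string, no coercion here)
        return env[env_var]
    if name in config:  # 3. parsed config.toml contents (flat layout assumed)
        return config[name]
    return default  # 4. built-in default
```

For example, with SEARCH_TOOL_TIMEOUT=45 exported and timeout = 30 in the config file, the environment value wins unless a flag is passed.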

Output contract

  • stdout carries only the payload. All logs go to stderr; API keys never appear in logs, output, or written files.

  • Format: a TTY renders a rich table; a pipe emits JSON. --format/-f table|json overrides both; --compact removes JSON indentation.

  • Every payload is wrapped in an envelope, {command, ok, data}, so callers can branch on ok without parsing the body.

  • Exit codes are deterministic and category-specific:

    Code  Name       Meaning
    0     SUCCESS    Completed
    1     GENERAL    Unclassified error
    2     USAGE      Bad CLI usage
    3     INPUT      Bad input / no URL resolved
    4     NOT_FOUND  No results
    5     NETWORK    Network/provider failure
    6     TIMEOUT    Bounded wait exceeded
    7     CONFIG     Missing/SKIP key, bad config
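A caller (script or agent) can combine the exit code with the JSON envelope. The sketch below mirrors the exit-code table as an IntEnum and classifies a finished run; the retry policy (NETWORK and TIMEOUT only) and the interpret/run_search helpers are hypothetical caller-side choices, and it assumes nothing about stdout on non-zero exits.

```python
from __future__ import annotations

import json
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Mirror of the documented exit codes."""

    SUCCESS = 0
    GENERAL = 1
    USAGE = 2
    INPUT = 3
    NOT_FOUND = 4
    NETWORK = 5
    TIMEOUT = 6
    CONFIG = 7


# Hypothetical caller policy: only transient categories are worth retrying.
RETRYABLE = frozenset({ExitCode.NETWORK, ExitCode.TIMEOUT})


def interpret(returncode: int, stdout: str) -> tuple[str, object]:
    """Map one CLI run to ("ok" | "error" | "retry" | "fail", detail)."""
    try:
        code = ExitCode(returncode)
    except ValueError:
        code = ExitCode.GENERAL  # unknown codes are treated as unclassified
    if code is ExitCode.SUCCESS:
        envelope = json.loads(stdout)  # {"command": ..., "ok": ..., "data": ...}
        if envelope.get("ok"):
            return "ok", envelope["data"]
        return "error", envelope.get("data")
    return ("retry" if code in RETRYABLE else "fail"), code.name


def run_search(query: str) -> tuple[str, object]:
    # Piped stdout already yields JSON; --format json makes that explicit.
    proc = subprocess.run(
        ["web-search-tool", "search", query, "--format", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    return interpret(proc.returncode, proc.stdout)
```

Because logs go to stderr, proc.stdout holds only the envelope and can be parsed directly.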

Commands

Search & verticals

web-search-tool search "python typer cli" --limit 10 [--provider tavily|firecrawl|serpapi]
web-search-tool news "us treasury yields" -n 5
web-search-tool jobs "platform engineer" --location "Boston, MA"
web-search-tool images "golden gate bridge"
web-search-tool videos "rust async tutorial"
web-search-tool reverse-image "https://example.com/photo.jpg"
web-search-tool scholar "continual learning llm" \
    --author "Bengio" --min-cites 50 --since 2020 --until 2024 --sort cites

All search commands accept --limit/-n, --format/-f, and --compact. --limit is enforced uniformly — SerpAPI verticals are capped client-side.
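One way to picture the normalized schema and the uniform --limit is a small adapter per provider plus a client-side slice. The raw field names here are assumptions about the providers' responses (Tavily results carrying url/content/score, SerpAPI organic results carrying link/snippet with no relevance score); Result, normalize, and capped are hypothetical names, not the tool's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    snippet: str
    score: float | None  # None when the provider reports no relevance score


def normalize(provider: str, raw: dict[str, Any]) -> Result:
    # Assumed raw field names; check each SDK's response model before relying on them.
    if provider == "tavily":
        return Result(raw["title"], raw["url"], raw.get("content", ""), raw.get("score"))
    if provider == "serpapi":
        return Result(raw["title"], raw["link"], raw.get("snippet", ""), None)
    raise ValueError(f"unknown provider: {provider}")


def capped(provider: str, rows: list[dict[str, Any]], limit: int) -> list[Result]:
    # Enforce --limit client-side, even when the provider returned more rows.
    return [normalize(provider, row) for row in rows[:limit]]
```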

Scholar filters (scholar only):

Flag                          Effect
--author/-a NAME              Native author:"NAME" operator
--min-cites N                 Keep only results cited ≥ N times (client-side over a pool)
--since YEAR / --until YEAR   Native publication-year range
--sort relevance|date|cites   relevance (default), most-recent, or most-cited
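The split between native and client-side filters can be sketched as two steps: build the query with the author operator, then filter a pool of fetched results by citation count before applying --limit. The row shape ({"title", "cites"}) and both helper names are assumptions for illustration; in this sketch only --sort cites is applied locally, with relevance and date assumed to arrive already ordered from the provider.

```python
from __future__ import annotations

from typing import Any


def scholar_query(query: str, author: str | None = None) -> str:
    # --author becomes Google Scholar's native author:"NAME" operator.
    return f'{query} author:"{author}"' if author else query


def filter_and_sort(
    pool: list[dict[str, Any]],
    min_cites: int = 0,
    sort: str = "relevance",
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Apply --min-cites over the whole pool, optionally sort by cites, then cap."""
    kept = [row for row in pool if row.get("cites", 0) >= min_cites]
    if sort == "cites":
        kept.sort(key=lambda row: row.get("cites", 0), reverse=True)
    return kept[:limit]  # --limit is applied only after filtering
```

Filtering before capping matters: capping first could discard highly cited papers that happened to rank low.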

Fetch — URL → markdown via the cascade

web-search-tool fetch "https://example.com/article"        # probe: first success wins
web-search-tool fetch URL --compare                        # all converters, labeled
web-search-tool fetch URL --raw-html                        # local markdownify only
web-search-tool fetch URL --order markdownify,firecrawl     # custom chain
web-search-tool fetch URL --max-bytes 20000 --offset 0      # bounded / chunked window
web-search-tool fetch URL --stdout                          # stream instead of writing a file
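In probe mode the chain (markdownify → tavily → firecrawl by default, or whatever --order names) is walked in order and the first usable result wins. The sketch below assumes each converter is a plain callable that returns markdown or raises; the 200-character "thin content" threshold, probe, and ChainExhausted are hypothetical, while the escalation message is the exact string documented for this tool.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence

ESCALATION = "Fetch chain completely failed, try using agent-browser."
MIN_CHARS = 200  # hypothetical "thin content" threshold

Converter = Callable[[str], str]


class ChainExhausted(Exception):
    """Every converter failed; the message carries the escalation signal."""


def probe(url: str, chain: Sequence[tuple[str, Converter]]) -> tuple[str, str]:
    """Walk converters in order and return (name, markdown) from the first success."""
    for name, convert in chain:
        try:
            markdown = convert(url)
        except Exception:  # hard HTTP errors, unsupported sites, SDK failures
            continue
        if len(markdown.strip()) < MIN_CHARS:  # thin content: fall through
            continue
        return name, markdown
    raise ChainExhausted(ESCALATION)
```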
  • Default mode is a probe: walk the chain in order and stop at the first success. If every converter fails, the command raises a chain-exhaustion error carrying the exact signal "Fetch chain completely failed, try using agent-browser."
  • When the output exceeds --max-bytes, it is truncated and the marker *OUTPUT TRUNCATED PLS INCREASE THE CAP OR DO CHUNKED REQUEST* is appended; combine --max-bytes with --offset to request deterministic chunked continuations.
  • Without --stdout, output is written to a collision-safe slug file (<UTC-timestamp>-<slug>.md) in --out-dir (default: current directory).
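A byte-bounded window with deterministic continuation might look like the following. It assumes --offset is a byte offset into the full markdown and that each window ends on a UTF-8 character boundary so that chunks join back losslessly; window is a hypothetical helper, and appending the marker on its own line is a choice made for this sketch.

```python
from __future__ import annotations

MARKER = "*OUTPUT TRUNCATED PLS INCREASE THE CAP OR DO CHUNKED REQUEST*"


def window(markdown: str, max_bytes: int, offset: int = 0) -> tuple[str, int | None]:
    """Return the chunk starting at byte `offset` and the next offset (None when done)."""
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character (4 bytes)")
    data = markdown.encode("utf-8")
    end = min(offset + max_bytes, len(data))
    # Back off so the cut never splits a multi-byte character (continuation bytes are 10xxxxxx).
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end -= 1
    text = data[offset:end].decode("utf-8")
    if end < len(data):
        return text + "\n" + MARKER, end  # truncated: caller continues at `end`
    return text, None
```

Usage: start at offset 0 and feed each returned offset into the next call until it comes back as None.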

Resolve — search, then fetch the top-N URLs

web-search-tool resolve "best rust web framework" --limit 5
web-search-tool resolve "fed rate decision" --vertical news -n 3
web-search-tool resolve "q" --stdout

Runs a search on --vertical (default: search), then fetches each result URL through the cascade in auto-resolve mode: a URL that exhausts its chain records an escalation but never aborts the batch. The command exits 0 if at least one URL resolved and 3 (INPUT) only if all of them failed.
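Auto-resolve mode differs from probe mode only in how chain exhaustion is handled: it is recorded per URL instead of ending the run. The sketch assumes a hypothetical fetch callable that raises a ChainExhausted stand-in on total failure; ResolveReport and resolve_all are illustrative names, and the exit-code rule is the one stated above.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field

EXIT_SUCCESS, EXIT_INPUT = 0, 3


class ChainExhausted(Exception):
    """Stand-in for the cascade's chain-exhaustion error."""


@dataclass
class ResolveReport:
    resolved: dict[str, str] = field(default_factory=dict)  # url -> markdown
    escalations: list[str] = field(default_factory=list)  # urls that exhausted the chain

    @property
    def exit_code(self) -> int:
        # 0 if at least one URL resolved; 3 (INPUT) only when every URL failed.
        return EXIT_SUCCESS if self.resolved else EXIT_INPUT


def resolve_all(urls: list[str], fetch: Callable[[str], str]) -> ResolveReport:
    report = ResolveReport()
    for url in urls:
        try:
            report.resolved[url] = fetch(url)
        except ChainExhausted:
            report.escalations.append(url)  # record and keep going
    return report
```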

Research — deep cited research via Tavily

web-search-tool research run "history of CRISPR patents"            # returns request_id
web-search-tool research run "q" --file notes.md --file data.json   # attach local context
web-search-tool research run "q" --wait --max-wait 300              # poll to completion
web-search-tool research status <request_id>                        # look up once

research run returns a request_id immediately. With --wait, it polls until the request reaches a terminal status or --max-wait is exceeded, in which case it exits with TIMEOUT (6) and a resume hint. research status reports the status and, once the request completes, the cited markdown report with numbered sources. --file attaches up to 5 local .txt/.md/.json files as research context.
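The --wait behavior is a bounded polling loop. In this sketch the status lookup, the terminal status names ("completed", "failed"), and the 5-second interval are all assumptions; the clock and sleep functions are injectable so the loop can be tested without waiting. A None return stands for the timeout case, where the CLI exits TIMEOUT (6) and the hint points at research status.

```python
from __future__ import annotations

import sys
import time
from collections.abc import Callable

TERMINAL = {"completed", "failed"}  # hypothetical terminal status names


def wait_for(
    request_id: str,
    get_status: Callable[[str], str],
    max_wait: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str | None:
    """Poll until a terminal status; return None once max_wait has elapsed."""
    deadline = clock() + max_wait
    while True:
        status = get_status(request_id)
        if status in TERMINAL:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            # Resume hint goes to stderr so stdout stays payload-only.
            print(f"still running; resume with: web-search-tool research status {request_id}",
                  file=sys.stderr)
            return None
        sleep(min(interval, remaining))  # never sleep past the deadline
```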

Monitoring

Sentry is off by default and fully optional — set a DSN to turn it on. When enabled, the tool reports errors and wraps each invocation in a transaction with child spans around every outbound call (provider search, each fetch-cascade tier, research launch/poll), tagged with non-secret data (provider, tier, status, result counts). API key values are never attached to spans, tags, or events.
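Keeping key values out of spans is easiest with an allow-list applied before any data reaches the SDK. The sketch below is hypothetical: the allowed tag names and safe_span_data are illustrative, and as a second guard it drops any string value that contains a configured provider key. Each surviving pair could then be passed to a Sentry span's set_tag or set_data.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any

SECRET_ENV_VARS = ("TAVILY_API_KEY", "FIRECRAWL_API_KEY", "SERPAPI_API_KEY")
ALLOWED_TAGS = {"provider", "tier", "status", "result_count"}  # hypothetical tag names


def safe_span_data(data: Mapping[str, Any], env: Mapping[str, str] | None = None) -> dict[str, Any]:
    """Keep only allow-listed values that do not contain any configured API key."""
    env = os.environ if env is None else env
    secrets = {env[v] for v in SECRET_ENV_VARS if env.get(v) not in (None, "", "SKIP")}
    clean: dict[str, Any] = {}
    for key, value in data.items():
        if key not in ALLOWED_TAGS:
            continue  # unknown keys never reach the span
        if isinstance(value, str) and any(s in value for s in secrets):
            continue  # defensive: drop anything that embeds a key value
        clean[key] = value
    return clean
```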

Configure it via env or the [monitoring] table in the config file (same precedence as everything else):

export SENTRY_DSN="https://<key>@<org>.ingest.sentry.io/<project>"
export SENTRY_ENVIRONMENT=prod          # optional, defaults to "local"
export SENTRY_TRACES_SAMPLE_RATE=0.2    # optional, defaults to 1.0
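These three variables map directly onto sentry_sdk.init's dsn, environment, and traces_sample_rate options. The sketch below reads only the environment (not the [monitoring] table), applies the documented defaults, and returns None when no DSN is set; sentry_options is a hypothetical helper, and validating the 0.0–1.0 range here is this sketch's choice.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from typing import Any


def sentry_options(env: Mapping[str, str] | None = None) -> dict[str, Any] | None:
    """Build sentry_sdk.init() keyword arguments, or None when monitoring is off."""
    env = os.environ if env is None else env
    dsn = env.get("SENTRY_DSN")
    if not dsn:
        return None  # no DSN: Sentry stays off
    rate = float(env.get("SENTRY_TRACES_SAMPLE_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError("SENTRY_TRACES_SAMPLE_RATE must be between 0.0 and 1.0")
    return {
        "dsn": dsn,
        "environment": env.get("SENTRY_ENVIRONMENT", "local"),
        "traces_sample_rate": rate,
    }
```

Usage: `opts = sentry_options()`, then call `sentry_sdk.init(**opts)` only when opts is not None.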

To verify the setup end to end, run the command below. It emits a span tree plus one synthetic, harmless exception, then prints the captured event ID:

web-search-tool test sentry

It exits with CONFIG (7) when no DSN is configured, since there is nothing to test.

Development

uv run pytest                          # unit tests (live tests skipped without keys)
uv run pytest -m live                  # opt-in tests against real provider APIs
uv run ruff check . && uv run ruff format --check .
uv run mypy --strict src/web_search_tool
uv run pytest --cov=web_search_tool --cov-report=term-missing

See CONTRIBUTING.md for the full workflow.

License

MIT © Vlad Korolev

Download files

Source Distribution

web_search_tool-0.1.0.tar.gz (172.8 kB)

Uploaded Source

Built Distribution

web_search_tool-0.1.0-py3-none-any.whl (51.8 kB)

Uploaded Python 3

File details

Details for the file web_search_tool-0.1.0.tar.gz.

File metadata

  • Download URL: web_search_tool-0.1.0.tar.gz
  • Size: 172.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for web_search_tool-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a7c19dd8eff9f9c4432f17d5a368ddcb07b2c216458c1d231e8a82ccbc4890f4
MD5 b263b85d316b7810609a3ee1b33575c0
BLAKE2b-256 55bf967bc8441698c4aa28f8799e87543115b6d825985e56806a111acacbea9c

File details

Details for the file web_search_tool-0.1.0-py3-none-any.whl.

File hashes

Hashes for web_search_tool-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4436066999ad8faa64b045c42f3b82c75b9ce275ac95d92fa435990be8c73e96
MD5 9659debdaa7eb218e0d0e85c8bc6f7d3
BLAKE2b-256 3ec8678f1cf748305637b67e08525b03a4182573c08495e23b5e5cce0deb563c
