Unified CLI over Tavily, Firecrawl, and SerpAPI web-search/content SDKs
web-search-tool
A unified Typer CLI over the Tavily, Firecrawl, and SerpAPI web-search/content SDKs — one command surface and one normalized output schema for humans, scripts, and agents.
- One schema, every provider. `search` and every vertical return the same envelope shape (`title`, `url`, `snippet`, `score`) plus documented vertical-specific fields, regardless of which provider served the request.
- A resilient fetch cascade. `fetch` walks `markdownify → tavily → firecrawl`, falling through on thin content, hard HTTP errors, and unsupported sites.
- Agent-friendly. Deterministic exit codes, JSON when piped, an explicit escalation signal on total fetch failure, and byte-bounded/chunked output.
Install
```bash
cd web-search-tool
uv sync --extra dev
uv run web-search-tool --version
```
Configuration
Settings resolve with the precedence flag > env > config file > default.
- Tool settings use the `SEARCH_TOOL_` env prefix (e.g. `SEARCH_TOOL_DEFAULT_PROVIDER=tavily`, `SEARCH_TOOL_TIMEOUT=30`).
- Provider keys use their conventional names: `TAVILY_API_KEY`, `FIRECRAWL_API_KEY`, `SERPAPI_API_KEY`.
- Set a key to `SKIP` to opt a provider out of launch validation. Requesting a vertical that only that provider serves (e.g. `scholar`/`jobs` with `SERPAPI_API_KEY=SKIP`) fails with a specific SKIP-conflict error.
- An optional TOML config lives at `~/.config/web-search-tool/config.toml` (override with `WEB_SEARCH_TOOL_CONFIG` or the `--config` flag). See `config.example.toml` for every tunable.
```bash
# Minimal: keys in the environment
export SERPAPI_API_KEY=...      # required for the SerpAPI verticals
export TAVILY_API_KEY=...       # search, research, one cascade converter
export FIRECRAWL_API_KEY=SKIP   # opt out if you don't have a Firecrawl key
```
| Setting | Env var | Default | Purpose |
|---|---|---|---|
| Default provider | `SEARCH_TOOL_DEFAULT_PROVIDER` | `serpapi` | Provider for generic search/resolve |
| Timeout | `SEARCH_TOOL_TIMEOUT` | `20` | Per-request timeout (seconds) |
| Default format | `SEARCH_TOOL_DEFAULT_FORMAT` | (auto) | `table` or `json`; auto-detects TTY when unset |
| Tavily key | `TAVILY_API_KEY` | (none) | Tavily search + deep research |
| Firecrawl key | `FIRECRAWL_API_KEY` | (none) | Firecrawl search + scrape converter |
| SerpAPI key | `SERPAPI_API_KEY` | (none) | All seven SerpAPI verticals |
| Sentry DSN | `SENTRY_DSN` | (off) | Enables error tracking + tracing when set |
| Sentry environment | `SENTRY_ENVIRONMENT` | `local` | Deployment tag on Sentry events |
| Trace sample rate | `SENTRY_TRACES_SAMPLE_RATE` | `1.0` | Transaction sampling, 0.0–1.0 |

The Sentry settings also live under a [monitoring] table in the config file and
follow the same precedence. See Monitoring.
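
The precedence rule (flag > env > config file > default) can be illustrated with a small Python sketch. This is not the tool's actual code: the function name `resolve_setting`, the `DEFAULTS` mapping, and the config-file key names are assumptions made for illustration; only the `SEARCH_TOOL_` prefix and the default values come from the table above.

```python
from typing import Any, Mapping, Optional

# Defaults taken from the settings table; the key names are hypothetical.
DEFAULTS: dict[str, Any] = {"default_provider": "serpapi", "timeout": 20}


def resolve_setting(
    name: str,
    flag_value: Optional[Any],
    env: Mapping[str, str],
    file_config: Mapping[str, Any],
) -> Any:
    """Return the first value found, walking flag > env > config file > default."""
    default = DEFAULTS[name]
    if flag_value is not None:
        raw = flag_value
    elif f"SEARCH_TOOL_{name.upper()}" in env:
        raw = env[f"SEARCH_TOOL_{name.upper()}"]
    elif name in file_config:
        raw = file_config[name]
    else:
        return default
    # Environment values arrive as strings, so coerce to the default's type.
    return type(default)(raw)
```

For example, `resolve_setting("timeout", None, {"SEARCH_TOOL_TIMEOUT": "30"}, {"timeout": 25})` picks the environment value over the config file, while passing `flag_value=5` would override both.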
Output contract
- stdout carries only the payload. All logs go to stderr; API keys never appear in logs, output, or written files.
- Format: a TTY renders a rich table; a pipe emits JSON. `--format/-f table|json` overrides both; `--compact` removes JSON indentation.
- Every payload is wrapped in an envelope, `{command, ok, data}`, so callers can branch on `ok` without parsing the body.
- Exit codes are deterministic and category-specific:

| Code | Name | Meaning |
|---|---|---|
| 0 | SUCCESS | Completed |
| 1 | GENERAL | Unclassified error |
| 2 | USAGE | Bad CLI usage |
| 3 | INPUT | Bad input / no URL resolved |
| 4 | NOT_FOUND | No results |
| 5 | NETWORK | Network/provider failure |
| 6 | TIMEOUT | Bounded wait exceeded |
| 7 | CONFIG | Missing/SKIP key, bad config |
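
A calling script or agent might consume this contract roughly as follows. The sketch is an assumption about how a caller could be written, not part of the tool: the `interpret` function and the choice to treat NETWORK and TIMEOUT as retryable are illustrative, and it only parses stdout on exit code 0 because the README does not say what stdout carries on failure.

```python
import json
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as documented in the output contract."""
    SUCCESS = 0
    GENERAL = 1
    USAGE = 2
    INPUT = 3
    NOT_FOUND = 4
    NETWORK = 5
    TIMEOUT = 6
    CONFIG = 7


# Hypothetical caller policy: transient categories are worth retrying.
RETRYABLE = {ExitCode.NETWORK, ExitCode.TIMEOUT}


def interpret(returncode: int, stdout: str) -> tuple[str, object]:
    """Classify one invocation as 'ok', 'retry', or 'fail'."""
    try:
        code = ExitCode(returncode)
    except ValueError:
        code = ExitCode.GENERAL  # unknown codes are treated as unclassified
    if code is ExitCode.SUCCESS:
        envelope = json.loads(stdout)
        if envelope.get("ok"):
            return "ok", envelope["data"]
        return "fail", envelope
    if code in RETRYABLE:
        return "retry", code.name
    return "fail", code.name
```

In practice the inputs would come from something like `subprocess.run(["web-search-tool", "search", "q", "--format", "json"], capture_output=True, text=True)`, passing its `returncode` and `stdout` to `interpret`.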
Commands
Search & verticals
```bash
web-search-tool search "python typer cli" --limit 10 [--provider tavily|firecrawl|serpapi]
web-search-tool news "us treasury yields" -n 5
web-search-tool jobs "platform engineer" --location "Boston, MA"
web-search-tool images "golden gate bridge"
web-search-tool videos "rust async tutorial"
web-search-tool reverse-image "https://example.com/photo.jpg"
web-search-tool scholar "continual learning llm" \
    --author "Bengio" --min-cites 50 --since 2020 --until 2024 --sort cites
```
All search commands accept --limit/-n, --format/-f, and --compact.
--limit is enforced uniformly — SerpAPI verticals are capped client-side.
Scholar filters (scholar only):
| Flag | Effect |
|---|---|
| `--author/-a NAME` | Native `author:"NAME"` operator |
| `--min-cites N` | Keep only results cited ≥ N times (client-side over a pool) |
| `--since YEAR` / `--until YEAR` | Native publication-year range |
| `--sort relevance\|date\|cites` | relevance (default), most-recent, or most-cited |

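
The client-side parts of this, the `--min-cites` filter over a result pool and the uniform `--limit` cap, amount to a filter followed by a slice. A minimal Python sketch, assuming each result is a dict with a hypothetical `cited_by` count (the tool's real field name is not stated here):

```python
from typing import Any


def apply_min_cites(
    pool: list[dict[str, Any]], min_cites: int, limit: int
) -> list[dict[str, Any]]:
    """Keep results cited at least `min_cites` times, then cap to `limit`.

    `cited_by` is a placeholder field name used only for this sketch.
    """
    kept = [r for r in pool if r.get("cited_by", 0) >= min_cites]
    return kept[:limit]
```

Filtering before capping is the reason a pool larger than `--limit` is useful: applying the cap first could discard results that would have passed the citation threshold.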
Fetch — URL → markdown via the cascade
```bash
web-search-tool fetch "https://example.com/article"       # probe: first success wins
web-search-tool fetch URL --compare                       # all converters, labeled
web-search-tool fetch URL --raw-html                      # local markdownify only
web-search-tool fetch URL --order markdownify,firecrawl   # custom chain
web-search-tool fetch URL --max-bytes 20000 --offset 0    # bounded / chunked window
web-search-tool fetch URL --stdout                        # stream instead of writing a file
```
- Default mode is a probe: walk the chain, first success wins; total failure raises chain-exhaustion carrying the exact signal `Fetch chain completely failed, try using agent-browser`.
- `--max-bytes` overflow truncates and appends the marker `*OUTPUT TRUNCATED PLS INCREASE THE CAP OR DO CHUNKED REQUEST*`; combine with `--offset` for deterministic chunked continuation.
- Without `--stdout`, output is written to a collision-safe slug file (`<UTC-timestamp>-<slug>.md`) in `--out-dir` (default: current directory).
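
The `--max-bytes`/`--offset` behavior can be pictured as a byte window over the converted markdown. The Python sketch below is one plausible reading, not the tool's implementation: it assumes the offset and cap are counted in UTF-8 bytes, that the marker is appended on its own line and does not count against the cap, and that a caller continues by advancing the offset by the cap. The names `window` and `read_all` are hypothetical.

```python
TRUNCATION_MARKER = "*OUTPUT TRUNCATED PLS INCREASE THE CAP OR DO CHUNKED REQUEST*"


def window(markdown: str, max_bytes: int, offset: int = 0) -> str:
    """Return at most `max_bytes` bytes starting at `offset`, marking truncation."""
    data = markdown.encode("utf-8")
    # errors="ignore" drops a multi-byte character split by the window edge;
    # a real implementation would likely align to character boundaries.
    chunk = data[offset : offset + max_bytes].decode("utf-8", errors="ignore")
    if offset + max_bytes < len(data):
        return chunk + "\n" + TRUNCATION_MARKER
    return chunk


def read_all(markdown: str, max_bytes: int) -> str:
    """Reassemble a document by requesting successive windows."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    parts, offset = [], 0
    while True:
        out = window(markdown, max_bytes, offset)
        if not out.endswith(TRUNCATION_MARKER):
            parts.append(out)
            return "".join(parts)
        parts.append(out.removesuffix("\n" + TRUNCATION_MARKER))
        offset += max_bytes
```

Because each window depends only on the offset and the cap, repeated requests over unchanged content return the same chunks, which is what makes the continuation deterministic.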
Resolve — search, then fetch the top-N URLs
```bash
web-search-tool resolve "best rust web framework" --limit 5
web-search-tool resolve "fed rate decision" --vertical news -n 3
web-search-tool resolve "q" --stdout
```
Runs a search over --vertical (default search), then fetches each result URL
through the cascade in auto-resolve mode: a URL that exhausts its chain
records an escalation but never aborts the batch. Exits 0 if any URL
resolved, 3 (INPUT) only if all failed.
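
The batch semantics can be sketched in Python as a per-URL cascade wrapped in a loop that records failures instead of raising. Everything named here is illustrative: `fetch_one`, `resolve_batch`, the converter callables, and the `min_chars` threshold standing in for "thin content" are assumptions, not the tool's internals.

```python
from typing import Callable, Optional

ESCALATION = "Fetch chain completely failed, try using agent-browser"

# A converter takes a URL and returns markdown, or None if it cannot handle it.
Converter = Callable[[str], Optional[str]]
Chain = list[tuple[str, Converter]]


def fetch_one(url: str, chain: Chain, min_chars: int = 200) -> Optional[tuple[str, str]]:
    """Walk the chain; return (tier_name, markdown) for the first usable result."""
    for name, convert in chain:
        try:
            markdown = convert(url)
        except Exception:
            continue  # hard error: fall through to the next tier
        if markdown and len(markdown.strip()) >= min_chars:
            return name, markdown  # anything shorter counts as thin content
    return None


def resolve_batch(urls: list[str], chain: Chain):
    """Fetch every URL; an exhausted chain is recorded, never raised."""
    resolved: dict[str, tuple[str, str]] = {}
    escalations: list[tuple[str, str]] = []
    for url in urls:
        result = fetch_one(url, chain)
        if result is None:
            escalations.append((url, ESCALATION))
        else:
            resolved[url] = result
    exit_code = 0 if resolved else 3  # 3 = INPUT: no URL resolved
    return exit_code, resolved, escalations
```

The chain would be a list such as `[("markdownify", ...), ("tavily", ...), ("firecrawl", ...)]`, with each callable wrapping the corresponding converter.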
Research — deep cited research via Tavily
```bash
web-search-tool research run "history of CRISPR patents"              # returns request_id
web-search-tool research run "q" --file notes.md --file data.json     # attach local context
web-search-tool research run "q" --wait --max-wait 300                # poll to completion
web-search-tool research status <request_id>                          # look up once
```
research run returns a request_id immediately; --wait polls until a
terminal status or --max-wait (then exits TIMEOUT with a resume hint).
research status reports status and, on completion, the cited markdown report
with numbered sources. --file attaches up to 5 local .txt/.md/.json
files as research context.
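
The `--wait` behavior is a bounded polling loop. A minimal Python sketch under stated assumptions: the status strings (`completed`, `failed`), the polling interval, the mapping of a failed job to exit code 1, and the `get_status` callable are all hypothetical; only the bounded wait and the TIMEOUT exit code (6) come from this README.

```python
import time
from typing import Callable

# Hypothetical terminal statuses; the real provider values may differ.
TERMINAL = {"completed", "failed"}


def wait_for(
    request_id: str,
    get_status: Callable[[str], str],
    max_wait: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[int, str]:
    """Poll until a terminal status or until `max_wait` seconds have elapsed."""
    deadline = clock() + max_wait
    while True:
        status = get_status(request_id)
        if status in TERMINAL:
            exit_code = 0 if status == "completed" else 1
            return exit_code, status
        if clock() + interval > deadline:
            return 6, status  # TIMEOUT: the caller can resume with the request_id
        sleep(interval)
```

Returning the last seen status alongside the exit code lets a caller print a resume hint along the lines of `web-search-tool research status <request_id>`.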
Monitoring
Sentry is off by default and fully optional — set a DSN to turn it on. When enabled, the tool reports errors and wraps each invocation in a transaction with child spans around every outbound call (provider search, each fetch-cascade tier, research launch/poll), tagged with non-secret data (provider, tier, status, result counts). API key values are never attached to spans, tags, or events.
Configure it via env or the [monitoring] table in the config file (same
precedence as everything else):
```bash
export SENTRY_DSN="https://<key>@<org>.ingest.sentry.io/<project>"
export SENTRY_ENVIRONMENT=prod           # optional, defaults to "local"
export SENTRY_TRACES_SAMPLE_RATE=0.2     # optional, defaults to 1.0
```
Verify the setup end to end — emits a span tree plus one synthetic, harmless exception, then prints the captured event id:
```bash
web-search-tool test sentry
```
It exits with the config code (7) when no DSN is configured (nothing to test).
Development
```bash
uv run pytest                      # unit tests (live tests skipped without keys)
uv run pytest -m live              # opt-in tests against real provider APIs
uv run ruff check . && uv run ruff format --check .
uv run mypy --strict src/web_search_tool
uv run pytest --cov=web_search_tool --cov-report=term-missing
```
See CONTRIBUTING.md for the full workflow.
License
MIT © Vlad Korolev