Agentic web research tool — smarter than search, faster than deep research. Search, scrape, and synthesize web content using LLMs.

Project description

web-scout-ai logo

The missing middle ground between basic search APIs and heavyweight deep research agents.
One async call that finds the right URLs, reads real pages and documents, and returns a grounded synthesis with sources.

TL;DR

web-scout-ai is for teams that want better-than-snippets web research without the latency and cost profile of full deep-research stacks.

You get:

  • Search -> scrape -> evaluate -> iterate -> synthesize in one deterministic pipeline
  • Support for HTML, JS-rendered pages, PDFs, DOCX, PPTX, XLSX
  • Structured output that drops directly into agent workflows
  • Provider flexibility through LiteLLM (OpenAI, Anthropic, Gemini, Mistral, Groq, local, and more)

Why People Switch To web-scout-ai

Option                      | Typical output                      | Pain point
Search API only             | snippets and links                  | not enough context to answer reliably
Single-page markdown tools  | one page at a time                  | no discovery loop, no multi-source synthesis
Heavy deep-research agents  | long reports                        | slower, more expensive, often overkill
web-scout-ai                | sourced synthesis from real content | built for practical speed + depth balance

What Makes It Stick

1) It reads sources, not snippets

The pipeline extracts substantial query-relevant content from each source, then synthesizes across them.

2) It handles real documents out of the box

  • Static HTML via fast HTTP
  • JS pages via Playwright
  • PDF, DOCX, PPTX, XLSX via docling
  • Scanned PDFs via vision-model fallback

3) It closes coverage gaps automatically

If first-pass sources are incomplete, it checks the existing backlog first, then runs targeted follow-up searches only when needed.
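
Conceptually, the gap-closing loop looks like the sketch below (hypothetical helper names and signatures; it illustrates the documented behavior, not the library's internals):

from typing import Callable

def close_gaps(
    query: str,
    scraped: list[str],
    evaluate: Callable[[str, list[str]], tuple[bool, list[str], list[str]]],
    scrape: Callable[[list[str]], list[str]],
    search: Callable[[list[str]], list[str]],
    max_rounds: int = 2,
) -> list[str]:
    # Backlog-first: reuse URLs already discovered before paying for a new search
    for _ in range(max_rounds):
        complete, backlog_urls, gaps = evaluate(query, scraped)
        if complete:
            break
        if backlog_urls:
            scraped += scrape(backlog_urls)   # no new search round needed
        else:
            scraped += scrape(search(gaps))   # targeted follow-up queries
    return scraped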

4) It is agent-native by design

One async function (run_web_research), one typed result (WebResearchResult), zero framework lock-in.

Install In 30 Seconds

pip install web-scout-ai
web-scout-setup

web-scout-setup installs the Chromium browser that Playwright needs for JS-rendered pages.
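
If the setup script cannot run (for example in a locked-down CI image), Playwright's standard playwright install chromium command should achieve the same result, assuming Playwright itself is already installed.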

First Run

import asyncio
from web_scout import run_web_research

async def main():
    result = await run_web_research(
        query="What are the main threats to coral reefs worldwide?",
        models={
            "web_researcher": "gemini/gemini-2.0-flash",
            "content_extractor": "gemini/gemini-2.0-flash",
        },
    )

    print(result.synthesis)
    print("Sources:")
    for s in result.scraped:
        print(f"- {s.title}: {s.url}")

asyncio.run(main())
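
Note: environments that already run an event loop (such as Jupyter) should call await main() directly instead of asyncio.run(main()).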

API At A Glance

result = await run_web_research(
    query="latest IPCC findings on sea level rise",
    models={
        "web_researcher": "openai/gpt-4o",
        "content_extractor": "gemini/gemini-2.0-flash",
    },
    search_backend="duckduckgo",        # or "serper"
    research_depth="standard",          # or "deep"
    include_domains=["ipcc.ch"],        # optional
    direct_url=None,                    # optional
    domain_expertise="climate science", # optional
)
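
A typical way to consume the result, using only fields from the output schema documented below:

print(result.synthesis)

# Surface coverage gaps: sources that failed to scrape or yielded only snippets
if result.scrape_failed:
    print("Could not scrape:", ", ".join(e.url for e in result.scrape_failed))
if result.snippet_only:
    print("Snippet-only sources:", ", ".join(e.url for e in result.snippet_only))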

Configuration

Models

Model ids follow LiteLLM provider naming:

models = {
    # Required
    "web_researcher": "openai/gpt-4o",
    "content_extractor": "gemini/gemini-2.0-flash",

    # Optional step-specific overrides (default: web_researcher)
    "query_generator": "anthropic/claude-sonnet-4-20250514",
    "coverage_evaluator": "openai/gpt-4o-mini",
    "synthesiser": "anthropic/claude-sonnet-4-20250514",

    # Optional fallback for scanned PDFs / empty JS pages
    "vision_fallback": "gemini/gemini-2.0-flash",
}
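
Because ids follow LiteLLM naming, a locally served model plugs in the same way; a minimal sketch assuming a local Ollama server (the model tag is illustrative):

models = {
    "web_researcher": "ollama/llama3.1",     # any model served by a local Ollama instance
    "content_extractor": "ollama/llama3.1",
}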

Environment variables

# Search backend (optional if using DuckDuckGo)
export SERPER_API_KEY="..."

# LLM providers (set what you use)
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GEMINI_API_KEY="..."
export MISTRAL_API_KEY="..."
export GROQ_API_KEY="..."
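
The same keys can be set from Python before the first call, using only the standard library:

import os

# Equivalent to the shell exports above; set only the providers you use
os.environ["GEMINI_API_KEY"] = "..."
os.environ["SERPER_API_KEY"] = "..."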

Research modes

# 1) Open web research (default)
await run_web_research(query="latest IPCC findings on sea level rise", models=models)

# 2) Domain-restricted
await run_web_research(
    query="endemic species conservation programs",
    models=models,
    include_domains=["iucn.org", "wwf.org"],
)

# 3) Direct URL extraction (skip search)
await run_web_research(
    query="key findings from this report",
    models=models,
    direct_url="https://example.org/biodiversity-report.pdf",
)

# 4) Direct URL list-page deepening
await run_web_research(
    query="sustainable land management technologies in Kenya",
    models=models,
    direct_url="https://wocat.net/en/database/list/?type=technology&country=ke",
)

Search backends

# Default: Serper (requires SERPER_API_KEY)
await run_web_research(query=..., models=..., search_backend="serper")

# Free: DuckDuckGo (no API key)
await run_web_research(query=..., models=..., search_backend="duckduckgo")

Research depth

# Standard (default): usually up to ~10 sources
await run_web_research(query=..., models=..., research_depth="standard")

# Deep: usually up to ~28 sources
await run_web_research(query=..., models=..., research_depth="deep")

Parameter                     Standard  Deep
Max iterations                2         3
Search queries (first round)  3         5
Search queries (follow-up)    2         4
URLs scraped (first round)    6         12
URLs scraped (follow-up)      4         8

Pipeline Overview

Editable diagram: pipeline-diagram.excalidraw

Query
 |
 +- Generate search queries (LLM)
 +- Search web (Serper or DuckDuckGo)
 +- Select best URLs
 +- Scrape and extract in parallel
 |   +- Static HTML
 |   +- JS/SPA via Playwright
 |   +- PDF/DOCX/PPTX/XLSX via docling
 |   +- Scanned PDFs via vision fallback
 +- Evaluate coverage (LLM)
 |   +- Scrape promising backlog URLs
 |   +- Or generate targeted follow-up queries
 +- Synthesize findings (LLM)
 |
 +- WebResearchResult

Use As An Agent Tool

from agents import Agent, function_tool
from web_scout import run_web_research

@function_tool
async def research(query: str) -> str:
    result = await run_web_research(
        query=query,
        models={
            "web_researcher": "gemini/gemini-2.0-flash",
            "content_extractor": "gemini/gemini-2.0-flash",
        },
        search_backend="duckduckgo",
    )
    sources = "\n".join(f"- {s.url}" for s in result.scraped)
    return f"{result.synthesis}\n\nSources:\n{sources}"

agent = Agent(
    name="researcher",
    model="gpt-4o",
    tools=[research],
    instructions="Use the research tool to answer with up-to-date web sources.",
)
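
The imports above match the OpenAI Agents SDK; assuming that SDK, the agent can be driven like this:

import asyncio
from agents import Runner

async def ask():
    run = await Runner.run(agent, "What are the main threats to coral reefs?")
    print(run.final_output)

asyncio.run(ask())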

Output Schema

class WebResearchResult(BaseModel):
    synthesis: str
    scraped: list[UrlEntry]
    scrape_failed: list[UrlEntry]
    snippet_only: list[UrlEntry]
    queries: list[SearchQuery]

UrlEntry contains url, title, and content. SearchQuery contains query, num_results_returned, and domains_restricted.
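
For reference, a minimal sketch of those companion models as the description implies them (field types are assumptions, not the published schema):

from pydantic import BaseModel

class UrlEntry(BaseModel):
    url: str
    title: str
    content: str

class SearchQuery(BaseModel):
    query: str
    num_results_returned: int
    domains_restricted: bool   # assumed type; may instead hold the domain list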

Requirements

  • Python >=3.10
  • API key for at least one supported LLM provider
  • Optional SERPER_API_KEY (or use DuckDuckGo)

License

MIT

Download files

Download the file for your platform.

Source Distribution

web_scout_ai-0.9.1.tar.gz (35.8 kB)

Built Distribution

web_scout_ai-0.9.1-py3-none-any.whl (37.8 kB)

File details

Details for the file web_scout_ai-0.9.1.tar.gz.

File metadata

  • Download URL: web_scout_ai-0.9.1.tar.gz
  • Size: 35.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.14.2 Darwin/25.3.0

File hashes

Hashes for web_scout_ai-0.9.1.tar.gz

Algorithm    Hash digest
SHA256       11132d5b30b069c9b2a43c5ddd1ea01b019a9b500fa5318d7c1680a922d0575e
MD5          5ec721a0bfca2dbbf2e2d93c8ee8750c
BLAKE2b-256  3a6f9e670f9aa61eb91b81926cdc5a73f73ac0e0d7c0bccaeb39013e5d6e35ed

File details

Details for the file web_scout_ai-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: web_scout_ai-0.9.1-py3-none-any.whl
  • Size: 37.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.14.2 Darwin/25.3.0

File hashes

Hashes for web_scout_ai-0.9.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       85f5be8c426aff8317372d7d2a6b7ad23a5a9e3aeb215c594470df111d56dc40
MD5          0952663e5095959a624f569fed606479
BLAKE2b-256  734adac88bebe3fdb64c47efde7c7f53c7fee99cb678edf2e7b5e7eaed716087
