WebQuest

WebQuest is an extensible Python toolkit for high-level web scraping, built around a generic Playwright-based scraper interface for quickly building, running, and reusing custom scrapers.

Scrapers

  • Any Article: Extracts readable content from arbitrary web articles.
  • DuckDuckGo Search: General web search using DuckDuckGo.
  • Google News Search: News-focused search via Google News.
  • YouTube Search: Search YouTube videos, channels, posts, and shorts.
  • YouTube Transcript: Fetch transcripts for YouTube videos.

Runners

  • Hyperbrowser: Executes scraping tasks using Hyperbrowser.

Installation

Install with pip:

pip install webquest

Install with uv:

uv add webquest

Usage

Example usage of the DuckDuckGo Search scraper:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


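async def main() -> None:
    runner = Hyperbrowser()
    scraper = DuckDuckGoSearch()
    # Build the request from the scraper's own Request type and run it.
    response = await runner.run(
        scraper,
        scraper.Request(query="Pizza Toppings"),
    )
    # Print the response as indented JSON.
    print(response.model_dump_json(indent=4))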


if __name__ == "__main__":
    asyncio.run(main())

You can also run multiple requests at the same time:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


async def main() -> None:
    runner = Hyperbrowser()
    scraper = DuckDuckGoSearch()
    # Pass a list of requests; one response comes back per request.
    responses = await runner.run_multiple(
        scraper,
        [
            scraper.Request(query="Pizza Toppings"),
            scraper.Request(query="AI News"),
        ],
    )
    # Print each response as indented JSON.
    for response in responses:
        print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())
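
For readers new to asyncio, the sketch below illustrates what running several requests "at the same time" generally means in async Python. It is a generic illustration, not WebQuest's implementation: this README does not say how run_multiple works internally, and fake_scrape and run_all are hypothetical stand-ins that use no WebQuest API.

```python
import asyncio


async def fake_scrape(query: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for one scraping request (not a WebQuest API)."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"results for {query!r}"


async def run_all(queries: list[str]) -> list[str]:
    """Start every request at once and wait for all of them to finish."""
    # asyncio.gather returns results in the same order as its arguments,
    # regardless of which coroutine finishes first.
    return await asyncio.gather(*(fake_scrape(q) for q in queries))


if __name__ == "__main__":
    for line in asyncio.run(run_all(["Pizza Toppings", "AI News"])):
        print(line)
```

Because the simulated waits overlap, the total time stays close to the slowest single request instead of the sum of all of them.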

To use the Hyperbrowser runner, you need to set the HYPERBROWSER_API_KEY environment variable.

To use the Any Article scraper, you need to set the OPENAI_API_KEY environment variable.
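
Since both keys are read from the environment, it can help to check for them before starting a scrape. The following is a minimal sketch using only the standard library; missing_env_vars and REQUIRED_ENV_VARS are hypothetical helper names and not part of WebQuest, and the set of variables you actually need depends on which runner and scrapers you use.

```python
import os
from collections.abc import Mapping, Sequence
from typing import Optional

# Variable names come from the notes above. Trim this tuple to the ones
# your own runner and scrapers need (assumption: you may not need both).
REQUIRED_ENV_VARS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing the environment in as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.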
