
WebQuest

WebQuest is an extensible Python toolkit for high-level web scraping, built around a generic Playwright-based scraper interface for quickly building, running, and reusing custom scrapers.

Scrapers

  • Any Article: Extracts readable content from arbitrary web articles.
  • DuckDuckGo Search: Performs general web search using DuckDuckGo.
  • Google News Search: Performs news-focused search via Google News.
  • YouTube Search: Searches YouTube videos, channels, posts, and shorts.
  • YouTube Transcript: Fetches transcripts for YouTube videos.

Browsers

  • Hyperbrowser: A cloud-based browser service for running Playwright scrapers without managing infrastructure.

Installation

Install with pip:

pip install webquest

Install with uv:

uv add webquest

Usage

Example usage of the DuckDuckGo Search scraper:

import asyncio

from webquest.browsers import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


async def main() -> None:
    scraper = DuckDuckGoSearch(browser=Hyperbrowser())

    response = await scraper.run(
        scraper.request(query="Pizza Toppings"),
    )
    print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())

You can also run multiple requests at the same time by passing them all to a single scraper.run call and iterating over the returned responses:

import asyncio

from webquest.browsers import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


async def main() -> None:
    scraper = DuckDuckGoSearch(browser=Hyperbrowser())

    responses = await scraper.run(
        scraper.request(query="Pizza Toppings"),
        scraper.request(query="AI News"),
    )
    for response in responses:
        print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())
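The examples above show run returning a single response for one request and something you can iterate over for several. The sketch below illustrates that call shape and one common way to execute requests "at the same time" in asyncio, namely asyncio.gather. ToyScraper, ToyRequest, and ToyResponse are hypothetical stand-ins that sleep instead of driving a browser; this is not WebQuest's implementation, and the real library may schedule requests differently.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


# Hypothetical request/response types; WebQuest's real models are different.
@dataclass
class ToyRequest:
    query: str


@dataclass
class ToyResponse:
    query: str
    results: list[str]


class ToyScraper:
    """Hypothetical stand-in that mimics the request()/run() call shape."""

    def request(self, query: str) -> ToyRequest:
        return ToyRequest(query=query)

    async def _scrape(self, request: ToyRequest) -> ToyResponse:
        # Simulate network latency instead of driving a real browser.
        await asyncio.sleep(0.1)
        return ToyResponse(query=request.query, results=[f"result for {request.query}"])

    async def run(self, *requests: ToyRequest) -> ToyResponse | list[ToyResponse]:
        # Start all requests together; gather returns results in input order.
        responses = await asyncio.gather(*(self._scrape(r) for r in requests))
        # One request -> one response, several -> a list (matching the examples).
        return responses[0] if len(responses) == 1 else list(responses)


async def main() -> None:
    scraper = ToyScraper()
    responses = await scraper.run(
        scraper.request(query="Pizza Toppings"),
        scraper.request(query="AI News"),
    )
    for response in responses:
        print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

Because every request is awaited together, total time is roughly that of the slowest request rather than the sum of all of them.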

To use the Hyperbrowser browser, you need to set the HYPERBROWSER_API_KEY environment variable.

To use the Any Article scraper, you need to set the OPENAI_API_KEY environment variable.
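A fail-fast check for these variables can be done with the standard library before any scraper is created. This is a minimal sketch: REQUIRED_ENV_VARS and missing_env_vars are hypothetical names, not part of WebQuest, and the mapping lists only the two variables documented above.

```python
import os

# Hypothetical mapping built from the two documented requirements above.
REQUIRED_ENV_VARS = {
    "Hyperbrowser": "HYPERBROWSER_API_KEY",
    "Any Article": "OPENAI_API_KEY",
}


def missing_env_vars(components: list[str]) -> list[str]:
    """Return required variables that are unset or empty for the given components.

    Hypothetical helper; not part of WebQuest.
    """
    missing = []
    for component in components:
        name = REQUIRED_ENV_VARS[component]
        # Treat an empty string the same as an unset variable.
        if not os.environ.get(name):
            missing.append(name)
    return missing
```

For example, call missing_env_vars(["Hyperbrowser", "Any Article"]) at startup and exit with an error message if the returned list is non-empty.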

Download files

Source Distribution

webquest-0.7.0.tar.gz (7.9 kB)

Built Distribution

webquest-0.7.0-py3-none-any.whl (16.7 kB)

File details

Details for the file webquest-0.7.0.tar.gz.

File metadata

  • Download URL: webquest-0.7.0.tar.gz
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.1

File hashes

Hashes for webquest-0.7.0.tar.gz
  • SHA256: 69e7bb10995b7c6a39dd5817d94ff0f42fdc73e8beda2edd9ee0e905137f3b7e
  • MD5: 75e993c137029af8a57891424fec9bed
  • BLAKE2b-256: ec4428dc5a65bc6df8ed01ff38be8e45892fa8d1f877e0e30a7d0b2f7decfb39


File details

Details for the file webquest-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: webquest-0.7.0-py3-none-any.whl
  • Size: 16.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.1

File hashes

Hashes for webquest-0.7.0-py3-none-any.whl
  • SHA256: b553c62720f62cd21452afda15e52b838c9a640ada52e0171c9e0f85cf03b597
  • MD5: 40ebc14b04aae4e53e212c3c201d6bab
  • BLAKE2b-256: a90f0d03ba4be39dc9b6431925c714ff5f39f77b41cb9ba7f7b9e71a173f0f22
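To check a downloaded file against the SHA256 digests listed above, hashlib from the standard library is enough. This is a minimal sketch; EXPECTED_SHA256, sha256_of, and verify are illustrative names, not part of WebQuest or pip.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for webquest 0.7.0.
EXPECTED_SHA256 = {
    "webquest-0.7.0.tar.gz": "69e7bb10995b7c6a39dd5817d94ff0f42fdc73e8beda2edd9ee0e905137f3b7e",
    "webquest-0.7.0-py3-none-any.whl": "b553c62720f62cd21452afda15e52b838c9a640ada52e0171c9e0f85cf03b597",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read until f.read returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's digest matches the published SHA256 for its name."""
    return sha256_of(path) == EXPECTED_SHA256[path.name]
```

For example, verify(Path("webquest-0.7.0.tar.gz")) returns True only for an unmodified download. pip can also enforce hashes itself in hash-checking mode, by adding --hash=sha256:<digest> options to a pinned requirement in a requirements file.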

