WebQuest

WebQuest is an extensible Python toolkit for high-level web scraping, built around a generic Playwright-based scraper interface for quickly building, running, and reusing custom scrapers.

Scrapers

  • Any Article: Extracts readable content from arbitrary web articles.
  • DuckDuckGo Search: General web search using DuckDuckGo.
  • Google News Search: News-focused search via Google News.
  • YouTube Search: Search YouTube videos, channels, posts, and shorts.
  • YouTube Transcript: Fetch transcripts for YouTube videos.

Runners

  • Hyperbrowser: Executes scraping tasks using Hyperbrowser.

Installation

Installing using pip:

pip install webquest

Installing using uv:

uv add webquest

Usage

Example usage of the DuckDuckGo Search scraper:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


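async def main() -> None:
    # The Hyperbrowser runner needs HYPERBROWSER_API_KEY set in the environment.
    runner = Hyperbrowser()
    scraper = DuckDuckGoSearch()

    # Run a single request and wait for its response.
    response = await runner.run(
        scraper,
        scraper.Request(query="Pizza Toppings"),
    )
    # Print the response as indented JSON.
    print(response.model_dump_json(indent=4))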


if __name__ == "__main__":
    asyncio.run(main())

You can also run multiple requests at the same time:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


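async def main() -> None:
    # The Hyperbrowser runner needs HYPERBROWSER_API_KEY set in the environment.
    runner = Hyperbrowser()
    scraper = DuckDuckGoSearch()

    # Submit several requests in one call; one response comes back per request.
    responses = await runner.run_multiple(
        scraper,
        [
            scraper.Request(query="Pizza Toppings"),
            scraper.Request(query="AI News"),
        ],
    )
    # Print each response as indented JSON.
    for response in responses:
        print(response.model_dump_json(indent=4))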


if __name__ == "__main__":
    asyncio.run(main())
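
For readers new to asyncio, the sketch below shows the general pattern for running several requests at the same time. It is not WebQuest code, and nothing here describes how run_multiple is implemented internally. The names fake_scrape and run_all are hypothetical stand-ins, and the short sleep takes the place of a real scrape.

```python
import asyncio


async def fake_scrape(query: str) -> dict:
    # Hypothetical stand-in for a real scrape: sleeps briefly instead of
    # opening a browser, then returns an empty result set for the query.
    await asyncio.sleep(0.01)
    return {"query": query, "results": []}


async def run_all(queries: list[str]) -> list[dict]:
    # asyncio.gather runs the coroutines concurrently and returns their
    # results in the same order as the inputs.
    return await asyncio.gather(*(fake_scrape(q) for q in queries))
```

Calling asyncio.run(run_all(["Pizza Toppings", "AI News"])) returns one result per query, in input order.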

To use the Hyperbrowser runner, you need to set the HYPERBROWSER_API_KEY environment variable.

To use the Any Article scraper, you need to set the OPENAI_API_KEY environment variable.
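
A small pre-flight check can surface a missing key before any scraping starts. The sketch below uses only the standard library. The helper missing_env_vars is hypothetical and not part of WebQuest; only the two variable names come from the notes above.

```python
import os
from typing import Iterable, Mapping, Optional


def missing_env_vars(
    names: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    # Hypothetical helper, not part of WebQuest: returns the names that are
    # unset or empty. Defaults to the real process environment.
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Variable names as documented above; which ones you need depends on the
# runner and scrapers you actually use.
REQUIRED = ["HYPERBROWSER_API_KEY", "OPENAI_API_KEY"]
```

Calling missing_env_vars(REQUIRED) at the top of main() and exiting with a clear message when the list is non-empty is one way to fail early.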
