Project description

WebQuest

WebQuest is an extensible Python toolkit for high-level web scraping, built around a generic Playwright-based scraper interface for quickly building, running, and reusing custom scrapers.
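The scraper/runner split described above can be pictured with a minimal sketch in plain Python. This is an illustrative analogy only, not WebQuest's actual API: `Scraper`, `Runner`, `EchoScraper`, and the request/response classes below are made-up names for the sketch.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class Scraper(Protocol):
    """A scraper turns one request into one response."""

    async def scrape(self, request: object) -> object: ...


@dataclass
class EchoRequest:
    query: str


@dataclass
class EchoResponse:
    result: str


class EchoScraper:
    """A trivial scraper standing in for a Playwright-driven one."""

    Request = EchoRequest

    async def scrape(self, request: EchoRequest) -> EchoResponse:
        # A real scraper would drive a browser page here.
        return EchoResponse(result=f"scraped: {request.query}")


class Runner:
    """A runner executes a scraper against a request."""

    async def run(self, scraper: Scraper, request: object) -> object:
        return await scraper.scrape(request)


async def main() -> None:
    runner = Runner()
    scraper = EchoScraper()
    response = await runner.run(scraper, scraper.Request(query="pizza"))
    print(response.result)  # scraped: pizza


asyncio.run(main())
```

The point of the split is reuse: a scraper defines what to extract, while a runner decides where and how the browser session executes.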

Scrapers

  • Any Article: Extracts readable content from arbitrary web articles.
  • DuckDuckGo Search: General web search using DuckDuckGo.
  • Google News Search: News-focused search via Google News.
  • YouTube Search: Search YouTube videos, channels, posts, and shorts.
  • YouTube Transcript: Fetch transcripts for YouTube videos.

Runners

  • Hyperbrowser: Executes scraping tasks using Hyperbrowser.

Installation

Install with pip:

pip install webquest

Install with uv:

uv add webquest

Usage

Example usage of the DuckDuckGo Search scraper:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


async def main() -> None:
    runner = Hyperbrowser()  # needs the HYPERBROWSER_API_KEY environment variable
    scraper = DuckDuckGoSearch()

    # Run a single request; the response is a Pydantic model.
    response = await runner.run(
        scraper,
        scraper.Request(query="Pizza Toppings"),
    )
    print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())

You can also run multiple requests concurrently:

import asyncio

from webquest.runners import Hyperbrowser
from webquest.scrapers import DuckDuckGoSearch


async def main() -> None:
    runner = Hyperbrowser()
    scraper = DuckDuckGoSearch()

    # run_multiple executes all the requests and returns their responses.
    responses = await runner.run_multiple(
        scraper,
        [
            scraper.Request(query="Pizza Toppings"),
            scraper.Request(query="AI News"),
        ],
    )
    for response in responses:
        print(response.model_dump_json(indent=4))


if __name__ == "__main__":
    asyncio.run(main())
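The same fan-out pattern can be written with plain `asyncio.gather`, independent of WebQuest. In this hedged sketch, `fetch` and `run_all` are illustrative names with a dummy coroutine standing in for a real scraping call:

```python
import asyncio


async def fetch(query: str) -> str:
    # Stand-in for a real scraping call.
    await asyncio.sleep(0)
    return f"results for {query!r}"


async def run_all(queries: list[str]) -> list[str]:
    # gather runs the coroutines concurrently and
    # returns their results in input order.
    return await asyncio.gather(*(fetch(q) for q in queries))


print(asyncio.run(run_all(["Pizza Toppings", "AI News"])))
```

Because `gather` preserves input order, each result can be matched back to the query that produced it.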

To use the Hyperbrowser runner, you need to set the HYPERBROWSER_API_KEY environment variable.

To use the Any Article scraper, you need to set the OPENAI_API_KEY environment variable.
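A small guard can fail fast with a clear message when a key is missing. This is a sketch, not part of WebQuest; `require_env` is a hypothetical helper, and only the variable names come from the notes above:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return value


# Example: check the Hyperbrowser key before constructing the runner.
os.environ.setdefault("HYPERBROWSER_API_KEY", "demo-key")  # for illustration only
print(require_env("HYPERBROWSER_API_KEY"))
```

Checking once at startup gives a clearer error than a failed API call deep inside a scraping run.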

Project details


Download files


Source Distribution

webquest-0.6.1.tar.gz (6.9 kB, Source)

Built Distribution


webquest-0.6.1-py3-none-any.whl (15.9 kB, Python 3)

File details

Details for the file webquest-0.6.1.tar.gz.

File metadata

  • Download URL: webquest-0.6.1.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.1

File hashes

Hashes for webquest-0.6.1.tar.gz

  • SHA256: 9840242dd2de7b0acdc2e5a6a2a8c98846705c93503f160c055d9020ee2e4358
  • MD5: 83597c833cfeb7525fef5931e0a617fe
  • BLAKE2b-256: d429c0af8f370ba85648f80af5c1688aa5f76684806aef8e6b2c6c6c5993ece3


File details

Details for the file webquest-0.6.1-py3-none-any.whl.

File metadata

  • Download URL: webquest-0.6.1-py3-none-any.whl
  • Upload date:
  • Size: 15.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.1

File hashes

Hashes for webquest-0.6.1-py3-none-any.whl

  • SHA256: 018983a1571c8fcdc484a64d9596a945c9e5fae2e054c147a83b6e7b08f6f932
  • MD5: 19162f103e9a0b025aa88fa60618e5c4
  • BLAKE2b-256: 478add5ff0b1b59480e9f160f8f609ba47a615288688ef76c5a61669ba14e09e

