webtask

LLM-powered web automation library with autonomous agents and natural language selectors.

📚 Documentation | 🐍 PyPI | 📊 Benchmarks


What it does

Three ways to use it:

  • High-level - Give it a task, let it figure out the steps
  • Step-by-step - Execute tasks one step at a time for debugging/control
  • Low-level - Tell it exactly what to do with natural language selectors

Uses multimodal LLMs (GPT-4 Vision, Gemini 2.5) to understand pages both visually and through the DOM. By default it sends screenshots annotated with bounding boxes to improve accuracy. Browser control is built on Playwright.


Quick look

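Setup (the snippets below use await, so run them inside an async function, for example one started with asyncio.run):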

from webtask import Webtask
from webtask.integrations.llm import GeminiLLM

# Create Webtask manager (browser launches lazily)
wt = Webtask()

# Choose your LLM (Gemini or OpenAI)
llm = GeminiLLM.create(model="gemini-2.5-flash")

# Create agent (screenshots with bounding boxes enabled by default)
agent = await wt.create_agent(llm=llm)

# Or disable screenshots for faster/cheaper operation
# agent = await wt.create_agent(llm=llm, use_screenshot=False)

High-level autonomous:

# Agent figures out the steps
result = await agent.execute("search for cats and click the first result")
print(f"Completed: {result.completed}")
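
A long-running autonomous task can be bounded with a timeout. The following is a minimal sketch, assuming only that `agent.execute(task)` is awaitable and returns an object with a `completed` attribute, as in the snippet above. `execute_with_timeout` and `SlowAgent` are hypothetical names introduced for illustration; `SlowAgent` is a stand-in so the sketch runs without a browser or API key.

```python
import asyncio
from types import SimpleNamespace


async def execute_with_timeout(agent, task, seconds):
    """Run agent.execute(task); return False if it times out or does not complete."""
    try:
        result = await asyncio.wait_for(agent.execute(task), timeout=seconds)
    except asyncio.TimeoutError:
        return False
    return bool(result.completed)


class SlowAgent:
    """Hypothetical stand-in for a webtask agent (not part of the library)."""

    def __init__(self, delay):
        self.delay = delay

    async def execute(self, task):
        await asyncio.sleep(self.delay)  # simulates the time the task takes
        return SimpleNamespace(completed=True)
```

With a real agent, the call would look like `await execute_with_timeout(agent, "search for cats and click the first result", 120)`.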

Step-by-step execution:

# Execute task one step at a time
agent.set_task("add 2 items to cart")

for i in range(10):
    step = await agent.run_step()

    print(f"Step {i+1}: {len(step.proposal.actions)} actions")
    print(f"Status: {step.proposal.message}")

    if step.proposal.complete:
        break

# Useful for debugging, progress tracking, or custom control flow
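
The loop above can be wrapped in a small helper that enforces a step budget and collects the status messages. This is a sketch, not part of webtask: it relies only on `set_task`, `run_step`, `step.proposal.message`, and `step.proposal.complete` as shown above. `run_with_budget` and `FakeStepAgent` are hypothetical names; the fake agent exists only so the example runs offline.

```python
import asyncio
from types import SimpleNamespace


async def run_with_budget(agent, task, max_steps=10):
    """Run `task` one step at a time; return (completed, messages)."""
    agent.set_task(task)
    messages = []
    for _ in range(max_steps):
        step = await agent.run_step()
        messages.append(step.proposal.message)
        if step.proposal.complete:
            return True, messages
    return False, messages  # budget exhausted before the task completed


class FakeStepAgent:
    """Hypothetical stand-in that reports completion after `steps_needed` steps."""

    def __init__(self, steps_needed):
        self.steps_needed = steps_needed
        self.steps_run = 0

    def set_task(self, description):
        self.steps_run = 0  # a new task starts from a clean slate

    async def run_step(self):
        self.steps_run += 1
        proposal = SimpleNamespace(
            actions=[],
            message=f"step {self.steps_run}",
            complete=self.steps_run >= self.steps_needed,
        )
        return SimpleNamespace(proposal=proposal)
```

Returning the completion flag separately from the messages lets the caller distinguish "finished" from "ran out of steps".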

Low-level imperative:

# You control the steps, agent handles the selectors
await agent.navigate("https://google.com")

search_box = await agent.select("search box")
await search_box.fill("cats")

button = await agent.select("search button")
await button.click()

# Wait for page to stabilize
await agent.wait_for_idle()

# Take screenshot
await agent.screenshot("result.png")

No CSS selectors. No XPath. Just describe what you want.
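
The fill-then-click pattern above is easy to factor into a reusable helper. The sketch below assumes only the calls shown in the snippet (`agent.select`, `element.fill`, `element.click`, `agent.wait_for_idle`). `fill_and_submit`, `FakeSelectAgent`, and `FakeElement` are hypothetical names; the fakes record calls so the example runs without a browser.

```python
import asyncio


async def fill_and_submit(agent, field, text, button):
    """Fill the element described by `field`, click `button`, then wait for idle."""
    element = await agent.select(field)
    await element.fill(text)
    submit = await agent.select(button)
    await submit.click()
    await agent.wait_for_idle()


class FakeElement:
    """Hypothetical stand-in for a selected element; logs each interaction."""

    def __init__(self, description, log):
        self.description = description
        self.log = log

    async def fill(self, text):
        self.log.append(("fill", self.description, text))

    async def click(self):
        self.log.append(("click", self.description))


class FakeSelectAgent:
    """Hypothetical stand-in for a webtask agent; logs each call in order."""

    def __init__(self):
        self.log = []

    async def select(self, description):
        self.log.append(("select", description))
        return FakeElement(description, self.log)

    async def wait_for_idle(self):
        self.log.append(("wait_for_idle",))
```

With a real agent: `await fill_and_submit(agent, "search box", "cats", "search button")`.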


How it works

High-level mode - The agent loop:

  1. Proposer looks at the page (text DOM + screenshot with bounding boxes) and the task, proposes the next actions, and checks whether the task is complete
  2. Executer runs the actions (navigate, click, fill, type)
  3. Repeat until task is complete
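
The three steps above can be sketched as a generic propose/execute loop. This illustrates the control flow only and is not webtask's implementation: `Proposal`, `agent_loop`, `scripted_proposer`, and `no_op_execute` are hypothetical names, and the proposer here replays a fixed script instead of calling an LLM.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical proposer output: actions to run plus a completion flag."""
    actions: list = field(default_factory=list)
    complete: bool = False


async def agent_loop(propose, execute, task, max_steps=10):
    """Alternate propose -> execute until the proposer reports completion."""
    history = []
    for _ in range(max_steps):
        proposal = await propose(task, history)  # step 1: propose and check
        for action in proposal.actions:          # step 2: run the actions
            await execute(action)
            history.append(action)
        if proposal.complete:                    # step 3: stop or repeat
            return True, history
    return False, history


def scripted_proposer(script):
    """Build a proposer that replays `script` in order (stand-in for an LLM)."""
    remaining = list(script)

    async def propose(task, history):
        return remaining.pop(0)

    return propose


async def no_op_execute(action):
    """Stand-in executor that does nothing; a real one would drive the browser."""
```

Passing the history back to the proposer on each iteration reflects the idea that each proposal depends on what has already been done.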

The agent sees both text (DOM tree with element IDs) and visual context (screenshot with labeled bounding boxes) for more accurate understanding.

Step-by-step mode - Same as high-level but you control the loop:

  • agent.set_task(description) - Set the task
  • agent.run_step() - Execute one step (propose → execute)
  • Setting a new task automatically resets history

Low-level mode - You call methods directly:

  • agent.navigate(url) - Go to a page
  • agent.select(description) - Find element by natural language
  • element.click(), element.fill(text), element.type(text) - Interact with elements
  • agent.wait(seconds) - Wait for specific duration
  • agent.wait_for_idle() - Wait for network/DOM to stabilize
  • agent.screenshot(path) - Capture page screenshot

All modes use the same core: the LLM sees a cleaned DOM representation plus screenshots with bounding boxes for accurate understanding. No CSS selectors, no XPath - just natural language.
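
To give a feel for what a cleaned DOM with element IDs could look like, here is a small sketch built on the standard library's `html.parser`. It is an assumption for illustration only: the ID format (`input-0`), the set of interactive tags, and the kept attributes are all hypothetical and are not taken from webtask's code.

```python
from html.parser import HTMLParser

# Assumed for illustration: which tags count as interactive, which attributes to keep.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRIBUTES = {"type", "name", "placeholder", "href", "aria-label"}


class ElementLabeler(HTMLParser):
    """Collect interactive elements and give each a short ID such as input-0."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        index = self.counts.get(tag, 0)
        self.counts[tag] = index + 1
        kept = " ".join(
            f'{name}="{value}"'
            for name, value in attrs
            if name in KEPT_ATTRIBUTES and value
        )
        opening = f"<{tag} {kept}>" if kept else f"<{tag}>"
        self.lines.append(f"[{tag}-{index}] {opening}")


def label_elements(html):
    """Return one line per interactive element, prefixed with its ID."""
    parser = ElementLabeler()
    parser.feed(html)
    parser.close()
    return parser.lines
```

A compact listing like this is much shorter than raw HTML, and the IDs give a model something stable to refer to when it proposes an action.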


Installation

pip install pywebtask
playwright install chromium

Set up your API key:

export GEMINI_API_KEY="your-api-key"  # or OPENAI_API_KEY
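
Before creating an agent it can help to check which key is actually set. A minimal sketch using only the two environment variable names mentioned above; `configured_providers` is a hypothetical helper, not a webtask function.

```python
import os


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    variables = {"gemini": "GEMINI_API_KEY", "openai": "OPENAI_API_KEY"}
    return [name for name, variable in variables.items() if env.get(variable)]
```

Calling `configured_providers()` with no argument reads the real environment; passing a dict makes the check easy to test.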


Documentation

📚 Full Documentation


Benchmarks

Evaluate webtask on standard web agent benchmarks:

webtask-benchmarks - Evaluation framework for Mind2Web and other benchmarks


Contributing

See TODO.md for planned features and improvements.

Contributions welcome! Open an issue or submit a PR.


License

MIT
