webtask
LLM-powered web automation library with autonomous agents and natural language selectors.
What it does
Three ways to use it:
- High-level - Give it a task and let it figure out the steps
- Step-by-step - Execute a task one step at a time for debugging and control
- Low-level - Tell it exactly what to do, using natural language selectors
Uses multimodal LLMs (GPT-4 Vision, Gemini 2.5) to understand pages both visually and through the DOM. By default it sends screenshots with labeled bounding boxes for better accuracy. Browser control is built on Playwright.
Quick look
Setup:
from webtask import Webtask
from webtask.integrations.llm import GeminiLLM
# Create Webtask manager (browser launches lazily)
wt = Webtask()
# Choose your LLM (Gemini or OpenAI)
llm = GeminiLLM.create(model="gemini-2.5-flash")
# Create agent (screenshots with bounding boxes enabled by default)
agent = await wt.create_agent(llm=llm)
# Or disable screenshots for faster/cheaper operation
# agent = await wt.create_agent(llm=llm, use_screenshot=False)
High-level autonomous:
# Agent figures out the steps
result = await agent.execute("search for cats and click the first result")
print(f"Completed: {result.completed}")
Step-by-step execution:
# Execute task one step at a time
agent.set_task("add 2 items to cart")
for i in range(10):
    step = await agent.run_step()
    print(f"Step {i+1}: {len(step.proposal.actions)} actions")
    print(f"Status: {step.proposal.message}")
    if step.proposal.complete:
        break
# Useful for debugging, progress tracking, or custom control flow
Low-level imperative:
# You control the steps, agent handles the selectors
await agent.navigate("https://google.com")
search_box = await agent.select("search box")
await search_box.fill("cats")
button = await agent.select("search button")
await button.click()
# Wait for page to stabilize
await agent.wait_for_idle()
# Take screenshot
await agent.screenshot("result.png")
No CSS selectors. No XPath. Just describe what you want.
How it works
High-level mode - The agent loop:
- Proposer looks at the page (text DOM + screenshot with bounding boxes) and task, proposes next actions AND checks if task is complete
- Executer runs the actions (navigate, click, fill, type)
- Repeat until task is complete
The agent sees both text (DOM tree with element IDs) and visual context (screenshot with labeled bounding boxes) for more accurate understanding.
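The loop above can be sketched without a browser or an LLM. This is a rough illustration only, not webtask's internals: `Action`, `Proposal`, `ScriptedProposer`, `RecordingExecutor`, and `run_agent_loop` are hypothetical names, and the proposer replays a fixed script where the real one would query the model.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Action:
    tool: str   # e.g. "navigate", "click", "fill", "type"
    args: dict


@dataclass
class Proposal:
    actions: list
    complete: bool
    message: str = ""


class ScriptedProposer:
    """Hypothetical stand-in for the LLM call: replays fixed proposals."""

    def __init__(self, script):
        self._script = list(script)

    async def propose(self, task, history):
        # A real proposer would send the DOM text and screenshot to the LLM here.
        return self._script[len(history)]


class RecordingExecutor:
    """Hypothetical stand-in for the browser: records actions only."""

    def __init__(self):
        self.performed = []

    async def execute(self, actions):
        self.performed.extend(actions)


async def run_agent_loop(task, proposer, executor, max_steps=10):
    """Propose, execute, repeat until the proposal reports completion."""
    history = []
    for _ in range(max_steps):
        proposal = await proposer.propose(task, history)
        await executor.execute(proposal.actions)
        history.append(proposal)
        if proposal.complete:
            break
    return history


script = [
    Proposal([Action("fill", {"element_id": "input-0", "value": "cats"})], complete=False),
    Proposal([Action("click", {"element_id": "button-0"})], complete=True, message="done"),
]
executor = RecordingExecutor()
history = asyncio.run(run_agent_loop("search for cats", ScriptedProposer(script), executor))
print(len(history), [a.tool for a in executor.performed])  # 2 ['fill', 'click']
```

The `max_steps` cap is a design assumption of this sketch: it keeps a proposer that never reports completion from looping forever.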
Step-by-step mode - Same as high-level but you control the loop:
- agent.set_task(description) - Set the task
- agent.run_step() - Execute one step (propose → execute)
- Setting a new task automatically resets history
Low-level mode - You call methods directly:
- agent.navigate(url) - Go to a page
- agent.select(description) - Find an element by natural language
- element.click(), element.fill(text), element.type(text) - Interact with elements
- agent.wait(seconds) - Wait for a specific duration
- agent.wait_for_idle() - Wait for network/DOM to stabilize
- agent.screenshot(path) - Capture a page screenshot
All modes use the same core: LLM sees cleaned DOM with element IDs like button-0 instead of raw HTML, plus a screenshot with bounding boxes showing exactly where each element is. Clean input, clean output.
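To make the element ID idea concrete, here is a minimal sketch of one way to produce IDs such as `button-0`. The per-tag counter and the `[id] label` line format are assumptions for illustration. The source only shows the `button-0` form, and `assign_element_ids` and `render_for_llm` are hypothetical helpers, not part of webtask.

```python
from collections import defaultdict


def assign_element_ids(tags):
    """Give each element an ID of the form '<tag>-<n>', numbered per tag."""
    counters = defaultdict(int)
    ids = []
    for tag in tags:
        ids.append(f"{tag}-{counters[tag]}")
        counters[tag] += 1
    return ids


def render_for_llm(elements):
    """Render (tag, label) pairs as compact lines the model can refer to."""
    ids = assign_element_ids(tag for tag, _ in elements)
    return "\n".join(f"[{eid}] {label}" for eid, (_, label) in zip(ids, elements))


elements = [
    ("input", "Search"),
    ("button", "Google Search"),
    ("button", "I'm Feeling Lucky"),
]
print(render_for_llm(elements))
# [input-0] Search
# [button-0] Google Search
# [button-1] I'm Feeling Lucky
```

Short, stable IDs like these let the model answer with `button-1` instead of reproducing a selector, which is the point of the cleaned representation.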
Available tools for autonomous mode:
- navigate(url) - Navigate to a URL
- click(element_id) - Click an element
- fill(element_id, value) - Fill a form field instantly
- type(element_id, text) - Type text character-by-character with realistic delays
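One plausible way to route tool calls like these to handlers is a small dispatch table. The sketch below is an assumption about structure, not the library's implementation: `FakeBrowser` and `dispatch` are hypothetical, the `{"tool": ..., "args": ...}` call shape is invented for illustration, and the fake logs calls instead of driving a page. It also shows the difference between fill (one operation) and type (one event per character).

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in that logs calls instead of driving a real page."""

    def __init__(self):
        self.log = []

    async def navigate(self, url):
        self.log.append(("navigate", url))

    async def click(self, element_id):
        self.log.append(("click", element_id))

    async def fill(self, element_id, value):
        # Sets the whole value in one operation
        self.log.append(("fill", element_id, value))

    async def type(self, element_id, text):
        # Character-by-character: one log entry per keystroke
        for ch in text:
            self.log.append(("key", element_id, ch))


async def dispatch(browser, call):
    """Route one {'tool': ..., 'args': {...}} call to the matching method."""
    handlers = {
        "navigate": browser.navigate,
        "click": browser.click,
        "fill": browser.fill,
        "type": browser.type,
    }
    handler = handlers.get(call["tool"])
    if handler is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    await handler(**call["args"])


async def main():
    browser = FakeBrowser()
    calls = [
        {"tool": "navigate", "args": {"url": "https://example.com"}},
        {"tool": "fill", "args": {"element_id": "input-0", "value": "cats"}},
        {"tool": "type", "args": {"element_id": "input-1", "text": "hi"}},
        {"tool": "click", "args": {"element_id": "button-0"}},
    ]
    for call in calls:
        await dispatch(browser, call)
    return browser.log


log = asyncio.run(main())
print(log)
```

Rejecting unknown tool names up front means a malformed model response fails loudly rather than being silently skipped.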
Status
🚧 Work in progress
Core implementation complete. See TODO for testing plan and future work.
Benchmarks
Evaluate webtask on standard web agent benchmarks:
webtask-benchmarks - Evaluation framework for Mind2Web and other benchmarks
Install
pip install pywebtask
playwright install chromium
License
MIT