Allyson Python SDK
AI-powered web browser automation.
Installation
pip install allyson
After installation, you'll need to install the Playwright browsers:
python -m playwright install
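To confirm that the Python packages from the two steps above are visible to your interpreter, a small standard-library check can help. This is a minimal sketch: `check_install` is an illustrative helper, not part of the SDK, and it assumes the import names are `allyson` and `playwright`.

```python
from importlib import metadata, util


def check_install(packages=("allyson", "playwright")):
    """Map each package name to its installed version, or None if missing.

    Hypothetical helper for illustration; not part of the Allyson SDK.
    """
    report = {}
    for name in packages:
        if util.find_spec(name) is None:
            report[name] = None  # the package cannot be imported
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


if __name__ == "__main__":
    for name, version in check_install().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

This only checks the Python packages. It does not verify that the browser binaries were downloaded by `python -m playwright install`.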
Features
- Simple, intuitive API for browser automation
- AI-powered element selection and interaction
- Support for multiple browsers (Chromium, Firefox, WebKit)
- Asynchronous and synchronous interfaces
- Robust error handling and recovery
- DOM extraction and analysis for AI integration
- Screenshot annotation with element bounding boxes
- Agent loop for automating tasks with natural language
Quick Start
from allyson import Browser
# Create a browser instance
browser = Browser()
# Navigate to a website
browser.goto("https://example.com")
# Interact with the page
browser.click("Sign in")
browser.fill("Email", "user@example.com")
browser.fill("Password", "password")
browser.click("Submit")
# Take a screenshot
browser.screenshot("login.png")
# Close the browser
browser.close()
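In the quick start, `browser.close()` only runs if every earlier step succeeds. One way to guarantee cleanup is `contextlib.closing`, which calls `close()` on exit even when a step raises. The sketch below uses a stand-in class, `RecordingBrowser`, so it runs without a real browser; the class and the `open_and_click` helper are hypothetical and only mirror the `goto`, `click`, and `close` calls shown above.

```python
from contextlib import closing


class RecordingBrowser:
    """Stand-in for a browser object (hypothetical); it only records calls."""

    def __init__(self):
        self.calls = []

    def goto(self, url):
        self.calls.append(("goto", url))

    def click(self, label):
        self.calls.append(("click", label))
        if label == "Missing button":
            # Simulate a step that fails partway through the script
            raise LookupError(f"no element matches {label!r}")

    def close(self):
        self.calls.append(("close",))


def open_and_click(browser, url, label):
    """Run two steps, closing the browser whether or not they succeed."""
    with closing(browser):  # calls browser.close() on exit, even after an error
        browser.goto(url)
        browser.click(label)


demo = RecordingBrowser()
try:
    open_and_click(demo, "https://example.com", "Missing button")
except LookupError as exc:
    print(f"step failed: {exc}")

# The last recorded call is the cleanup step
print(demo.calls[-1])
```

With the SDK, the same pattern would presumably be `with closing(Browser()) as browser:`, since the quick start shows that the synchronous browser exposes `close()`.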
Advanced Usage
import asyncio

from allyson import Browser

async def run_automation():
    # Use the async API with a context manager
    async with Browser(headless=False) as browser:
        await browser.goto("https://example.com")
        # Wait for a specific element
        await browser.wait_for_selector(".content")
        # Execute JavaScript
        result = await browser.evaluate("document.title")
        print(f"Page title: {result}")
        # Multiple tabs/pages
        new_page = await browser.new_page()
        await new_page.goto("https://another-example.com")

# Run the async function
asyncio.run(run_automation())
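Waiting on a page can hang if the expected element never appears. A general asyncio pattern for bounding any awaited step is `asyncio.wait_for`. The sketch below is self-contained: `with_timeout` and `slow_title` are hypothetical names, and `slow_title` merely stands in for a slow browser call.

```python
import asyncio


async def with_timeout(awaitable, seconds, default=None):
    """Await `awaitable`, returning `default` if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        return default


async def slow_title(delay):
    """Stand-in for a slow browser call (hypothetical)."""
    await asyncio.sleep(delay)
    return "Example Domain"


async def main():
    # Finishes well inside the limit, so the real value comes back
    fast = await with_timeout(slow_title(0.01), seconds=1.0)
    # Exceeds the limit, so the call is cancelled and the default is returned
    slow = await with_timeout(slow_title(1.0), seconds=0.05, default="<timed out>")
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

Applied to the example above, this would look like `await with_timeout(browser.wait_for_selector(".content"), seconds=10)`. Whether the SDK already applies its own default timeout is not stated here, so treat this as an extra safeguard.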
DOM Extraction and Screenshot Annotation
import asyncio

from allyson import Browser, DOMExtractor

async def extract_and_annotate():
    async with Browser(headless=False) as browser:
        # Navigate to a website
        await browser.goto("https://example.com")
        # Create a DOM extractor
        dom_extractor = DOMExtractor(browser._page)
        # Extract interactive elements
        elements = await dom_extractor.extract_interactive_elements()
        print(f"Found {len(elements)} interactive elements")
        # Take a screenshot with annotations
        result = await dom_extractor.screenshot_with_annotations(
            path="screenshot.png",
            elements=elements,
            show_element_ids=True,
            box_color="red"
        )
        print(f"Clean screenshot: {result['clean']}")
        print(f"Annotated screenshot: {result['annotated']}")
        # Create an element map for AI analysis
        map_result = await dom_extractor.screenshot_with_element_map(
            path="element_map.png"
        )
        # The element map contains detailed information about each element
        for element in map_result["elementMap"]:
            print(f"Element #{element['id']}: {element['elementType']}")

# Run the async function
asyncio.run(extract_and_annotate())
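Once an element map is available, it is often useful to summarize it before handing it to a model. The sketch below groups entries by `elementType` and collects their `id` values, the two keys used in the example above. The sample data and the type names (`link`, `input`, `button`) are assumptions for illustration; real entries may carry more fields and different type labels.

```python
from collections import Counter


def summarize_element_map(element_map):
    """Count elements per `elementType` and list the IDs for each type."""
    counts = Counter(el["elementType"] for el in element_map)
    ids_by_type = {}
    for el in element_map:
        ids_by_type.setdefault(el["elementType"], []).append(el["id"])
    return counts, ids_by_type


# Hypothetical sample shaped like map_result["elementMap"]
sample_map = [
    {"id": 1, "elementType": "link"},
    {"id": 2, "elementType": "input"},
    {"id": 3, "elementType": "button"},
    {"id": 4, "elementType": "link"},
]

counts, ids_by_type = summarize_element_map(sample_map)
for element_type, count in counts.most_common():
    print(f"{element_type}: {count} (ids {ids_by_type[element_type]})")
```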
Agent Loop for Task Automation
import asyncio

from allyson import Browser, Agent, AgentLoop, Tool, ToolType

async def automate_task():
    # Create a browser instance
    async with Browser(headless=False) as browser:
        # Create an agent instance with your OpenAI API key
        agent = Agent(api_key="your-api-key")
        # Create a custom tool
        weather_tool = Tool(
            name="get_weather",
            description="Get the current weather for a location",
            type=ToolType.CUSTOM,
            parameters_schema={
                "location": {"type": "string", "description": "Location to get weather for"}
            },
            function=lambda location: {"temperature": 72, "condition": "Sunny"}
        )
        # Create an agent loop
        agent_loop = AgentLoop(
            browser=browser,
            agent=agent,
            tools=[weather_tool],  # Optional custom tools
            max_iterations=15,
            screenshot_dir="screenshots",
            verbose=True
        )
        # Run the agent loop with a natural language task
        task = "Go to Google, search for 'Python programming language', and find information about it"
        memory = await agent_loop.run(task)
        # The memory contains the full conversation and actions taken
        print("Task completed!")

# Run the async function
asyncio.run(automate_task())
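The example passes the API key as a string literal, which is easy to commit by accident. A common alternative is to read it from the environment. The sketch below assumes a variable named `OPENAI_API_KEY`; that name and the `load_api_key` helper are illustrative choices, not something the SDK is known to require.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise a clear error.

    `environ` defaults to os.environ; passing a dict makes testing easy.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the agent"
        )
    return key


# Demonstration with an explicit mapping, so no real key is needed
print(load_api_key(environ={"OPENAI_API_KEY": "sk-demo"}))
```

The agent could then be created with `Agent(api_key=load_api_key())`.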
Agent Loop Features
The agent loop provides several powerful features for automating web tasks:
- Natural Language Instructions: Describe tasks in plain English, and the agent will figure out how to accomplish them.
- Built-in Tools:
  - goto: Navigate to a URL
  - click: Click on an element by its ID number
  - type: Type text into an element by its ID number
  - enter: Press the Enter key to submit forms
  - scroll: Scroll the page in any direction
  - done: Mark the task as complete
- Action Chaining: The agent can chain multiple actions together for efficiency:
  # The agent can chain actions like typing and pressing Enter
  {
    "actions": [
      {
        "tool": "type",
        "parameters": {
          "element_id": 2,
          "text": "search query"
        }
      },
      {
        "tool": "enter",
        "parameters": {}
      }
    ]
  }
- Custom Tools: Add your own tools to extend the agent's capabilities.
- Memory and Context: The agent maintains a memory of all actions and observations, providing context for decision-making.
- Error Handling: The agent can recover from errors and try alternative approaches.
- Screenshot Annotations: Automatically take screenshots with annotated elements for better visibility.
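To make the action-chaining format concrete, the sketch below parses a payload of that shape and checks each step against the built-in tool names listed above. It is an illustration of the data format only: `validate_actions` is a hypothetical helper, and how the agent loop validates actions internally is not described here.

```python
import json

# Built-in tool names as listed in the features above
BUILT_IN_TOOLS = {"goto", "click", "type", "enter", "scroll", "done"}


def validate_actions(payload, extra_tools=()):
    """Parse a JSON action payload into a list of (tool, parameters) pairs.

    Raises ValueError for malformed payloads or unknown tool names.
    """
    allowed = BUILT_IN_TOOLS | set(extra_tools)
    data = json.loads(payload)  # JSONDecodeError is a subclass of ValueError
    actions = data.get("actions") if isinstance(data, dict) else None
    if not isinstance(actions, list) or not actions:
        raise ValueError("payload must contain a non-empty 'actions' list")

    steps = []
    for index, action in enumerate(actions):
        if not isinstance(action, dict):
            raise ValueError(f"action {index}: expected an object")
        tool = action.get("tool")
        if not isinstance(tool, str) or tool not in allowed:
            raise ValueError(f"action {index}: unknown tool {tool!r}")
        params = action.get("parameters", {})
        if not isinstance(params, dict):
            raise ValueError(f"action {index}: 'parameters' must be an object")
        steps.append((tool, params))
    return steps


chained = """
{
  "actions": [
    {"tool": "type", "parameters": {"element_id": 2, "text": "search query"}},
    {"tool": "enter", "parameters": {}}
  ]
}
"""

for tool, params in validate_actions(chained):
    print(tool, params)
```

Custom tools can be allowed by name, for example `validate_actions(payload, extra_tools=["get_weather"])`.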
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Automated Publishing
This package uses GitHub Actions for automated testing and publishing to PyPI. The workflow is configured to:
- Run tests on every push to the main branch and on pull requests
- Build the package on every push to the main branch
- Publish to PyPI automatically when:
  - A new tag is pushed with the format v* (e.g., v0.1.0, v1.0.0)
  - A new GitHub Release is created
To publish a new version:
- Update the version number in setup.py
- Commit and push your changes to the main branch
- Create and push a new tag:
  git tag v0.1.1
  git push origin v0.1.1
- The GitHub Action will automatically build and publish the package to PyPI
Note: You need to set up a PyPI API token as a GitHub secret named PYPI_API_TOKEN for the automated publishing to work.
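Before pushing a tag, it can help to check that the tag name matches the version you set in setup.py and the `v*` format that triggers publishing. The sketch below assumes versions follow the MAJOR.MINOR.PATCH shape seen in the changelog and that `v*` behaves as a glob pattern; both helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tag_for_version(version):
    """Build the tag name for a MAJOR.MINOR.PATCH version string."""
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"v{version}"


def triggers_publish(tag, pattern="v*"):
    """Return True if `tag` matches the glob pattern used for release tags."""
    return fnmatchcase(tag, pattern)


tag = tag_for_version("0.1.1")
print(tag, triggers_publish(tag))
```

Note that a bare `v*` glob matches any tag starting with `v`, not only version numbers, so building the tag from the version string is the stricter of the two checks.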
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
- 0.1.4 - Enhanced agent loop with action chaining, Enter key tool, and improved error handling
- 0.1.3 - Added DOM extraction and screenshot annotation features
- 0.1.2 - Updated Description
- 0.1.1 - Test release for GitHub Actions automated publishing
- 0.1.0 - Initial release
File details
Details for the file allyson-0.1.4.tar.gz.
File metadata
- Download URL: allyson-0.1.4.tar.gz
- Upload date:
- Size: 39.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af5105205bcb8fe3062992fb01d8f2ac89211ee3830c536dae381cde37e8f9be |
| MD5 | 5708addac17d8f43269c79965be84b64 |
| BLAKE2b-256 | da262080a39468b87aa02b97ba8210af6c0f506785db6e00a24bf3752247e9e2 |
File details
Details for the file allyson-0.1.4-py3-none-any.whl.
File metadata
- Download URL: allyson-0.1.4-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d507df251564086786327bbed2a64a82c33136025e249ad31d441672d754a6f1 |
| MD5 | 8af85048761d5cb0d3a1393521c0cd4c |
| BLAKE2b-256 | a2096fcafd1ea33946e4e4b3bd60ae0b0348fa02ea1dd339f2f442119129d66d |