Skip to main content

AI-powered web browser automation

Project description

Allyson Python SDK

AI-powered web browser automation.

Installation

pip install allyson

After installation, you'll need to install the Playwright browsers:

python -m playwright install

Features

  • Simple, intuitive API for browser automation
  • AI-powered element selection and interaction
  • Support for multiple browsers (Chromium, Firefox, WebKit)
  • Asynchronous and synchronous interfaces
  • Robust error handling and recovery
  • DOM extraction and analysis for AI integration
  • Screenshot annotation with element bounding boxes
  • Agent loop for automating tasks with natural language

Quick Start

from allyson import Browser, Agent, AgentLoop, Tool, ToolType

async def automate_task():
    # Create a browser instance
    async with Browser(
        headless=False,
        # Optional: Use your own Chrome installation instead of the default Chromium
        executable_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
    ) as browser:
        # Create an agent instance with your OpenAI API key
        agent = Agent(api_key="your-api-key")
        
        # Create a custom tool
        weather_tool = Tool(
            name="get_weather",
            description="Get the current weather for a location",
            type=ToolType.CUSTOM,
            parameters_schema={
                "location": {"type": "string", "description": "Location to get weather for"}
            },
            function=lambda location: {"temperature": 72, "condition": "Sunny"}
        )
        
        # Create an agent loop
        agent_loop = AgentLoop(
            browser=browser,
            agent=agent,
            tools=[weather_tool],  # Optional custom tools
            max_steps=15,
            screenshot_dir="screenshots",
            plan_dir="plans",      # Directory to save task plans
            verbose=True
        )
        
        # Run the agent loop with a natural language task
        task = "Go to Google, search for 'Python programming language', and find information about it"
        memory = await agent_loop.run(task)
        
        # The memory contains the full conversation and actions taken
        print("Task completed!")
        
        # Print the final plan with completed steps
        if agent_loop.state.plan_path:
            with open(agent_loop.state.plan_path, "r") as f:
                print(f.read())

# Run the async function
import asyncio
asyncio.run(automate_task())

Using Your Own Chrome Installation

By default, Allyson uses the Playwright-managed Chromium browser. However, for better stability and compatibility, you can use your own Chrome installation:

# Windows
browser = Browser(executable_path="C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe")

# macOS
browser = Browser(executable_path="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome")

# Linux
browser = Browser(executable_path="/usr/bin/google-chrome")

This is especially useful for automation tasks that require specific browser versions or configurations.

Agent Loop Features

The agent loop provides several powerful features for automating web tasks:

  1. Natural Language Instructions: Describe tasks in plain English, and the agent will figure out how to accomplish them.

  2. Task Planning: The agent automatically creates a step-by-step plan for completing the task and tracks progress by marking steps as completed.

  3. Built-in Tools:

    • goto: Navigate to a URL
    • click: Click on an element by its ID number
    • type: Type text into an element by its ID number
    • enter: Press the Enter key to submit forms
    • scroll: Scroll the page in any direction
    • done: Mark the task as complete
  4. Action Chaining: The agent can chain multiple actions together for efficiency:

# The agent can chain actions like typing and pressing Enter
{
  "actions": [
    {
      "tool": "type",
      "parameters": {
        "element_id": 2,
        "text": "search query"
      }
    },
    {
      "tool": "enter",
      "parameters": {}
    }
  ]
}
  1. Custom Tools: Add your own tools to extend the agent's capabilities.

  2. Memory and Context: The agent maintains a memory of all actions and observations, providing context for decision-making.

  3. Error Handling: The agent can recover from errors and try alternative approaches.

  4. Screenshot Annotations: Automatically take screenshots with annotated elements for better visibility.

Example Plan

The agent creates a Markdown plan like this for each task:

# Plan for: Search for information about Python programming language

## Steps:
- [x] Navigate to a search engine
- [x] Search for "Python programming language"
- [ ] Review search results
  - [ ] Identify official Python website
  - [ ] Identify Wikipedia page
- [ ] Visit the most relevant page
- [ ] Extract key information
  - [ ] What is Python
  - [ ] Key features
  - [ ] Current version
- [ ] Summarize findings

As the agent completes steps, it automatically updates the plan by marking steps as completed with checkboxes.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

  • 0.1.6 - Added support for custom Chrome browser path
  • 0.1.5 - Added planner feature for creating and tracking task progress
  • 0.1.4 - Enhanced agent loop with action chaining, Enter key tool, and improved error handling
  • 0.1.3 - Added DOM extraction and screenshot annotation features
  • 0.1.2 - Updated Description
  • 0.1.1 - Test release for GitHub Actions automated publishing
  • 0.1.0 - Initial release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

allyson-0.1.6.tar.gz (41.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

allyson-0.1.6-py3-none-any.whl (29.5 kB view details)

Uploaded Python 3

File details

Details for the file allyson-0.1.6.tar.gz.

File metadata

  • Download URL: allyson-0.1.6.tar.gz
  • Upload date:
  • Size: 41.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for allyson-0.1.6.tar.gz
Algorithm Hash digest
SHA256 ccceac0b145baffc22cd9f414add410886352d538e162f92404f03d563e0c56d
MD5 a90b65e2f94b8544307acde4e73701f9
BLAKE2b-256 7fe008614abc6d7e38acc109c8c800c106d5b16197203b7cc58cf5bf848679e8

See more details on using hashes here.

File details

Details for the file allyson-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: allyson-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 29.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for allyson-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 6ecbdd7ffd5e3f8c5f849facf2fa74e775e6b056c5249e9947a1d52b985c466c
MD5 55653ee50597908edf552669cdedb2a7
BLAKE2b-256 d18bfb10e804215ceaad152e251dc0b5acab2b7351fb52f8c6e3e887060b693e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page