Allyson Python SDK

AI-powered web browser automation.

Installation

pip install allyson

After installation, you'll need to install the Playwright browsers:

python -m playwright install
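
A quick way to confirm that the `playwright` package is importable before downloading the browsers is sketched here. It assumes that installing allyson pulls in Playwright as a dependency, which the install step suggests but this README does not state. The helper name is illustrative and is not part of the Allyson API, and the check covers only the Python package, not the browser binaries.

```python
import importlib.util


def playwright_is_installed() -> bool:
    """Return True if the `playwright` package can be located for import."""
    return importlib.util.find_spec("playwright") is not None


if __name__ == "__main__":
    if playwright_is_installed():
        print("playwright package found; run: python -m playwright install")
    else:
        print("playwright package missing; run: pip install allyson")
```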

Features

  • Simple, intuitive API for browser automation
  • AI-powered element selection and interaction
  • Support for multiple browsers (Chromium, Firefox, WebKit)
  • Asynchronous and synchronous interfaces
  • Robust error handling and recovery
  • DOM extraction and analysis for AI integration
  • Screenshot annotation with element bounding boxes
  • Agent loop for automating tasks with natural language

Quick Start

from allyson import Browser

# Create a browser instance
browser = Browser()

# Navigate to a website
browser.goto("https://example.com")

# Interact with the page
browser.click("Sign in")
browser.fill("Email", "user@example.com")
browser.fill("Password", "password")
browser.click("Submit")

# Take a screenshot
browser.screenshot("login.png")

# Close the browser
browser.close()

Advanced Usage

import asyncio

from allyson import Browser

async def run_automation():
    # Use async API with context manager
    async with Browser(headless=False) as browser:
        await browser.goto("https://example.com")

        # Wait for specific element
        await browser.wait_for_selector(".content")

        # Execute JavaScript
        result = await browser.evaluate("document.title")
        print(f"Page title: {result}")

        # Multiple tabs/pages
        new_page = await browser.new_page()
        await new_page.goto("https://another-example.com")

# Run the async function
asyncio.run(run_automation())

DOM Extraction and Screenshot Annotation

import asyncio

from allyson import Browser, DOMExtractor

async def extract_and_annotate():
    async with Browser(headless=False) as browser:
        # Navigate to a website
        await browser.goto("https://example.com")

        # Create a DOM extractor
        dom_extractor = DOMExtractor(browser._page)

        # Extract interactive elements
        elements = await dom_extractor.extract_interactive_elements()
        print(f"Found {len(elements)} interactive elements")

        # Take a screenshot with annotations
        result = await dom_extractor.screenshot_with_annotations(
            path="screenshot.png",
            elements=elements,
            show_element_ids=True,
            box_color="red"
        )

        print(f"Clean screenshot: {result['clean']}")
        print(f"Annotated screenshot: {result['annotated']}")

        # Create an element map for AI analysis
        map_result = await dom_extractor.screenshot_with_element_map(
            path="element_map.png"
        )

        # The element map contains detailed information about each element
        for element in map_result["elementMap"]:
            print(f"Element #{element['id']}: {element['elementType']}")

# Run the async function
asyncio.run(extract_and_annotate())
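
Once an element map is available, it can be summarized with plain Python. The sketch below counts entries per `elementType`. The sample data is hypothetical: only the `id` and `elementType` keys appear in this README, and the values `"button"` and `"input"` are assumed for illustration.

```python
from collections import Counter


def count_by_type(element_map):
    """Count element map entries per `elementType` value."""
    return Counter(element["elementType"] for element in element_map)


# Hypothetical sample data; real entries would come from
# screenshot_with_element_map() under the "elementMap" key.
sample_map = [
    {"id": 1, "elementType": "button"},
    {"id": 2, "elementType": "input"},
    {"id": 3, "elementType": "button"},
]

for element_type, count in count_by_type(sample_map).most_common():
    print(f"{element_type}: {count}")
```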

Agent Loop for Task Automation

import asyncio

from allyson import Browser, Agent, AgentLoop, Tool, ToolType

async def automate_task():
    # Create a browser instance
    async with Browser(headless=False) as browser:
        # Create an agent instance with your OpenAI API key
        agent = Agent(api_key="your-api-key")

        # Create a custom tool
        weather_tool = Tool(
            name="get_weather",
            description="Get the current weather for a location",
            type=ToolType.CUSTOM,
            parameters_schema={
                "location": {"type": "string", "description": "Location to get weather for"}
            },
            function=lambda location: {"temperature": 72, "condition": "Sunny"}
        )

        # Create an agent loop
        agent_loop = AgentLoop(
            browser=browser,
            agent=agent,
            tools=[weather_tool],  # Optional custom tools
            max_iterations=15,
            screenshot_dir="screenshots",
            verbose=True
        )

        # Run the agent loop with a natural language task
        task = "Go to Google, search for 'Python programming language', and find information about it"
        memory = await agent_loop.run(task)

        # The memory contains the full conversation and actions taken
        print("Task completed!")

# Run the async function
asyncio.run(automate_task())
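
A custom tool's function can be exercised on its own before it is handed to the agent loop. The sketch below replaces the lambda with a named function and checks arguments against the same schema shape. `call_with_schema` is a hypothetical helper written for this example; how `AgentLoop` actually validates and invokes tool functions is not documented here.

```python
def get_weather(location: str) -> dict:
    """Placeholder tool body returning fixed data, like the lambda above."""
    return {"temperature": 72, "condition": "Sunny"}


# Same schema shape as the parameters_schema in the example above.
PARAMETERS_SCHEMA = {
    "location": {"type": "string", "description": "Location to get weather for"}
}


def call_with_schema(function, schema: dict, arguments: dict) -> dict:
    """Hypothetical helper: reject unknown or missing parameters, then call."""
    unknown = set(arguments) - set(schema)
    missing = set(schema) - set(arguments)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
    return function(**arguments)


result = call_with_schema(get_weather, PARAMETERS_SCHEMA, {"location": "Paris"})
print(result)
```

Testing the function in isolation like this makes it easier to tell tool bugs apart from agent behavior.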

Agent Loop Features

The agent loop provides the following features for automating web tasks:

  1. Natural Language Instructions: Describe tasks in plain English, and the agent will figure out how to accomplish them.

  2. Built-in Tools:

    • goto: Navigate to a URL
    • click: Click on an element by its ID number
    • type: Type text into an element by its ID number
    • enter: Press the Enter key to submit forms
    • scroll: Scroll the page in any direction
    • done: Mark the task as complete

  3. Action Chaining: The agent can chain multiple actions together for efficiency, for example typing text and then pressing Enter:

{
  "actions": [
    {
      "tool": "type",
      "parameters": {
        "element_id": 2,
        "text": "search query"
      }
    },
    {
      "tool": "enter",
      "parameters": {}
    }
  ]
}

  4. Custom Tools: Add your own tools to extend the agent's capabilities.

  5. Memory and Context: The agent maintains a memory of all actions and observations, providing context for decision-making.

  6. Error Handling: The agent can recover from errors and try alternative approaches.

  7. Screenshot Annotations: Automatically take screenshots with annotated elements for better visibility.
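
To make the action-chaining format concrete, here is a minimal sketch of how a payload in that shape could be walked in order. The handlers are hypothetical stand-ins that only record what they were asked to do; this is not Allyson's internal dispatcher, and its real error behavior may differ.

```python
import json

# Payload in the same shape as the action-chaining example.
payload = json.loads("""
{
  "actions": [
    {"tool": "type", "parameters": {"element_id": 2, "text": "search query"}},
    {"tool": "enter", "parameters": {}}
  ]
}
""")

log = []

# Hypothetical handlers keyed by tool name; each one just logs the request.
HANDLERS = {
    "type": lambda element_id, text: log.append(f"type {text!r} into #{element_id}"),
    "enter": lambda: log.append("press Enter"),
}


def run_chain(actions, handlers):
    """Run each action in order, raising on the first unknown tool."""
    for action in actions:
        handler = handlers.get(action["tool"])
        if handler is None:
            raise KeyError(f"unknown tool: {action['tool']}")
        handler(**action["parameters"])


run_chain(payload["actions"], HANDLERS)
print(log)
```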

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Automated Publishing

This package uses GitHub Actions for automated testing and publishing to PyPI. The workflow is configured to:

  1. Run tests on every push to the main branch and on pull requests
  2. Build the package on every push to the main branch
  3. Publish to PyPI automatically when:
    • A new tag is pushed with the format v* (e.g., v0.1.0, v1.0.0)
    • A new GitHub Release is created

To publish a new version:

  1. Update the version number in setup.py
  2. Commit and push your changes to the main branch
  3. Create and push a new tag:
    git tag v0.1.1
    git push origin v0.1.1
    
  4. The GitHub Action will automatically build and publish the package to PyPI

Note: You need to set up a PyPI API token as a GitHub secret named PYPI_API_TOKEN for the automated publishing to work.
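
Before pushing a tag, it can help to confirm that the tag and the version in setup.py agree. The sketch below is one way to do that check. It assumes three-part numeric versions such as 0.1.1, which is stricter than the workflow's `v*` trigger, and the function name is illustrative rather than part of this project.

```python
import re

# Assumption: tags look like v<major>.<minor>.<patch>, e.g. v0.1.1.
TAG_PATTERN = re.compile(r"^v(\d+\.\d+\.\d+)$")


def tag_matches_version(tag: str, version: str) -> bool:
    """Return True if `tag` is exactly "v" followed by `version`."""
    match = TAG_PATTERN.match(tag)
    return match is not None and match.group(1) == version


print(tag_matches_version("v0.1.1", "0.1.1"))
print(tag_matches_version("v0.1.1", "0.1.0"))
```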

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

  • 0.1.4 - Enhanced agent loop with action chaining, Enter key tool, and improved error handling
  • 0.1.3 - Added DOM extraction and screenshot annotation features
  • 0.1.2 - Updated package description
  • 0.1.1 - Test release for GitHub Actions automated publishing
  • 0.1.0 - Initial release

