Skip to main content

A comprehensive web automation and DOM parsing platform with AI-powered agents

Project description

CeSail

๐Ÿš€ The AI-native web automation engine โ€“ parse, understand, and act on any website with agent-friendly data.


๐ŸŒŸ Why CeSail?

Traditional automation tools like Playwright and Selenium are great at clicking buttons โ€” but terrible at helping AI agents understand the meaning of web pages.

CeSail changes that:

  • It parses complex DOMs into clean, structured, JSON-like data.
  • It maps semantics (e.g., โ€œthis is a login buttonโ€, โ€œthis is a price fieldโ€).
  • It executes actions (click, type, scroll, navigate) with full end-to-end automation.
  • It provides visual feedback with overlays for debugging and demos.

Think of it as the missing bridge between the messy web and intelligent AI workflows.


๐ŸŽฅ Demo

Hereโ€™s CeSail + Cursor MCP in action, searching flights on Expedia:

CeSail Demo - Flight Search on Expedia


๐Ÿ”‘ Features

  1. ๐ŸŒ Web Page Analysis โ€“ Extracts DOM elements, forms, buttons, and interactive components
  2. ๐Ÿง  Agent-Friendly Parsing โ€“ Converts raw HTML into structured, semantic data
  3. ๐ŸŽฏ Actionable Intelligence โ€“ Identifies clickable items, input fields, and navigation paths
  4. ๐Ÿ“Š Structured Output โ€“ JSON-like objects that AI can instantly consume
  5. ๐Ÿ” Context Preservation โ€“ Maintains relationships between elements and their purposes
  6. ๐Ÿ“ธ Visual Overlays โ€“ Screenshots with highlighted action items

Quick Start

Install from PyPI (Recommended)

The easiest way to get started with CeSail is to install it from PyPI:

# Install CeSail
pip install cesail

# Install Playwright browsers
playwright install

Simple Example

Here's a quick example that demonstrates CeSail's core functionality:

import asyncio
from cesail import DOMParser, Action, ActionType

async def quick_demo():
    """Quick demonstration of CeSail's web automation capabilities."""
    async with DOMParser(headless=False) as parser:
        # Navigate to a website
        action = Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://www.example.com"}
        )
        await parser._action_executor.execute_action(action)
        
        # Analyze the page and get structured data
        parsed_page = await parser.analyze_page()
        print(f"Found {len(parsed_page.important_elements.elements)} interactive elements")
        
        # Take a screenshot with overlays
        await parser.take_screenshot("demo_screenshot.png")
        
        # Show available actions
        print("Available actions:")
        for element in parsed_page.important_elements.elements[:3]:
            print(f"  - {element.type}: {element.text}")

# Run the demo
asyncio.run(quick_demo())

MCP (Model Context Protocol) Integration

CeSail provides a FastMCP server that enables AI assistants like Cursor to directly interact with web pages through standardized APIs. This allows you to give natural language commands to your AI assistant and have it execute web automation tasks.

Setting up MCP with Cursor

  1. Install CeSail MCP Server:

    pip install cesail fastmcp
    playwright install
    
  2. Configure MCP Settings:

    • Open Cursor
    • Go to Settings โ†’ Extensions โ†’ MCP
    • Add a new server configuration:
    • Note: Make sure to use the path to your Python executable. You can find it by running which python or which python3 in your terminal.
    {
      "mcpServers": {
        "cesail": {
          "command": "python3",
          "args": ["-m", "cesail.cesail_mcp.fastmcp_server"],
          "env": {
            "PYTHONUNBUFFERED": "1"
          },
          "description": "CeSail MCP Server for comprehensive web automation and DOM parsing",
          "capabilities": {
            "tools": {
              "listChanged": true
            }
          }
        }
      }
    }
    

    Note: This configuration has been tested with Cursor. For best performance, users should disable the get_screenshot capability as Cursor screenshots can take a while to process. To disable it, go to Cursor Settings โ†’ Tools & Integrations โ†’ MCP and disable the get_screenshot capability for the CeSail server. This should also work with other MCP-compatible agents, though it hasn't been tested with them.

    For more help setting up Cursor MCP, see: https://docs.cursor.com/en/context/mcp

  3. Test the FastMCP Server:

    python3 -m cesail.cesail_mcp.fastmcp_server
    

    Run this command to ensure the server launches properly. You should see output indicating the server is starting up.

  4. Use in Cursor: Now you can ask Cursor to perform web automation tasks:

    "Using cesail MCP, Navigate to example.com and do a certain task"
    "Using cesail MCP, ..."
    

Running the Simple Agent

The Simple Agent is an AI-powered web automation agent that uses Claude (Anthropic) to understand and execute web tasks. Here's how to set it up and run it:

1. Set up your Anthropic API Key

You'll need an Anthropic API key to use the Simple Agent. Get one from Anthropic's console.

2. Configure the environment

Set the Anthropic API key as an environment variable in your current shell session:

# Export the API key for your current session
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"

Important: Replace your_anthropic_api_key_here with your actual Anthropic API key.

Alternative: If you want to persist the API key, you can add it to your shell profile:

# Add to ~/.bashrc, ~/.zshrc, or ~/.bash_profile
echo 'export ANTHROPIC_API_KEY="your_anthropic_api_key_here"' >> ~/.zshrc
source ~/.zshrc

3. Run the Simple Agent

From the project root directory:

# Activate virtual environment
source venv/bin/activate

# Run the simple agent
python -m cesail.simple_agent.simple_agent

4. Using the Simple Agent

Once running, the agent will:

  1. Ask for a URL to navigate to (or use the default Google Flights)
  2. Present an interactive prompt where you can describe tasks
  3. Use Claude to break down your request into actionable steps
  4. Execute web automation tasks with visual feedback

Example usage:

What would you like me to do? (e.g., 'Find and click on the men's shoes section'): 
Search for flights from New York to London from Oct 27th to Nov 24th on Expedia.

5. Features

  • Natural Language Processing: Describe tasks in plain English
  • AI-Powered Planning: Claude breaks down complex tasks into steps
  • Visual Analysis: Takes screenshots and analyzes page content
  • Interactive Execution: Step-by-step task execution with user confirmation
  • Observation History: Tracks and learns from previous actions

6. Troubleshooting

API Key Error: If you see "Could not resolve authentication method", make sure:

  • The ANTHROPIC_API_KEY environment variable is set in your current shell
  • You can verify it's set by running: echo $ANTHROPIC_API_KEY
  • You're running from the project root directory

Browser Issues: If Playwright browsers aren't installed:

playwright install

Why Agents Need This

Traditional web scraping provides raw HTML, which is difficult for AI agents to interpret. CeSail solves this by:

  • Semantic Understanding: Identifies what each element does (button, form, link, etc.)
  • Action Mapping: Maps elements to executable actions (click, type, navigate)
  • Context Enrichment: Adds metadata about element purpose and relationships
  • Structured Data: Outputs clean, predictable data structures
  • Visual Context: Combines DOM analysis with visual information via screenshots and overlays highlighting actionable elements
  • Highly Configurable: Customizable settings for different use cases and requirements

This transformation makes it possible for AI agents to:

  • Understand page structure at a glance
  • Identify actionable elements quickly
  • Execute precise interactions
  • Adapt to different page layouts
  • Make intelligent decisions about next actions

Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚     Cursor      โ”‚    โ”‚   MCP Server    โ”‚    โ”‚  DOM Parser     โ”‚
โ”‚   (AI Agent)    โ”‚โ—„โ”€โ”€โ–บโ”‚   (Python)      โ”‚โ—„โ”€โ”€โ–บโ”‚  (Python)       โ”‚
โ”‚                 โ”‚    โ”‚                 โ”‚    โ”‚                 โ”‚
โ”‚ โ€ข Natural Lang. โ”‚    โ”‚ โ€ข FastMCP APIs  โ”‚    โ”‚ โ€ข Page Analyzer โ”‚
โ”‚ โ€ข Task Planning โ”‚    โ”‚ โ€ข Web Automationโ”‚    โ”‚ โ€ข Action Exec.  โ”‚
โ”‚ โ€ข Execution     โ”‚    โ”‚ โ€ข Screenshots   โ”‚    โ”‚ โ€ข Idle Watcher  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                        โ”‚
                                                        โ”‚
                                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                        โ”‚   Web Browser   โ”‚
                                        โ”‚  (Playwright)   โ”‚
                                        โ”‚                 โ”‚
                                        โ”‚ โ€ข Page Control  โ”‚
                                        โ”‚ โ€ข DOM Access    โ”‚
                                        โ”‚ โ€ข Screenshots   โ”‚
                                        โ”‚ โ€ข Actions       โ”‚
                                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                        โ”‚
                                                        โ”‚
                                        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                        โ”‚  JavaScript     โ”‚
                                        โ”‚  Layer          โ”‚
                                        โ”‚                 โ”‚
                                        โ”‚ โ€ข Element Ext.  โ”‚
                                        โ”‚ โ€ข Selector Gen. โ”‚
                                        โ”‚ โ€ข Text Analysis โ”‚
                                        โ”‚ โ€ข Action Ext.   โ”‚
                                        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Key Architecture Points:

  • MCP Server: Connects to DOM Parser for web automation APIs
  • DOM Parser: Orchestrates page analysis, action execution, and idle watching
  • Web Browser: Connected to DOM Parser for page control and actions
  • JavaScript Layer: Injected into web browser for DOM parsing and element extraction
  • Actions: Executed by Playwright, parsing done manually through JavaScript

Components

1. DOM Parser JavaScript Layer (cesail/dom_parser/src/js/)

Core DOM parsing engine that transforms raw HTML into structured, agent-friendly data.

Language: JavaScript
Features:

  • Element Extraction: Identifies and categorizes interactive elements (buttons, forms, links)
  • Semantic Analysis: Understands element purpose and context
  • Action Mapping: Maps elements to executable actions (click, type, navigate)
  • Text Scoring: Prioritizes important text content for agents
  • Selector Generation: Creates reliable CSS selectors for element targeting
  • Performance Optimization: Caching and monitoring for speed
  • ARIA Support: Accessibility attribute analysis
  • Visual Context: Combines DOM data with visual information
  • Processing Pipeline: Multi-stage element processing and filtering

Key Components:

  • index.js: Main entry point and public API
  • action-extraction.js: Extracts actionable elements and metadata
  • filter-elements.js: Filters and groups elements by importance
  • scoring.js: Scores elements based on visibility and interactivity
  • selector-extraction.js: Generates reliable CSS selectors
  • visualizer.js: Visual debugging and element highlighting
  • cache-manager.js: Performance optimization and caching

Data Transformation Example:

// Raw HTML input
<button class="btn-primary" onclick="submit()">Submit Form</button>
<input type="text" placeholder="Enter email" id="email" />

// CeSail transforms to agent-friendly JSON
{
  "type": "BUTTON",
  "selector": "button.btn-primary",
  "text": "Submit Form",
  "action": "CLICK",
  "importance": 0.9,
  "context": "form submission",
  "metadata": {
    "aria-label": null,
    "disabled": false,
    "visible": true
  }
}

Documentation: See cesail/dom_parser/src/js/README.md

2. DOM Parser Python Layer (cesail/dom_parser/src/py/)

Orchestration layer that manages browser interactions and provides high-level APIs.

Language: Python
Features:

  • Page Analysis: Comprehensive page structure analysis and element extraction
  • Action Execution: Executes clicks, typing, navigation, and other web actions
  • Idle Watching: Monitors page state changes and waits for stability
  • Screenshot Integration: Captures and analyzes visual page content
  • Configuration Management: Flexible configuration for different use cases
  • Session Management: Maintains browser state across interactions
  • Error Handling: Robust error recovery and retry logic

Key Components:

  • dom_parser.py: Main interface for DOM parsing and interaction
  • page_analyzer.py: Analyzes page structure and extracts actionable elements
  • action_executor.py: Executes web actions through Playwright
  • idle_watcher.py: Monitors page state and waits for stability
  • screenshot.py: Captures and processes page screenshots
  • types.py: Data structures and type definitions
  • config.py: Configuration management and validation
  • actions_plugins/: Modular action implementations (navigation, interaction, input, system)

Integration Example:

async with DOMParser() as parser:
    # Navigate to page
    await parser.navigate("https://example.com")
    
    # Analyze page structure
    parsed_page = await parser.analyze_page()
    
    # Execute actions
    await parser.click("button.btn-primary")
    await parser.type("input#email", "user@example.com")

Documentation: See cesail/dom_parser/src/py/README.md

2. MCP Server (cesail/cesail_mcp/)

FastMCP server that provides standardized APIs for agents to interact with transformed web data.

Language: Python
Features:

  • Structured APIs: Clean, predictable endpoints for web automation
  • Action Execution: Execute clicks, typing, navigation based on transformed data
  • Page Analysis: Get structured page information in agent-friendly format
  • Screenshot Integration: Visual context combined with structured data
  • Session Management: Maintain state across interactions
  • Error Handling: Robust retry logic and error recovery

Agent-Friendly API Example:

# Agent receives structured data from CeSail
parsed_page = await parser.analyze_page()

# Get the actions data (this is what agents typically work with)
actions = parsed_page.get_actions()

# Example actions data structure
actions_data = [
  {
    "type": "LINK",
    "selector": "2",
    "importantText": "Vintage vibesCreate your weekend moodboard | Vinta | /today/best/create-your-weekend-moodboard/128099/"
  },
  {
    "type": "LINK", 
    "selector": "3",
    "importantText": "Summer hobbiesTry bead embroidery | Summer hobbies | /today/best/try-bead-embroidery/128240/"
  },
  {
    "type": "SELECT",
    "selector": "5", 
    "importantText": "search-box-input | combobox | Search | Search"
  },
  {
    "type": "BUTTON",
    "selector": "8",
    "importantText": "vertical-nav-more-options-button | More options | More options"
  },
  {
    "type": "BUTTON",
    "selector": "10",
    "importantText": "Sign up"
  }
  ]
 

**Documentation**: See [cesail/dom_parser/src/py/README.md](cesail/dom_parser/src/py/README.md) for more details about the parsed page data structure.

Usage: python3 -m cesail.cesail_mcp.fastmcp_server

3. Simple Agent (cesail/simple_agent/)

AI-powered web automation agent using LLM for task breakdown and execution.

Language: Python
Features:

  • Natural language task processing
  • Automated task breakdown and planning
  • LLM-powered decision making
  • Visual analysis with screenshots
  • Interactive execution monitoring

Documentation: See cesail/simple_agent/README.md for more details.

Usage: python3 -m cesail.simple_agent.simple_agent

Testing

CeSail includes comprehensive test suites to validate functionality and demonstrate capabilities.

Test Categories

  • Playground Tests - Integration tests with real websites (Google, Amazon, YouTube, Pinterest, etc.)
  • Unit Tests - Individual component testing
  • Replay Tests - Regression testing with golden values

Quick Start

# Activate virtual environment
source venv/bin/activate

# Set PYTHONPATH
export PYTHONPATH=/Users/rachitapradeep/CeSail:$PYTHONPATH

# Run playground tests (great way to see CeSail in action!)
pytest cesail/dom_parser/tests/playground/test_page_analyzer_integration_pinterest.py -v -s

# Run all tests
pytest cesail/dom_parser/tests/ -v

Playground Tests

The playground tests are an excellent way to see CeSail navigate through real websites:

  • Google Search: Navigate and search functionality
  • Amazon: Product browsing and search
  • YouTube: Video navigation and interaction
  • Pinterest: Image browsing and pinning
  • Airbnb: Property search and filtering
  • Google Flights: Flight search and booking flow

These tests demonstrate CeSail's ability to:

  • Extract interactive elements from complex websites
  • Navigate through multi-step workflows
  • Handle dynamic content and AJAX loading
  • Generate screenshots with bounding boxes
  • Process structured data for AI agents

Documentation: See cesail/dom_parser/tests/README.md for complete testing guide and examples.

Development Installation

For development or advanced usage:

Prerequisites:

  • Python: 3.9 or higher
  • Node.js: 14 or higher (for DOM Parser development)
  • Anthropic API Key: Required for Simple Agent (Claude)
  • Git: For cloning the repository

Installation:

  1. Clone the repository:

    git clone https://github.com/AkilaJay/cesail.git
    cd cesail
    
  2. Set up Python environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -e .
    
  3. Set up DOM Parser (optional):

    cd cesail/dom_parser
    npm install
    npm run build
    cd ..
    
  4. Configure environment (for Simple Agent):

    # Create .env file in cesail/simple_agent/ directory
    

echo "ANTHROPIC_API_KEY=your_anthropic_api_key_here" > cesail/simple_agent/.env


5. **Playwright browsers are installed automatically** during package installation.
If you encounter any issues, you can manually install them:
```bash
playwright install

Troubleshooting

Common Issues

1. Import Errors

Problem: ModuleNotFoundError: No module named 'dom_parser' Solution: Ensure you're in the correct directory and virtual environment is activated

2. Playwright Browser Issues

Problem: Browser not found or crashes Solution: Reinstall Playwright browsers:

playwright install

3. Anthropic API Errors

Problem: API key invalid or rate limited Solution: Check your API key and usage limits in the Anthropic dashboard

4. Screenshot Failures

Problem: Screenshots fail with "Target page closed" error Solution: Add proper error handling and retry logic

API Reference

For detailed API documentation, see the component-specific README files:

DOM Parser APIs

MCP Server API

  • FastMCP Integration: See the MCP server documentation for standardized web automation APIs
  • Documentation: See cesail_mcp/README.md for complete API reference and usage examples

Simple Agent API

  • Natural Language Processing: Process user input and execute web automation tasks
  • LLM Integration: AI-powered task breakdown and execution
  • Documentation: See cesail/simple_agent/README.md for current API details

Contributing

We welcome contributions! Here's how to get started:

Development Setup

  1. Fork the repository
  2. Create a feature branch:
    git checkout -b feature/your-feature-name
    
  3. Make your changes
  4. Add tests for new functionality
  5. Run tests to ensure everything works
  6. Submit a pull request

Code Style

  • Python: Follow PEP 8, use Black for formatting
  • JavaScript: Follow ESLint rules, use Prettier for formatting
  • Documentation: Update README files for new features

Testing

  • Write unit tests for new functions
  • Add integration tests for new features
  • Ensure all existing tests pass

Project Structure

cesail/
โ”œโ”€โ”€ cesail/                  # Python package
โ”‚   โ”œโ”€โ”€ dom_parser/          # JavaScript DOM parser
โ”‚   โ”œโ”€โ”€ src/                # Source code
โ”‚   โ”œโ”€โ”€ dist/               # Built files
โ”‚   โ”œโ”€โ”€ tests/              # JavaScript tests
โ”‚   โ””โ”€โ”€ README.md           # Component documentation
โ”‚   โ”œโ”€โ”€ cesail_mcp/         # FastMCP server
โ”‚   โ”œโ”€โ”€ fastmcp_server.py   # Main server file
โ”‚   โ”œโ”€โ”€ server.py           # Alternative server
โ”‚   โ””โ”€โ”€ tests/              # MCP tests
โ”‚   โ”œโ”€โ”€ simple_agent/       # AI web automation agent
โ”‚   โ”œโ”€โ”€ simple_agent.py     # Main agent file
โ”‚   โ”œโ”€โ”€ llm_interface.py    # LLM integration
โ”‚   โ””โ”€โ”€ .env               # Environment variables
โ”œโ”€โ”€ venv/                   # Python virtual environment
โ”œโ”€โ”€ setup.py               # Python package configuration
โ”œโ”€โ”€ pyproject.toml         # Project configuration
โ””โ”€โ”€ README.md              # This file

Support

  • Issues: Report bugs and feature requests on GitHub
  • Discussions: Join discussions for questions and ideas
  • Documentation: Check component-specific README files for detailed docs

Roadmap

  • Enhanced simple agent
  • Plugin framework for actions
  • More native actions / Parser enhancements
  • Replay framework

Help needed / Bugs

  • Idle watcher doesn't always wait for the page to load. Need to fix.
  • Simple agent enhancements
  • Parser enhancements
  • Testing

Contact

For questions, issues, or contributions:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cesail-0.2.3.tar.gz (1.1 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cesail-0.2.3-py3-none-any.whl (1.2 MB view details)

Uploaded Python 3

File details

Details for the file cesail-0.2.3.tar.gz.

File metadata

  • Download URL: cesail-0.2.3.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for cesail-0.2.3.tar.gz
Algorithm Hash digest
SHA256 0840257db626b79b93f7d8781e8209554e102bf76c5d8b020ebc4871b37bd093
MD5 d68c4b9f2b9fe1cef1d0e1c673dab0f7
BLAKE2b-256 c25cf3800e6e43e188baafe63802419e5ced2e9645c8421d695676fe3976f2dd

See more details on using hashes here.

File details

Details for the file cesail-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: cesail-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for cesail-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 12c6e1cae50909ce9aed93af42c1191fd6ad263f723309a571fcad6f81832af4
MD5 7e538514c7280298499fdde2e8799ef4
BLAKE2b-256 cde45a26fa0716e8fcc95334b610617a91552e0992221ffb24abb8a4137d997d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page