CeSail

A comprehensive web automation and DOM parsing platform with AI-powered agents.

Project Overview

CeSail is a multi-component system for web automation, DOM parsing, and AI-powered web interactions. It consists of three modules that work together: a JavaScript DOM parser, a Python MCP server, and a Python agent that plans and executes tasks with an LLM, all operating on a browser controlled through Playwright.

Key Features

🤖 AI-Powered Automation: Natural language task processing with LLM integration
🌐 Web Scraping & Analysis: Advanced DOM parsing and element extraction
🔧 MCP Integration: FastMCP server for standardized automation APIs
📸 Visual Analysis: Screenshot capture and visual element detection
⚡ High Performance: Optimized for speed and reliability
🔒 Secure: Built-in security features and best practices

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Simple Agent  │    │   MCP Server    │    │  DOM Parser     │
│   (Python)      │◄──►│   (Python)      │◄──►│  (JavaScript)   │
│                 │    │                 │    │                 │
│ • LLM Interface │    │ • FastMCP APIs  │    │ • Element Ext.  │
│ • Task Planning │    │ • Web Automation│    │ • Selector Gen. │
│ • Execution     │    │ • Screenshots   │    │ • Text Analysis │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Web Browser   │
                    │  (Playwright)   │
                    │                 │
                    │ • Page Control  │
                    │ • DOM Access    │
                    │ • Screenshots   │
                    └─────────────────┘

Components

1. DOM Parser (`dom_parser/`)

JavaScript-based DOM parser for extracting interactive elements and page structure.

Language: JavaScript/Node.js
Features:

Element extraction and analysis
Text content extraction and scoring
Robust CSS selector generation
Performance monitoring and caching
ARIA attribute analysis
Visual element detection

Documentation: See dom_parser/README.md

2. MCP Server (`mcp/`)

FastMCP server providing standardized web automation APIs.

Language: Python
Features:

Web automation APIs (click, type, navigate, etc.)
Page analysis and element extraction
Screenshot capture and analysis
Session management
Error handling and retry logic

Usage: python3 mcp/fastmcp_server.py

3. Simple Agent (`simple_agent/`)

AI-powered web automation agent using LLM for task breakdown and execution.

Language: Python
Features:

Natural language task processing
Automated task breakdown and planning
LLM-powered decision making
Visual analysis with screenshots
Interactive execution monitoring

Usage: python3 simple_agent/simple_agent.py

Quick Start

Prerequisites

Python: 3.9 or higher
Node.js: 14 or higher
OpenAI API Key: Required for Simple Agent
Git: For cloning the repository
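
The prerequisites above can be checked from Python before installing. This is a minimal sketch using only the standard library; the helper name `check_prerequisites` is hypothetical and is not part of CeSail. It does not verify the Node.js version (that would require running `node --version`), and a missing `OPENAI_API_KEY` in the environment is not conclusive, since the key may live in `simple_agent/.env` instead.

```python
import os
import shutil
import sys


def check_prerequisites() -> dict:
    """Return a name -> bool map for each prerequisite listed above."""
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        # shutil.which only tells us the executable is on PATH,
        # not which version it is.
        "node": shutil.which("node") is not None,
        "git": shutil.which("git") is not None,
        "OPENAI_API_KEY": bool(os.environ.get("OPENAI_API_KEY")),
    }


for name, ok in check_prerequisites().items():
    print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```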

Installation

Clone the repository:

git clone https://github.com/yourusername/cesail.git
cd cesail

Set up Python environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Set up DOM Parser (optional):

cd dom_parser
npm install
npm run build
cd ..

Configure environment (for Simple Agent):

# Create .env file in simple_agent/ directory
echo "OPENAI_API_KEY=your_openai_api_key_here" > simple_agent/.env

Install Playwright browsers:
```
playwright install
```

Usage Examples

Simple Agent - Basic Usage

The Simple Agent allows you to perform web tasks using natural language:

python3 simple_agent/simple_agent.py

Example Interaction:

Enter the URL you want to navigate to: https://www.amazon.com

What would you like me to do? Search for wireless headphones under $100

The agent will:
1. Navigate to Amazon
2. Find the search box
3. Type "wireless headphones"
4. Apply price filter
5. Show you the results

MCP Server - API Usage

Start the MCP server for programmatic access:

python3 mcp/fastmcp_server.py

Example API Calls:

# Navigate to a page
await execute_action({
    "type": "navigate",
    "url": "https://example.com"
})

# Click an element
await execute_action({
    "type": "click",
    "element_id": "submit-button"
})

# Get page details
page_info = await get_page_details()
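
The calls above use `await`, so they have to run inside a coroutine. The sketch below shows one way to wire that up with `asyncio.run`. It assumes nothing about how the real tools are obtained: `execute_action` and `get_page_details` here are local stand-ins that only record what they receive, so the example runs without a server or a browser.

```python
import asyncio

# Stand-ins for the real MCP tools (assumption, not the real API).
# They only record calls so the control flow can be exercised offline.
calls = []


async def execute_action(params: dict) -> dict:
    calls.append(params)
    return {"ok": True, "type": params["type"]}


async def get_page_details() -> dict:
    return {"actions_seen": len(calls)}


async def main() -> dict:
    # Actions are awaited one at a time so that each one sees the
    # page state left behind by the previous one.
    await execute_action({"type": "navigate", "url": "https://example.com"})
    await execute_action({"type": "click", "element_id": "submit-button"})
    return await get_page_details()


page_info = asyncio.run(main())
print(page_info)
```

With the real server, only the two stand-in functions would change; the sequential `await` structure in `main` stays the same.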

DOM Parser - JavaScript Usage

Use the DOM parser directly in JavaScript:

import { extractElements } from 'dom-parser';

// Extract all interactive elements
const result = await extractElements();
console.log(result.actions);

// Get specific element types
const buttons = result.actions.filter(action => action.type === 'BUTTON');
const forms = result.forms;

Advanced Usage

Custom Action Types

You can define custom actions for specific use cases:

from dom_parser.src.py.types import Action, ActionType

# Custom action for form filling
custom_action = Action(
    type=ActionType.TYPE,
    element_id="email-input",
    text_to_type="user@example.com",
    description="Fill email field",
    confidence=0.9
)

Error Handling

Implement robust error handling:

try:
    result = await parser.execute_action(action)
except Exception as e:
    print(f"Action failed: {e}")
    # Implement retry logic or fallback

Performance Optimization

Optimize for speed and reliability:

# Configure parser for performance
parser = DOMParser(
    headless=True,  # Run in headless mode
    timeout=30000,  # 30 second timeout
    wait_for_idle=True  # Wait for page to be idle
)

Configuration

Environment Variables

Create a .env file in the simple_agent/ directory:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1

# Optional: Custom model
OPENAI_MODEL=gpt-4o

# Optional: Custom base URL for different providers
# OPENAI_API_BASE=https://your-custom-endpoint.com/v1
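
For illustration, the variables above could be read into a typed settings object as sketched below. The names `OpenAISettings` and `load_openai_settings` are hypothetical, and treating the example values as defaults is an assumption; this README does not show how CeSail itself loads or defaults them. The sketch reads from a mapping (by default `os.environ`) and does not parse the `.env` file.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenAISettings:
    api_key: str
    api_base: str
    model: str


def load_openai_settings(env=None) -> OpenAISettings:
    """Build settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return OpenAISettings(
        api_key=api_key,
        # Assumed defaults, taken from the example values above.
        api_base=env.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
        model=env.get("OPENAI_MODEL", "gpt-4o"),
    )


settings = load_openai_settings({"OPENAI_API_KEY": "sk-example"})
print(settings.model, settings.api_base)
```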

Playwright Configuration

Configure Playwright for your environment:

# In your code
parser = DOMParser(
    headless=False,  # Show browser window
    slow_mo=1000,    # Slow down actions for debugging
    viewport={'width': 1920, 'height': 1080}
)

Development

Running Tests

# Python tests
pytest dom_parser/tests/ -v

# JavaScript tests (the subshell keeps your shell in the repository root)
(cd dom_parser && npm test)

# Integration tests
pytest dom_parser/tests/playground/ -v

Code Quality

# Python linting
black .
isort .
mypy .

# JavaScript linting (the subshell keeps your shell in the repository root)
(cd dom_parser && npm run lint)

Building

# Install the Python package in editable (development) mode
pip install -e .

# Build JavaScript bundle (the subshell keeps your shell in the repository root)
(cd dom_parser && npm run build)

Troubleshooting

Common Issues

1. Import Errors

Problem: ModuleNotFoundError: No module named 'dom_parser'
Solution: Run the command from the repository root with the virtual environment activated, and make sure `pip install -e .` has been run in that environment.

2. Playwright Browser Issues

Problem: Browser not found or crashes
Solution: Reinstall Playwright browsers:

playwright install

3. OpenAI API Errors

Problem: API key invalid or rate limited
Solution: Check your API key and usage limits in the OpenAI dashboard

4. Screenshot Failures

Problem: Screenshots fail with "Target page closed" error
Solution: Wrap the screenshot call in error handling with retry logic, and check that the page is still open before each attempt

Debug Mode

Enable debug logging:

import logging
logging.basicConfig(level=logging.DEBUG)
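
Debug output is easier to read with timestamps and logger names, and with chatty loggers turned down. The sketch below uses only the standard `logging` module. The logger name `cesail.example` is made up for illustration; the logger names CeSail actually uses are not listed in this README.

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# The standard library's "asyncio" logger can be noisy at DEBUG level;
# raise its threshold while leaving everything else at DEBUG.
logging.getLogger("asyncio").setLevel(logging.WARNING)

# Hypothetical logger name, used here only to show the output format.
log = logging.getLogger("cesail.example")
log.debug("debug logging is enabled")
```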

Performance Issues

If you experience performance issues:

Use headless mode: headless=True
Reduce timeout values: timeout=15000
Disable unnecessary features: wait_for_idle=False
Use caching: Enable element caching where possible
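
As an illustration of the caching suggestion, the sketch below keeps extraction results for a short time, keyed by URL. It is an assumption about how caching might be layered on top of the parser, not a description of CeSail's own cache: `TTLCache` is a hypothetical class and `fake_extract` stands in for a real extraction call.

```python
import time


class TTLCache:
    """Tiny time-based cache; entries expire after `ttl` seconds."""

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._items = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._items[key] = (now, value)
        return value

    def invalidate(self, key):
        self._items.pop(key, None)


calls = []


def fake_extract():
    # Stand-in for an expensive element extraction.
    calls.append(1)
    return {"actions": ["search-box", "submit-button"]}


cache = TTLCache(ttl=60)
first = cache.get_or_compute("https://example.com", fake_extract)
second = cache.get_or_compute("https://example.com", fake_extract)
print(len(calls))
```

Cached elements go stale as soon as the page changes, so a cache like this should be invalidated after any click, type, or navigate action.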

API Reference

Simple Agent API

`SimpleAgent.initialize(url=None)`

Initialize the agent and navigate to a URL.

Parameters:

url (str, optional): URL to navigate to. If None, the agent prompts the user for one.

`SimpleAgent.process_user_input(user_input)`

Process natural language input and execute tasks.

Parameters:

user_input (str): Natural language description of the task

Returns: Dictionary with breakdown and execution results

MCP Server API

`execute_action(params)`

Execute a web automation action.

Parameters:

params (dict): Action parameters including type, element_id, etc.

`get_page_details(params)`

Get detailed information about the current page.

Parameters:

params (dict): Optional parameters for analysis

DOM Parser API

`extractElements()`

Extract all interactive elements from the current page.

Returns: Object with actions, meta, outline, text, forms, etc.
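
If the returned object is handed to Python as a dictionary, its `actions` list can be grouped by type, much like the JavaScript filter in the usage example. The sample data below is invented for illustration: only the `actions`, `forms`, and `type` keys appear in this README, and the `id` field is an assumption.

```python
from collections import defaultdict

# Invented sample shaped like the documented result (assumption).
result = {
    "actions": [
        {"type": "BUTTON", "id": "submit-button"},
        {"type": "INPUT", "id": "email-input"},
        {"type": "BUTTON", "id": "cancel-button"},
    ],
    "forms": [],
}

# Group element ids by action type, preserving page order.
by_type = defaultdict(list)
for action in result["actions"]:
    by_type[action["type"]].append(action["id"])

print(dict(by_type))
```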

`getTopLevelElements()`

Get top-level elements containing multiple interactive elements.

Returns: Array of top-level elements

Contributing

We welcome contributions! Here's how to get started:

Development Setup

Fork the repository

Create a feature branch:

git checkout -b feature/your-feature-name

Make your changes
Add tests for new functionality
Run tests to ensure everything works
Submit a pull request

Code Style

Python: Follow PEP 8, use Black for formatting
JavaScript: Follow ESLint rules, use Prettier for formatting
Documentation: Update README files for new features

Testing

Write unit tests for new functions
Add integration tests for new features
Ensure all existing tests pass

Project Structure

cesail/
├── dom_parser/              # JavaScript DOM parser
│   ├── src/                # Source code
│   ├── dist/               # Built files
│   ├── tests/              # JavaScript tests
│   └── README.md           # Component documentation
├── mcp/                    # FastMCP server
│   ├── fastmcp_server.py   # Main server file
│   ├── server.py           # Alternative server
│   └── tests/              # MCP tests
├── simple_agent/           # AI web automation agent
│   ├── simple_agent.py     # Main agent file
│   ├── llm_interface.py    # LLM integration
│   └── .env               # Environment variables
├── venv/                   # Python virtual environment
├── setup.py               # Python package configuration
├── pyproject.toml         # Project configuration
└── README.md              # This file

License

MIT License - see LICENSE file for details.

Support

Issues: Report bugs and feature requests on GitHub
Discussions: Join discussions for questions and ideas
Documentation: Check component-specific README files for detailed docs

Roadmap

Enhanced error recovery mechanisms
Support for more browser automation frameworks
Advanced visual element detection
Multi-language support
Cloud deployment options
Performance monitoring dashboard

Release history

0.2.3: Sep 2, 2025
0.2.2: Aug 25, 2025
0.2.1: Aug 23, 2025
0.2.0: Aug 23, 2025
0.1.1: Aug 22, 2025
0.1.0: Aug 19, 2025 (this version)

Download files

Source Distribution

cesail-0.1.0.tar.gz (1.0 MB)

Uploaded Aug 19, 2025 Source

Built Distribution

cesail-0.1.0-py3-none-any.whl (1.1 MB)

Uploaded Aug 19, 2025 Python 3

File details

Details for the file cesail-0.1.0.tar.gz.

File metadata

Download URL: cesail-0.1.0.tar.gz
Upload date: Aug 19, 2025
Size: 1.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for cesail-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`51a8e2bda1d847eb0e32fc06e9eadb8ff4017c7f16e69c70b6573333182dc999`
MD5	`f08063a53e41bc5eb22361c5923259fb`
BLAKE2b-256	`7df71006bd50ec5c768f4e2729d291312e8c8326a06ffa236b8f883b9f91b28d`
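
A downloaded file can be checked against the SHA256 digest above with the standard library. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical. The demo at the end checks the helper against the well-known SHA-256 digest of the bytes `abc` rather than against a real download.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-check against a well-known digest: SHA-256 of b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
ok = verify(
    tmp.name,
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
os.remove(tmp.name)
print(ok)
```

To check the source distribution, pass the path of the downloaded `cesail-0.1.0.tar.gz` and the SHA256 value from the table above to `verify`.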


File details

Details for the file cesail-0.1.0-py3-none-any.whl.

File metadata

Download URL: cesail-0.1.0-py3-none-any.whl
Upload date: Aug 19, 2025
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for cesail-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9773e7ba5a1aeeaa4796f640c45df00b6e87cdc41e7354fc43646aa3c597828a`
MD5	`108cc9d4caa9462b2fb7b55e5506f641`
BLAKE2b-256	`c349f20f3411a655efe5a05f693268ef982096682997ec3e224d79fe9b9e3e96`

