CeSail
A comprehensive web automation and DOM parsing platform with AI-powered agents.
Project Overview
CeSail is a multi-component system for web automation, DOM parsing, and AI-powered web interactions. It consists of three interconnected components that work together on top of a Playwright-controlled browser: a JavaScript DOM parser, a Python FastMCP server, and a Python Simple Agent that uses an LLM to plan and execute tasks.
Key Features
- AI-Powered Automation: Natural language task processing with LLM integration
- Web Scraping & Analysis: Advanced DOM parsing and element extraction
- MCP Integration: FastMCP server for standardized automation APIs
- Visual Analysis: Screenshot capture and visual element detection
- High Performance: Optimized for speed and reliability
- Secure: Built-in security features and best practices
Architecture
┌───────────────────┐     ┌───────────────────┐     ┌───────────────────┐
│   Simple Agent    │     │    MCP Server     │     │    DOM Parser     │
│     (Python)      │◄───►│     (Python)      │◄───►│   (JavaScript)    │
│                   │     │                   │     │                   │
│ • LLM Interface   │     │ • FastMCP APIs    │     │ • Element Ext.    │
│ • Task Planning   │     │ • Web Automation  │     │ • Selector Gen.   │
│ • Execution       │     │ • Screenshots     │     │ • Text Analysis   │
└───────────────────┘     └───────────────────┘     └───────────────────┘
          │                         │                         │
          └─────────────────────────┼─────────────────────────┘
                                    │
                          ┌───────────────────┐
                          │    Web Browser    │
                          │   (Playwright)    │
                          │                   │
                          │ • Page Control    │
                          │ • DOM Access      │
                          │ • Screenshots     │
                          └───────────────────┘
Components
1. DOM Parser (dom_parser/)
JavaScript-based DOM parser for extracting interactive elements and page structure.
Language: JavaScript/Node.js
Features:
- Element extraction and analysis
- Text content extraction and scoring
- Robust CSS selector generation
- Performance monitoring and caching
- ARIA attribute analysis
- Visual element detection
Documentation: See dom_parser/README.md
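This README does not spell out how the parser builds "robust" selectors. As an illustration only, the Python sketch below shows one common strategy: prefer an element's `id`, and otherwise walk up to the root, adding `:nth-of-type(n)` wherever same-tag siblings would make a step ambiguous. The `Node` class and `css_selector` function are hypothetical stand-ins, not the parser's JavaScript API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: compare nodes by identity, like DOM elements
class Node:
    """Hypothetical, minimal stand-in for a DOM element."""
    tag: str
    element_id: Optional[str] = None
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)

    def append(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def css_selector(node: Node) -> str:
    """Walk up from node, stopping early at the nearest ancestor with an id;
    otherwise emit tag:nth-of-type(n) steps all the way to the root."""
    parts: list[str] = []
    current: Optional[Node] = node
    while current is not None:
        if current.element_id:
            # Assumes ids are unique on the page, so the path can start here.
            parts.append(f"#{current.element_id}")
            break
        step = current.tag
        if current.parent is not None:
            same_tag = [c for c in current.parent.children if c.tag == current.tag]
            if len(same_tag) > 1:
                position = next(i for i, c in enumerate(same_tag) if c is current)
                step += f":nth-of-type({position + 1})"  # nth-of-type is 1-based
        parts.append(step)
        current = current.parent
    return " > ".join(reversed(parts))


if __name__ == "__main__":
    root = Node("html")
    form = root.append(Node("body")).append(Node("form", element_id="login"))
    form.append(Node("input"))
    password = form.append(Node("input"))
    print(css_selector(password))  # #login > input:nth-of-type(2)
```

Production selector generators usually also escape special characters in ids and check that the final selector matches exactly one element; this sketch does neither.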
2. MCP Server (mcp/)
FastMCP server providing standardized web automation APIs.
Language: Python
Features:
- Web automation APIs (click, type, navigate, etc.)
- Page analysis and element extraction
- Screenshot capture and analysis
- Session management
- Error handling and retry logic
Usage: python3 mcp/fastmcp_server.py
3. Simple Agent (simple_agent/)
AI-powered web automation agent that uses an LLM for task breakdown and execution.
Language: Python
Features:
- Natural language task processing
- Automated task breakdown and planning
- LLM-powered decision making
- Visual analysis with screenshots
- Interactive execution monitoring
Usage: python3 simple_agent/simple_agent.py
Quick Start
Prerequisites
- Python: 3.9 or higher
- Node.js: 14 or higher
- OpenAI API Key: Required for the Simple Agent
- Git: For cloning the repository
Installation
1. Clone the repository:
   git clone https://github.com/yourusername/cesail.git
   cd cesail
2. Set up the Python environment:
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e .
3. Set up the DOM Parser (optional):
   cd dom_parser
   npm install
   npm run build
   cd ..
4. Configure the environment (for the Simple Agent):
   # Create a .env file in the simple_agent/ directory
   echo "OPENAI_API_KEY=your_openai_api_key_here" > simple_agent/.env
5. Install the Playwright browsers:
   playwright install
Usage Examples
Simple Agent - Basic Usage
The Simple Agent allows you to perform web tasks using natural language:
python3 simple_agent/simple_agent.py
Example Interaction:
Enter the URL you want to navigate to: https://www.amazon.com
What would you like me to do? Search for wireless headphones under $100
The agent will:
1. Navigate to Amazon
2. Find the search box
3. Type "wireless headphones"
4. Apply price filter
5. Show you the results
MCP Server - API Usage
Start the MCP server for programmatic access:
python3 mcp/fastmcp_server.py
Example API Calls:
# Navigate to a page
await execute_action({
"type": "navigate",
"url": "https://example.com"
})
# Click an element
await execute_action({
"type": "click",
"element_id": "submit-button"
})
# Get page details
page_info = await get_page_details()
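The calls above pass plain dictionaries to `execute_action`. A small helper can catch missing fields before a request is sent; the sketch below assumes only the keys that appear in the example (`type`, `url`, `element_id`), and both `REQUIRED_KEYS` and `build_action` are hypothetical names, not part of the MCP server.

```python
from typing import Any

# Hypothetical table of the keys each action type needs, based only on the
# navigate/click examples above. Extend it for other action types.
REQUIRED_KEYS: dict[str, tuple[str, ...]] = {
    "navigate": ("url",),
    "click": ("element_id",),
}


def build_action(action_type: str, **fields: Any) -> dict[str, Any]:
    """Return an action dict such as {"type": "click", "element_id": ...},
    raising ValueError early if the type is unknown or a key is missing."""
    if action_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown action type: {action_type!r}")
    missing = [key for key in REQUIRED_KEYS[action_type] if key not in fields]
    if missing:
        raise ValueError(f"{action_type!r} action is missing: {', '.join(missing)}")
    return {"type": action_type, **fields}
```

With this helper, the navigation call above would become `await execute_action(build_action("navigate", url="https://example.com"))`.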
DOM Parser - JavaScript Usage
Use the DOM parser directly in JavaScript:
import { extractElements } from 'dom-parser';
// Extract all interactive elements
const result = await extractElements();
console.log(result.actions);
// Get specific element types
const buttons = result.actions.filter(action => action.type === 'BUTTON');
const forms = result.forms;
Advanced Usage
Custom Action Types
You can define custom actions for specific use cases:
from dom_parser.src.py.types import Action, ActionType
# Custom action for form filling
custom_action = Action(
type=ActionType.TYPE,
element_id="email-input",
text_to_type="user@example.com",
description="Fill email field",
confidence=0.9
)
Error Handling
Implement robust error handling:
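# Hypothetical wrapper: `await` is only valid inside an async function.
async def run_action(parser, action):
    try:
        return await parser.execute_action(action)
    except Exception as e:
        print(f"Action failed: {e}")
        # Implement retry logic or a fallback here
        return None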
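One way to fill in the retry logic the snippet leaves open is a generic retry loop with exponential backoff, which also suits transient failures such as the "Target page closed" screenshot error listed under Troubleshooting. The sketch below is plain `asyncio`, not a CeSail API; `with_retries`, `attempts`, and `base_delay` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), retrying on any exception with exponential backoff.

    make_call must build a fresh awaitable on every call, because a
    coroutine object can only be awaited once.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    raise RuntimeError("unreachable")
```

It would be called as `result = await with_retries(lambda: parser.execute_action(action))`. The lambda matters: passing an already-created coroutine would fail on the second attempt.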
Performance Optimization
Optimize for speed and reliability:
# Configure parser for performance
parser = DOMParser(
headless=True, # Run in headless mode
timeout=30000, # 30 second timeout
wait_for_idle=True # Wait for page to be idle
)
Configuration
Environment Variables
Create a .env file in the simple_agent/ directory:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_API_BASE=https://api.openai.com/v1
# Optional: Custom model
OPENAI_MODEL=gpt-4o
# Optional: Custom base URL for different providers
# OPENAI_API_BASE=https://your-custom-endpoint.com/v1
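How the Simple Agent reads this file is not described here; many Python projects use the `python-dotenv` package for it. The dependency-free sketch below shows the idea: it parses `KEY=value` lines, skips blank lines and `#` comments, and falls back to OpenAI's public endpoint when `OPENAI_API_BASE` is unset. `parse_env` and `openai_settings` are hypothetical helpers, and the parser deliberately ignores quoting and `export` prefixes.

```python
from pathlib import Path
from typing import Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, ignoring blanks and # comments (no quoting rules)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def openai_settings(env: dict[str, str]) -> dict[str, Optional[str]]:
    """Pick out the OpenAI settings; only OPENAI_API_KEY is required."""
    if not env.get("OPENAI_API_KEY"):
        raise KeyError("OPENAI_API_KEY is not set")
    return {
        "api_key": env["OPENAI_API_KEY"],
        # Assumed fallback: OpenAI's public endpoint, as in the example file.
        "api_base": env.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
        # None: no model override, leave the choice to the caller.
        "model": env.get("OPENAI_MODEL"),
    }


if __name__ == "__main__":
    env_file = Path("simple_agent/.env")
    if env_file.exists():
        settings = openai_settings(parse_env(env_file.read_text()))
        # Never print the key itself.
        print({k: v for k, v in settings.items() if k != "api_key"})
```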
Playwright Configuration
Configure Playwright for your environment:
# In your code
parser = DOMParser(
headless=False, # Show browser window
slow_mo=1000, # Slow down actions for debugging
viewport={'width': 1920, 'height': 1080}
)
Development
Running Tests
# Python tests
pytest dom_parser/tests/ -v
# JavaScript tests (in dom_parser directory)
cd dom_parser && npm test
# Integration tests
pytest dom_parser/tests/playground/ -v
Code Quality
# Python linting
black .
isort .
mypy .
# JavaScript linting (in dom_parser directory)
cd dom_parser && npm run lint
Building
# Install the Python package in editable (development) mode
pip install -e .
# Build JavaScript bundle (in dom_parser directory)
cd dom_parser && npm run build
Troubleshooting
Common Issues
1. Import Errors
Problem: ModuleNotFoundError: No module named 'dom_parser'
Solution: Run from the repository root and make sure the virtual environment is activated.
2. Playwright Browser Issues
Problem: The browser is not found or crashes.
Solution: Reinstall the Playwright browsers:
playwright install
3. OpenAI API Errors
Problem: The API key is invalid or requests are rate-limited.
Solution: Check your API key and usage limits in the OpenAI dashboard.
4. Screenshot Failures
Problem: Screenshots fail with a "Target page closed" error.
Solution: Wrap screenshot calls in error handling and retry the capture.
Debug Mode
Enable debug logging:
import logging
logging.basicConfig(level=logging.DEBUG)
Performance Issues
If you experience performance issues:
- Use headless mode: headless=True
- Reduce timeout values: timeout=15000
- Disable unnecessary features: wait_for_idle=False
- Use caching: enable element caching where possible
API Reference
Simple Agent API
SimpleAgent.initialize(url=None)
Initialize the agent and navigate to a URL.
Parameters:
url (str, optional): URL to navigate to. If None, prompts the user.
SimpleAgent.process_user_input(user_input)
Process natural language input and execute tasks.
Parameters:
user_input (str): Natural language description of the task
Returns: Dictionary with breakdown and execution results
MCP Server API
execute_action(params)
Execute a web automation action.
Parameters:
params (dict): Action parameters including type, element_id, etc.
get_page_details(params)
Get detailed information about the current page.
Parameters:
params (dict): Optional parameters for analysis
DOM Parser API
extractElements()
Extract all interactive elements from the current page.
Returns: Object with actions, meta, outline, text, forms, etc.
getTopLevelElements()
Get top-level elements containing multiple interactive elements.
Returns: Array of top-level elements
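The exact rules behind `getTopLevelElements()` are not documented here. The Python sketch below is one plausible reading of "top-level elements containing multiple interactive elements": starting below the root, it returns the outermost elements that contain at least `min_interactive` interactive descendants, descending through wrappers whose interactive content sits entirely inside a single child. `El`, `INTERACTIVE_TAGS`, and `top_level_groups` are hypothetical, and the tag list is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a simplified notion of "interactive"; a real implementation
# would likely also consider ARIA roles and event handlers.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass(eq=False)
class El:
    """Hypothetical, minimal element: a tag name plus child elements."""
    tag: str
    children: list[El] = field(default_factory=list)


def interactive_count(el: El) -> int:
    """Number of interactive elements in el's subtree, el itself included."""
    own = 1 if el.tag in INTERACTIVE_TAGS else 0
    return own + sum(interactive_count(child) for child in el.children)


def top_level_groups(root: El, min_interactive: int = 2) -> list[El]:
    """Outermost elements below root holding >= min_interactive interactive
    elements, skipping wrappers whose interactive content is in one child."""
    groups: list[El] = []

    def visit(el: El) -> None:
        total = interactive_count(el)
        if total < min_interactive:
            return  # descendants hold even fewer, so none can qualify
        busy = [c for c in el.children if interactive_count(c) > 0]
        if len(busy) == 1 and interactive_count(busy[0]) == total:
            visit(busy[0])  # el only wraps a single child: look inside it
        else:
            groups.append(el)

    for child in root.children:
        visit(child)
    return groups


if __name__ == "__main__":
    page = El("body", [
        El("nav", [El("a"), El("a")]),
        El("div", [El("form", [El("input"), El("button")])]),
        El("footer", [El("a")]),
    ])
    print([g.tag for g in top_level_groups(page)])  # ['nav', 'form']
```

For the example tree it returns the `nav` and the `form`: the plain `div` around the form is skipped as a wrapper, and the footer is ignored because it holds only one link. Counts are recomputed on every call for clarity; a real implementation would compute them once per node.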
Contributing
We welcome contributions! Here's how to get started:
Development Setup
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes
- Add tests for new functionality
- Run tests to ensure everything works
- Submit a pull request
Code Style
- Python: Follow PEP 8, use Black for formatting
- JavaScript: Follow ESLint rules, use Prettier for formatting
- Documentation: Update README files for new features
Testing
- Write unit tests for new functions
- Add integration tests for new features
- Ensure all existing tests pass
Project Structure
cesail/
├── dom_parser/            # JavaScript DOM parser
│   ├── src/               # Source code
│   ├── dist/              # Built files
│   ├── tests/             # JavaScript tests
│   └── README.md          # Component documentation
├── mcp/                   # FastMCP server
│   ├── fastmcp_server.py  # Main server file
│   ├── server.py          # Alternative server
│   └── tests/             # MCP tests
├── simple_agent/          # AI web automation agent
│   ├── simple_agent.py    # Main agent file
│   ├── llm_interface.py   # LLM integration
│   └── .env               # Environment variables
├── venv/                  # Python virtual environment
├── setup.py               # Python package configuration
├── pyproject.toml         # Project configuration
└── README.md              # This file
License
MIT License - see LICENSE file for details.
Support
- Issues: Report bugs and feature requests on GitHub
- Discussions: Join discussions for questions and ideas
- Documentation: Check component-specific README files for detailed docs
Roadmap
- Enhanced error recovery mechanisms
- Support for more browser automation frameworks
- Advanced visual element detection
- Multi-language support
- Cloud deployment options
- Performance monitoring dashboard
Download files
File details
Details for the file cesail-0.1.0.tar.gz.
File metadata
- Download URL: cesail-0.1.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51a8e2bda1d847eb0e32fc06e9eadb8ff4017c7f16e69c70b6573333182dc999 |
| MD5 | f08063a53e41bc5eb22361c5923259fb |
| BLAKE2b-256 | 7df71006bd50ec5c768f4e2729d291312e8c8326a06ffa236b8f883b9f91b28d |
File details
Details for the file cesail-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cesail-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9773e7ba5a1aeeaa4796f640c45df00b6e87cdc41e7354fc43646aa3c597828a |
| MD5 | 108cc9d4caa9462b2fb7b55e5506f641 |
| BLAKE2b-256 | c349f20f3411a655efe5a05f693268ef982096682997ec3e224d79fe9b9e3e96 |