Browser Automation with Local LLM (8GB GPU compatible)
curllm = curl + LLM
Intelligent Browser Automation with Local LLMs
Quick Start • Features • Examples • Documentation • API
🎯 What is curllm?
curllm is a CLI tool that combines browser automation with local LLMs (such as Qwen, Llama, or Mistral served through Ollama) to extract data, fill forms, and automate web workflows. By default everything runs locally on your machine, so page content never has to leave it.
🆕 v2 LLM-DSL Architecture! Dynamic element detection, semantic goal understanding, no hardcoded selectors. 388 tests passing.
# Extract products with prices from any e-commerce site
# (single quotes keep the shell from expanding $1 inside "$100")
curllm "https://shop.example.com" -d 'Find all products under $100'
# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"
# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"
✨ Features
| Feature | Description |
|---|---|
| 🧠 Local LLM | Works with 8GB GPUs (Qwen 2.5, Llama 3, Mistral) |
| 🎯 Smart Extraction | LLM-guided DOM analysis - no hardcoded selectors |
| 📝 Form Automation | Auto-fill forms with intelligent field mapping |
| 🥷 Stealth Mode | Bypass anti-bot detection |
| 👁️ Visual Mode | See browser actions in real-time |
| 🔍 BQL Support | Browser Query Language for structured queries |
| 📊 Export Formats | JSON, CSV, HTML, XLS output |
| 🔒 Privacy-First | Everything runs locally - no cloud APIs needed |
🧠 LLM-DSL Architecture
curllm v2 uses LLM-DSL (LLM Domain Specific Language) - a dynamic approach that eliminates hardcoded selectors:
┌─────────────────────────────────────────────────────────────┐
│ LLM-DSL Flow │
├─────────────────────────────────────────────────────────────┤
│ 1. Goal Detection (semantic) │
│ "Find RAM DDR5" → FIND_PRODUCTS │
│ │
│ 2. Strategy Selection │
│ FIND_PRODUCTS → use search flow │
│ FIND_CART → find link by semantic scoring │
│ │
│ 3. Element Finding (LLM-first) │
│ LLM analysis → Statistical scoring → Fallback │
│ │
│ 4. Dynamic Selector Generation │
│ Analyze DOM → Score elements → Generate selector │
└─────────────────────────────────────────────────────────────┘
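The "LLM analysis → Statistical scoring → Fallback" chain in step 3 can be pictured with a small sketch. This is not curllm's code: `Candidate`, `score`, `find_element`, the `llm_pick` callback, and the 0.5 threshold are all hypothetical names and values chosen for illustration, and the scoring is a simple word-overlap measure standing in for whatever curllm actually computes.

```python
from dataclasses import dataclass

# Goal labels taken from the diagram; the strategy names are illustrative only.
STRATEGIES = {
    "FIND_PRODUCTS": "search_flow",
    "FIND_CART": "semantic_link_scoring",
}


@dataclass
class Candidate:
    selector: str  # CSS selector that would locate the element
    text: str      # visible text of the element


def score(candidate: Candidate, goal_terms: set[str]) -> float:
    """Fraction of goal terms that appear in the candidate's visible text."""
    if not goal_terms:
        return 0.0
    words = set(candidate.text.lower().split())
    return len(words & goal_terms) / len(goal_terms)


def find_element(candidates, goal_terms, llm_pick=None, threshold=0.5):
    """Try an LLM pick first, then statistical scoring, then a plain fallback.

    Returns a (candidate, method) pair so callers can see which step won.
    """
    if llm_pick is not None:
        picked = llm_pick(candidates)  # hypothetical callback, may return None
        if picked is not None:
            return picked, "llm"
    best = max(candidates, key=lambda c: score(c, goal_terms), default=None)
    if best is not None and score(best, goal_terms) >= threshold:
        return best, "scoring"
    # Last resort: first candidate, or None when the list is empty.
    return (candidates[0] if candidates else None), "fallback"
```

Returning the method alongside the element makes it easy to log how often the LLM step is actually needed.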
Key Benefits
| Feature | Traditional | LLM-DSL |
|---|---|---|
| Selectors | Hardcoded CSS/XPath | Dynamic generation |
| Keywords | Static lists | Semantic analysis |
| Language | English only | Multi-language (PL, EN) |
| Maintenance | Manual updates | Self-adapting |
🚀 Quick Start
Installation
pip install -U curllm
curllm-setup # One-time setup (installs Playwright browsers)
curllm-doctor # Verify installation
Requirements
- Python 3.10+
- GPU: NVIDIA with 6-8GB VRAM (RTX 3060/4060) or CPU mode
- Ollama: For local LLM inference
# Install Ollama (if not installed)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5:7b
📖 Examples
Extract Data
# Extract all links
curllm "https://example.com" -d "extract all links"
# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}
# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"
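When calling curllm from a script, passing arguments as a list avoids shell quoting problems with characters such as `$`. The sketch below uses only the flags shown above (`--stealth`, `-d`). It assumes that curllm prints a single JSON document to stdout, as the `# Output:` comment suggests; `build_command`, `parse_emails`, and `run` are hypothetical helper names, not part of curllm.

```python
import json
import subprocess


def build_command(url: str, instruction: str, stealth: bool = False) -> list[str]:
    """Build an argv list; no shell is involved, so the instruction stays literal."""
    argv = ["curllm"]
    if stealth:
        argv.append("--stealth")
    argv += [url, "-d", instruction]
    return argv


def parse_emails(stdout: str) -> list[str]:
    """Parse output shaped like {"emails": [...]}; return [] if the key is absent."""
    data = json.loads(stdout)
    return list(data.get("emails", []))


def run(url: str, instruction: str, stealth: bool = False) -> str:
    """Run curllm and return its stdout (requires curllm on PATH)."""
    result = subprocess.run(
        build_command(url, instruction, stealth),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `parse_emails(run("https://example.com/contact", "extract all email addresses"))` would return the list of addresses if the output has the shape shown above.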
Form Automation
# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
-d "Fill form: name=John Doe, email=john@example.com, message=Hello"
# Login automation
curllm --visual "https://app.example.com/login" \
-d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'
Export Results
# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv
# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html
# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx
Screenshots
# Take screenshot
curllm "https://example.com" -d "screenshot"
# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"
BQL Queries
curllm --bql -d 'query {
page(url: "https://news.ycombinator.com") {
title
links: select(css: "a.titlelink") { text url: attr(name: "href") }
}
}'
🌐 Web Interface
curllm-web start # Start web UI at http://localhost:5000
curllm-web status # Check status
curllm-web stop # Stop server
Features:
- 🎨 Modern responsive UI
- 📝 19 pre-configured prompts
- 📊 Real-time log viewer
- 📤 File upload support
🔧 Configuration
Environment variables (.env):
CURLLM_MODEL=qwen2.5:7b # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true # Run browser headlessly
CURLLM_STEALTH_MODE=false # Anti-detection
CURLLM_LOCALE=en-US # Browser locale
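A wrapper script can read the same variables with a few lines of Python. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical, the defaults simply mirror the example values above (they are assumed, not confirmed, to be curllm's defaults), and the accepted boolean spellings are a guess.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; curllm's own parsing may differ."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    model: str
    ollama_host: str
    headless: bool
    stealth_mode: bool
    locale: str


def load_settings(env=None) -> Settings:
    """Read CURLLM_* variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
        ollama_host=env.get("CURLLM_OLLAMA_HOST", "http://localhost:11434"),
        headless=_as_bool(env.get("CURLLM_HEADLESS", "true")),
        stealth_mode=_as_bool(env.get("CURLLM_STEALTH_MODE", "false")),
        locale=env.get("CURLLM_LOCALE", "en-US"),
    )
```

Accepting a plain mapping instead of always reading `os.environ` keeps the loader easy to test.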
🏗️ Architecture
┌─────────────────────────────────────────────────────────────────┐
│ curllm CLI │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌───────────────┐ │
│ │ DSL Executor │───▶│ Knowledge Base │───▶│ Strategy YAML │ │
│ │ (Orchestrator)│ │ (SQLite) │ │ Files │ │
│ └────────────────┘ └────────────────┘ └───────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ DOM Toolkit (Pure JS) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │ │
│ │ │Structure │ │ Patterns │ │Selectors │ │ Prices │ │ │
│ │ │ Analyzer │ │ Detector │ │Generator │ │ Detector │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Playwright Browser Engine │ │
│ │ (Chromium with Stealth & Anti-Detection) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Ollama / LiteLLM │ │
│ │ (Local LLM: Qwen 2.5, Llama 3, Mistral, GPT, etc) │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Key Components
| Component | Description | LLM Calls |
|---|---|---|
| URL Resolver | Smart navigation with goal detection | 0-1 |
| Goal Detector | Semantic intent understanding | 0-1 |
| Element Finder | Dynamic selector generation | 0-1 |
| DOM Toolkit | Pure JavaScript atomic queries | 0 |
| SPA Hydration | Wait for CSR/SPA content | 0 |
📖 Full Architecture Documentation →
🧬 DSL System (Strategy-Based Extraction)
Note: The YAML DSL system works alongside the newer LLM-DSL. YAML strategies are used for known sites with proven extraction patterns, while LLM-DSL handles unknown sites dynamically.
curllm automatically learns and saves successful extraction strategies as YAML files:
# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers
selector: div.product-card
fields:
name: h3.title
price: span.price
url: a[href]
metadata:
success_rate: 0.95
use_count: 42
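To make the structure of such a strategy concrete, the sketch below mirrors the YAML as a Python dict and shows how it might be applied. Two things are assumptions rather than documented behavior: that `url_pattern` is a shell-style glob, and that the field selectors are relative to the container `selector`. The helper names `matches` and `field_selectors` are hypothetical.

```python
from fnmatch import fnmatchcase

# The YAML strategy above, written as a dict to keep the example stdlib-only.
STRATEGY = {
    "url_pattern": "*.ceneo.pl/*",
    "task": "extract_products",
    "algorithm": "statistical_containers",
    "selector": "div.product-card",
    "fields": {"name": "h3.title", "price": "span.price", "url": "a[href]"},
    "metadata": {"success_rate": 0.95, "use_count": 42},
}


def matches(strategy: dict, url: str) -> bool:
    """Glob-match the URL against the strategy's pattern (assumed semantics)."""
    return fnmatchcase(url, strategy["url_pattern"])


def field_selectors(strategy: dict) -> dict[str, str]:
    """Scope each field selector to the container selector."""
    base = strategy["selector"]
    return {name: f"{base} {sel}" for name, sel in strategy["fields"].items()}
```

Under these assumptions, `matches(STRATEGY, "https://www.ceneo.pl/Laptopy")` is true, while a URL without a subdomain dot before `ceneo.pl` would not match this particular pattern.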
How It Works
- First visit - LLM-DSL dynamically analyzes page, extracts data
- Successful - Strategy saved to dsl/*.yaml, recorded in Knowledge Base
- Next visit - Knowledge Base loads saved strategy (fast path)
- Unknown site - Falls back to LLM-DSL dynamic discovery
┌─────────────────────────────────────────────────────────┐
│ Request Flow │
├─────────────────────────────────────────────────────────┤
│ URL → Knowledge Base lookup │
│ │ │
│ ├─ Found? → Load YAML strategy (fast) │
│ │ │
│ └─ Not found? → LLM-DSL dynamic (flexible) │
│ │ │
│ └─ Success? → Save to YAML │
└─────────────────────────────────────────────────────────┘
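The request flow above amounts to a cache lookup with a slow path that fills the cache. The sketch below illustrates that idea with an in-memory SQLite table. The table schema, the function names, and the `discover` callback (standing in for the LLM-DSL dynamic path) are all hypothetical and are not taken from curllm's Knowledge Base.

```python
import sqlite3
from fnmatch import fnmatchcase


def open_kb(path: str = ":memory:") -> sqlite3.Connection:
    """Open a toy knowledge base mapping URL patterns to YAML strategy files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strategies ("
        "url_pattern TEXT PRIMARY KEY, yaml_path TEXT NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, url: str):
    """Return the YAML path of the first stored pattern matching the URL, or None."""
    rows = conn.execute("SELECT url_pattern, yaml_path FROM strategies").fetchall()
    for pattern, yaml_path in rows:
        if fnmatchcase(url, pattern):
            return yaml_path
    return None


def handle(conn: sqlite3.Connection, url: str, discover):
    """Fast path via the knowledge base, otherwise dynamic discovery.

    `discover(url)` returns (url_pattern, yaml_path) on success or None on failure.
    """
    yaml_path = lookup(conn, url)
    if yaml_path is not None:
        return "yaml", yaml_path
    found = discover(url)
    if found is None:
        return "failed", None
    pattern, yaml_path = found
    # Record the successful strategy so the next visit takes the fast path.
    conn.execute("INSERT OR REPLACE INTO strategies VALUES (?, ?)", (pattern, yaml_path))
    conn.commit()
    return "llm_dsl", yaml_path
```

The first request for a site pays for discovery once; later requests matching the same pattern skip it.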
Algorithms
| Algorithm | Best For | Speed |
|---|---|---|
| `statistical_containers` | Product grids | ⚡ Fast |
| `pattern_detection` | Lists, tables | ⚡ Fast |
| `llm_guided` | Complex layouts | 🐢 Slower |
| `form_fill` | Contact forms | ⚡ Fast |
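This README does not describe how `statistical_containers` works internally. As a rough intuition for why a statistical approach suits product grids, the sketch below uses one generic heuristic: the element signature (`tag.class`) that repeats most often on a page is a likely item container. The class and function names are hypothetical, and the heuristic is an illustration rather than curllm's algorithm.

```python
from collections import Counter
from html.parser import HTMLParser


class _SignatureCounter(HTMLParser):
    """Count how often each tag.class signature opens in the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. a bare `class`), hence the `or ""`.
        classes = dict(attrs).get("class") or ""
        for cls in classes.split():
            self.counts[f"{tag}.{cls}"] += 1


def most_repeated_selector(html: str, min_count: int = 3):
    """Return the most frequent tag.class signature seen at least min_count times."""
    parser = _SignatureCounter()
    parser.feed(html)
    parser.close()
    repeated = [(sel, n) for sel, n in parser.counts.items() if n >= min_count]
    if not repeated:
        return None
    return max(repeated, key=lambda item: item[1])[0]
```

A real detector would also need to weigh nesting depth and content (prices, links, images), since the most repeated element is not always the item container.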
🤝 Multi-Provider LLM Support
curllm supports multiple LLM providers via LiteLLM:
from curllm_core import LLMConfig
# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")
# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")
# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")
# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")
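The provider strings above follow LiteLLM's `provider/model` form. To keep the local-first default while allowing an explicit opt-in to a cloud provider, a script could choose the string as sketched below. `choose_provider` and `split_provider` are hypothetical helpers; the environment variable names are the ones LiteLLM conventionally reads for these providers, and the opt-in flag is an assumption about how you might want to gate cloud use.

```python
import os

DEFAULT_PROVIDER = "ollama/qwen2.5:7b"

# (API key variable, provider string) pairs, checked in order.
CLOUD_PROVIDERS = [
    ("OPENAI_API_KEY", "openai/gpt-4o-mini"),
    ("ANTHROPIC_API_KEY", "anthropic/claude-3-haiku-20240307"),
    ("GEMINI_API_KEY", "gemini/gemini-2.0-flash"),
]


def choose_provider(env=None, allow_cloud: bool = False) -> str:
    """Return a provider string; stay local unless cloud use is explicitly allowed."""
    env = os.environ if env is None else env
    if allow_cloud:
        for key, provider in CLOUD_PROVIDERS:
            if env.get(key):
                return provider
    return DEFAULT_PROVIDER


def split_provider(provider: str) -> tuple[str, str]:
    """Split 'ollama/qwen2.5:7b' into ('ollama', 'qwen2.5:7b')."""
    prefix, _, model = provider.partition("/")
    return prefix, model
```

The returned string can then be passed as `LLMConfig(provider=...)` in the same way as the examples above.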
📚 Documentation
Getting Started
Architecture
- 🏗️ System Architecture
- 🧬 DSL System - Strategy-based extraction
- ⚛️ DOM Toolkit - Pure JS queries
- 🧩 Components - Module overview
- 🔗 LLM-DSL URL Resolution - Smart URL navigation
Reference
🧪 Development
# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install
# Run tests (388 tests passing)
make test
# Run URL resolver examples
cd examples/url_resolver && python run_all.py
# Run with Docker
docker compose up -d
📄 License
Apache License 2.0 - see LICENSE
🙏 Acknowledgments
Built with:
- Playwright - Browser automation
- Ollama - Local LLM inference
- LiteLLM - Multi-provider LLM support
- Flask - Web framework
⭐ Star this repo if you find it useful!
Made with ❤️ by wronai