Browser Automation with Local LLM (8GB GPU compatible)

curllm = curl + LLM

Intelligent Browser Automation with Local LLMs

🎯 What is curllm?

curllm is a CLI tool that combines browser automation with local LLMs (such as Qwen, Llama, or Mistral served by Ollama) to extract data, fill forms, and automate web workflows. By default everything runs locally on your machine, so no cloud API is needed and page data stays private.

# Extract products with prices from any e-commerce site
curllm "https://shop.example.com" -d 'Find all products under $100'

# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"

# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"
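
When these commands are generated from a script, quoting matters: inside double quotes a POSIX shell expands `$1`, so an instruction that mentions a dollar amount can change before curllm sees it. The sketch below assumes only the `curllm URL -d INSTRUCTION` form and the `--stealth` flag shown above; the helper name `build_command` is hypothetical and not part of curllm.

```python
import shlex


def build_command(url: str, instruction: str, stealth: bool = False) -> list[str]:
    """Build an argv list for the curllm CLI (hypothetical helper).

    Only flags that appear in the examples above are used.
    """
    argv = ["curllm"]
    if stealth:
        argv.append("--stealth")
    argv += [url, "-d", instruction]
    return argv


argv = build_command("https://shop.example.com", "Find all products under $100")

# shlex.join quotes each argument so the line can be pasted into a POSIX shell
# without "$1" being expanded.
print(shlex.join(argv))

# To actually run it (requires curllm to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` skips the shell entirely, so no quoting is needed in that case.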

✨ Features

Feature              Description
🧠 Local LLM         Works with 8GB GPUs (Qwen 2.5, Llama 3, Mistral)
🎯 Smart Extraction  LLM-guided DOM analysis, no hardcoded selectors
📝 Form Automation   Auto-fill forms with intelligent field mapping
🥷 Stealth Mode      Bypass anti-bot detection
👁️ Visual Mode       See browser actions in real time
🔍 BQL Support       Browser Query Language for structured queries
📊 Export Formats    JSON, CSV, HTML, XLS output
🔒 Privacy-First     Everything runs locally, no cloud APIs needed

🚀 Quick Start

Installation

pip install -U curllm
curllm-setup      # One-time setup (installs Playwright browsers)
curllm-doctor     # Verify installation

Requirements

  • Python 3.10+
  • GPU: NVIDIA with 6-8GB VRAM (RTX 3060/4060) or CPU mode
  • Ollama: For local LLM inference

# Install Ollama (if not installed)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5:7b

📖 Examples

Extract Data

# Extract all links
curllm "https://example.com" -d "extract all links"

# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}

# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"
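
The email example above shows a JSON object as output. A minimal sketch of consuming it from Python follows, assuming the command prints exactly that JSON shape; `parse_emails` is a hypothetical helper, not part of curllm.

```python
import json


def parse_emails(raw_output: str) -> list[str]:
    """Return the de-duplicated "emails" list from a curllm-style JSON result."""
    data = json.loads(raw_output)
    emails = data.get("emails", [])
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(emails))


# Sample output copied from the example above.
sample = '{"emails": ["info@example.com", "sales@example.com"]}'
print(parse_emails(sample))
```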

Form Automation

# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
  -d "Fill form: name=John Doe, email=john@example.com, message=Hello"

# Login automation
curllm --visual "https://app.example.com/login" \
  -d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'
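
Hardcoding credentials in a command leaves them in shell history and scripts. The sketch below builds the same JSON payload with `json.dumps` and reads the values from environment variables. `APP_USER` and `APP_PASS` are hypothetical names chosen for this example, not curllm settings, and the payload shape is copied from the login example above.

```python
import json
import os


def login_payload(user: str, password: str) -> str:
    """Serialize the login instruction in the JSON shape shown above."""
    return json.dumps(
        {"instruction": "Login", "credentials": {"user": user, "pass": password}}
    )


# APP_USER / APP_PASS are hypothetical variable names; the fallbacks mirror the example.
payload = login_payload(
    os.environ.get("APP_USER", "admin"),
    os.environ.get("APP_PASS", "secret"),
)
argv = ["curllm", "--visual", "https://app.example.com/login", "-d", payload]
print(argv)
```

Using `json.dumps` also escapes quotes or backslashes in a password, which hand-written JSON would not.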

Export Results

# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv

# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html

# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx

Screenshots

# Take screenshot
curllm "https://example.com" -d "screenshot"

# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"

BQL Queries

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "a.titlelink") { text url: attr(name: "href") }
  }
}'

🌐 Web Interface

curllm-web start   # Start web UI at http://localhost:5000
curllm-web status  # Check status
curllm-web stop    # Stop server

Features:

  • 🎨 Modern responsive UI
  • 📝 19 pre-configured prompts
  • 📊 Real-time log viewer
  • 📤 File upload support

🔧 Configuration

Environment variables (.env):

CURLLM_MODEL=qwen2.5:7b          # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true             # Run browser headlessly
CURLLM_STEALTH_MODE=false        # Anti-detection
CURLLM_LOCALE=en-US              # Browser locale
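
For scripts that need the same settings, the variables above can be read into a typed object. This is a sketch, not curllm's own loader: the `Settings` class and `load_settings` function are hypothetical, the fallback values are simply the ones from the listing above, and the accepted boolean spellings are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Accepted spellings are an assumption; curllm may parse booleans differently.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    model: str
    ollama_host: str
    headless: bool
    stealth_mode: bool
    locale: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read CURLLM_* variables, falling back to the values listed above."""
    env = os.environ if env is None else env
    return Settings(
        model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
        ollama_host=env.get("CURLLM_OLLAMA_HOST", "http://localhost:11434"),
        headless=_as_bool(env.get("CURLLM_HEADLESS", "true")),
        stealth_mode=_as_bool(env.get("CURLLM_STEALTH_MODE", "false")),
        locale=env.get("CURLLM_LOCALE", "en-US"),
    )


print(load_settings({"CURLLM_HEADLESS": "false"}))
```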

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         curllm CLI                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
│  │  DSL Executor  │───▶│ Knowledge Base │───▶│ Strategy YAML │  │
│  │  (Orchestrator)│    │   (SQLite)     │    │    Files      │  │
│  └────────────────┘    └────────────────┘    └───────────────┘  │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    DOM Toolkit (Pure JS)                   │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │ │
│  │  │Structure │  │ Patterns │  │Selectors │  │   Prices   │  │ │
│  │  │ Analyzer │  │ Detector │  │Generator │  │  Detector  │  │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              Playwright Browser Engine                     │ │
│  │         (Chromium with Stealth & Anti-Detection)           │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                 Ollama / LiteLLM                           │ │
│  │      (Local LLM: Qwen 2.5, Llama 3, Mistral, GPT, etc)     │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Key Components

Component         Description                                        LLM Calls
DSL Executor      Orchestrates extraction with fallback algorithms   1-3
Knowledge Base    Tracks algorithm success per domain (SQLite)       0
DOM Toolkit       Pure JavaScript atomic queries                     0
Strategy Files    Reusable YAML extraction recipes                   0
Result Validator  Validates output + optional LLM check              0-1
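
The interplay between an orchestrator with fallback algorithms and a result validator can be sketched as a simple loop: try each extractor in order and keep the first result that validates. This illustrates the general pattern only. The function names and stand-in extractors are hypothetical, and curllm's actual executor may work differently.

```python
from typing import Callable, Optional

Extractor = Callable[[str], Optional[list]]


def run_with_fallback(
    html: str,
    extractors: list[tuple[str, Extractor]],
    is_valid: Callable[[list], bool],
) -> tuple[Optional[str], list]:
    """Return (algorithm_name, items) from the first extractor whose output validates."""
    for name, extract in extractors:
        items = extract(html)
        if items is not None and is_valid(items):
            return name, items
    return None, []


# Stand-in extractors: real ones would inspect the DOM or call an LLM.
def statistical_containers(html: str) -> Optional[list]:
    return None  # pretend no repeated containers were found


def llm_guided(html: str) -> Optional[list]:
    return [{"name": "Example product", "price": "99.00"}]


def has_name_and_price(items: list) -> bool:
    return bool(items) and all("name" in i and "price" in i for i in items)


chain = [("statistical_containers", statistical_containers), ("llm_guided", llm_guided)]
name, items = run_with_fallback("<html></html>", chain, has_name_and_price)
print(name, len(items))
```

Ordering the chain from cheap to expensive keeps the LLM-backed step as a last resort.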

📖 Full Architecture Documentation →

🧬 DSL System (Strategy-Based Extraction)

curllm automatically learns and saves successful extraction strategies as YAML files:

# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers

selector: div.product-card
fields:
  name: h3.title
  price: span.price
  url: a[href]

metadata:
  success_rate: 0.95
  use_count: 42
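
One plausible reading of `url_pattern` is a shell-style glob over host and path. The sketch below tests that reading with the standard library. The glob semantics and the `matches` helper are assumptions, and the strategy is written as a Python dict so the example needs no YAML parser.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Same fields as the YAML strategy above (metadata omitted).
strategy = {
    "url_pattern": "*.ceneo.pl/*",
    "task": "extract_products",
    "algorithm": "statistical_containers",
    "selector": "div.product-card",
    "fields": {"name": "h3.title", "price": "span.price", "url": "a[href]"},
}


def matches(url: str, pattern: str) -> bool:
    """Glob-match host + path of `url` against a strategy's url_pattern."""
    parts = urlsplit(url)
    target = f"{parts.netloc}{parts.path or '/'}"
    return fnmatchcase(target.lower(), pattern.lower())


print(matches("https://www.ceneo.pl/Telefony", strategy["url_pattern"]))
print(matches("https://example.com/ceneo.pl/x", strategy["url_pattern"]))
```

Under this reading, a bare `ceneo.pl` host would not match, because the pattern requires a subdomain before `.ceneo.pl`.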

How It Works

  1. First visit: the DOM Toolkit finds containers and extracts data.
  2. On success: the strategy is saved to dsl/*.yaml and recorded in the Knowledge Base.
  3. Next visit: the Knowledge Base suggests the best algorithm based on history.
  4. Reuse: the strategy is loaded from YAML, so no discovery is needed.
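
The record-then-suggest cycle in the steps above can be sketched with an in-memory SQLite table. The schema, the table name `runs`, and the ranking rule (highest success rate, then most runs) are assumptions made for illustration, not curllm's actual Knowledge Base.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE runs (
           domain    TEXT    NOT NULL,
           algorithm TEXT    NOT NULL,
           success   INTEGER NOT NULL
       )"""
)


def record(domain: str, algorithm: str, success: bool) -> None:
    """Store the outcome of one extraction attempt."""
    conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (domain, algorithm, int(success)))


def suggest(domain: str) -> Optional[str]:
    """Return the algorithm with the best success rate for a domain, if any."""
    row = conn.execute(
        """SELECT algorithm
           FROM runs
           WHERE domain = ?
           GROUP BY algorithm
           ORDER BY AVG(success) DESC, COUNT(*) DESC
           LIMIT 1""",
        (domain,),
    ).fetchone()
    return row[0] if row else None


record("ceneo.pl", "statistical_containers", True)
record("ceneo.pl", "statistical_containers", True)
record("ceneo.pl", "pattern_detection", False)
print(suggest("ceneo.pl"))
```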

Algorithms

Algorithm               Best For         Speed
statistical_containers  Product grids    ⚡ Fast
pattern_detection       Lists, tables    ⚡ Fast
llm_guided              Complex layouts  🐢 Slower
form_fill               Contact forms    ⚡ Fast

📖 DSL System Documentation →

🤝 Multi-Provider LLM Support

curllm supports multiple LLM providers via LiteLLM:

from curllm_core import LLMConfig

# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")

# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")

# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")

# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")
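
The provider strings above follow LiteLLM's `provider/model` naming. A small helper can split and validate such a string before it is used in a config. `split_provider` is a hypothetical function for illustration and is not part of `curllm_core`.

```python
def split_provider(identifier: str) -> tuple[str, str]:
    """Split a "provider/model" string on the first slash."""
    provider, sep, model = identifier.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    return provider, model


for name in ("openai/gpt-4o-mini", "ollama/qwen2.5:7b"):
    print(split_provider(name))
```

Splitting only on the first slash leaves any later slashes or colons, such as the `:7b` tag, inside the model name.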

🧪 Development

# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install

# Run tests
make test

# Run with Docker
docker compose up -d

📄 License

Apache License 2.0 - see LICENSE

⭐ Star this repo if you find it useful!

Made with ❤️ by wronai

Download files

Source Distribution

curllm-1.0.34.tar.gz (533.6 kB view details)

Uploaded Source

Built Distribution

curllm-1.0.34-py3-none-any.whl (606.5 kB view details)

Uploaded Python 3

File details

Details for the file curllm-1.0.34.tar.gz.

File metadata

  • Download URL: curllm-1.0.34.tar.gz
  • Upload date:
  • Size: 533.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for curllm-1.0.34.tar.gz
Algorithm Hash digest
SHA256 f838621de4e7024b7d607908767ef064852a1f50a5b2fba0ea3013e92c36a755
MD5 cd77e68658d0a462d528480487ef75f6
BLAKE2b-256 32b08fbd56ee0320d3956f911cb79aaea12d4a6edebcc141296e311c52bc96c4
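
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library. It assumes the file sits in the current directory under its published name, and the helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED = "f838621de4e7024b7d607908767ef064852a1f50a5b2fba0ea3013e92c36a755"

archive = Path("curllm-1.0.34.tar.gz")
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```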

File details

Details for the file curllm-1.0.34-py3-none-any.whl.

File metadata

  • Download URL: curllm-1.0.34-py3-none-any.whl
  • Upload date:
  • Size: 606.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for curllm-1.0.34-py3-none-any.whl
Algorithm Hash digest
SHA256 9488f7ed66e736d20a15d995f3473f76186a82a901fe661168d07d55f9172871
MD5 bc7e4f203e7751040e0570a887281117
BLAKE2b-256 8abef45071941619c95ed4846c6931899ff318f9281794fcd17ecde043d5450b
