
Browser Automation with Local LLMs (8 GB GPU compatible)

curllm = curl + LLM

Intelligent Browser Automation with Local LLMs


🎯 What is curllm?

curllm is a CLI tool that combines browser automation with local LLMs (such as Qwen, Llama, or Mistral served by Ollama) to extract data, fill forms, and automate web workflows. By default everything runs locally on your machine, so page content and prompts stay private.

# Extract products with prices from any e-commerce site
# (single quotes stop the shell from expanding "$1" in "$100")
curllm "https://shop.example.com" -d 'Find all products under $100'

# Fill contact forms automatically
curllm --stealth "https://example.com/contact" -d "Fill form: name=John, email=john@example.com"

# Extract all emails from a page
curllm "https://example.com" -d "extract all email addresses"
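
To drive curllm from a Python script, one option is to pass the URL and instruction to `subprocess` as an argument list. No shell is involved, so a `$` in the instruction reaches curllm unchanged. This is a minimal sketch: `build_curllm_command` and `run_curllm` are hypothetical helper names, and only the `--stealth` and `-d` flags are taken from the examples above.

```python
import shlex
import subprocess


def build_curllm_command(url: str, instruction: str, stealth: bool = False) -> list[str]:
    """Build an argv list for the curllm CLI (hypothetical helper)."""
    cmd = ["curllm"]
    if stealth:
        cmd.append("--stealth")  # flag taken from the README examples
    cmd += [url, "-d", instruction]
    return cmd


def run_curllm(url: str, instruction: str, stealth: bool = False) -> str:
    """Run curllm without a shell and return its standard output."""
    result = subprocess.run(
        build_curllm_command(url, instruction, stealth),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    cmd = build_curllm_command("https://shop.example.com", "Find all products under $100")
    # shlex.join prints the equivalent, correctly quoted shell command
    print(shlex.join(cmd))
```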

✨ Features

| Feature | Description |
|---|---|
| 🧠 Local LLM | Runs on 8 GB GPUs (Qwen 2.5, Llama 3, Mistral) |
| 🎯 Smart Extraction | LLM-guided DOM analysis, no hardcoded selectors |
| 📝 Form Automation | Auto-fills forms by mapping your values to the page's fields |
| 🥷 Stealth Mode | Reduces the chance of being flagged by anti-bot detection |
| 👁️ Visual Mode | Watch browser actions in real time |
| 🔍 BQL Support | Browser Query Language for structured queries |
| 📊 Export Formats | JSON, CSV, HTML, XLS output |
| 🔒 Privacy-First | Runs locally by default; no cloud API required |

🚀 Quick Start

Installation

pip install -U curllm
curllm-setup      # One-time setup (installs Playwright browsers)
curllm-doctor     # Verify installation

Requirements

  • Python 3.10+
  • GPU: NVIDIA with 6-8 GB VRAM (e.g. RTX 3060/4060), or CPU-only mode
  • Ollama: for local LLM inference

# Install Ollama (if not installed)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5:7b
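
Before a first run it can help to confirm that Ollama is reachable and the model has been pulled. The sketch below assumes Ollama's default address `http://localhost:11434` and its `GET /api/tags` endpoint, which lists locally installed models; the function names are hypothetical, and this is not a description of what `curllm-doctor` checks.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address (assumption: not overridden)


def model_available(tags: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response body."""
    names = {m.get("name") for m in tags.get("models", [])}
    return model in names


def fetch_tags(host: str = OLLAMA_HOST, timeout: float = 5.0) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        ok = model_available(fetch_tags(), "qwen2.5:7b")
        print("qwen2.5:7b ready" if ok else "run: ollama pull qwen2.5:7b")
    except OSError as exc:  # URLError, connection refused, timeout
        print(f"Ollama not reachable at {OLLAMA_HOST}: {exc}")
```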

📖 Examples

Extract Data

# Extract all links
curllm "https://example.com" -d "extract all links"

# Extract emails
curllm "https://example.com/contact" -d "extract all email addresses"
# Output: {"emails": ["info@example.com", "sales@example.com"]}

# Extract products with price filter
curllm --stealth "https://shop.example.com" -d "Find all products under 500zł"
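
A filter such as "under 500zł" only works once localized price strings become numbers. The sketch below handles common Polish formatting (space as thousands separator, comma or dot as decimal separator, "zł" suffix). The regex and the names `parse_pln` and `under_limit` are illustrative assumptions, not curllm's Prices Detector.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches e.g. "499 zł", "1 299,99 zł", "1299.00zł" (illustrative coverage only)
PRICE_RE = re.compile(r"(\d{1,3}(?:[ \u00a0]\d{3})+|\d+)(?:[.,](\d{1,2}))?\s*zł")


def parse_pln(text: str) -> Optional[Decimal]:
    """Return the first złoty amount found in `text`, or None."""
    m = PRICE_RE.search(text)
    if not m:
        return None
    whole = re.sub(r"[ \u00a0]", "", m.group(1))  # drop thousands separators
    cents = m.group(2) or "0"
    return Decimal(f"{whole}.{cents}")


def under_limit(items: list[dict], limit: Decimal) -> list[dict]:
    """Keep items whose parsed price is strictly below `limit`."""
    kept = []
    for item in items:
        price = parse_pln(item.get("price", ""))
        if price is not None and price < limit:
            kept.append(item)
    return kept
```

Using `Decimal` rather than `float` avoids rounding surprises when comparing prices against a limit.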

Form Automation

# Fill contact form
curllm --visual --stealth "https://example.com/contact" \
  -d "Fill form: name=John Doe, email=john@example.com, message=Hello"

# Login automation
curllm --visual "https://app.example.com/login" \
  -d '{"instruction":"Login", "credentials":{"user":"admin", "pass":"secret"}}'
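
Hand-quoting the JSON payload in the login example is error-prone, especially when a password contains quotes. A minimal sketch that builds it with `json.dumps` and prints an equivalent shell-safe command; the keys mirror the example above, while `login_payload` and the `APP_USER`/`APP_PASS` variable names are hypothetical.

```python
import json
import os
import shlex


def login_payload(user: str, password: str) -> str:
    """Serialize the login instruction in the shape used by the README example."""
    return json.dumps({
        "instruction": "Login",
        "credentials": {"user": user, "pass": password},
    })


if __name__ == "__main__":
    # Hypothetical variable names; falls back to the example's placeholder values
    payload = login_payload(os.environ.get("APP_USER", "admin"),
                            os.environ.get("APP_PASS", "secret"))
    cmd = ["curllm", "--visual", "https://app.example.com/login", "-d", payload]
    print(shlex.join(cmd))
```

Reading credentials from the environment also keeps real passwords out of your shell history.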

Export Results

# Export to CSV
curllm "https://example.com" -d "extract all products" --csv -o products.csv

# Export to HTML
curllm "https://example.com" -d "extract all links" --html -o links.html

# Export to Excel
curllm "https://example.com" -d "extract all data" --xls -o data.xlsx
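
If you prefer to post-process results yourself, the CSV conversion is simple. This sketch assumes the result is a JSON object holding a list of flat records under a key such as `products`; curllm's actual output shape may differ, so inspect it on your own data first. `records_to_csv` is a hypothetical name.

```python
import csv
import io
import json


def records_to_csv(result_json: str, key: str) -> str:
    """Convert result[key] (a list of flat dicts) into CSV text."""
    records = json.loads(result_json).get(key, [])
    # Collect column names in first-seen order across all records
    fields: list[str] = []
    for rec in records:
        for name in rec:
            if name not in fields:
                fields.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys become empty cells (restval="")
    return buf.getvalue()
```

Columns are collected across all records, so a field missing from one record becomes an empty cell rather than an error.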

Screenshots

# Take screenshot
curllm "https://example.com" -d "screenshot"

# Visual mode (watch browser)
curllm --visual "https://example.com" -d "extract all links"

BQL Queries

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "span.titleline > a") { text url: attr(name: "href") }
  }
}'

🌐 Web Interface

curllm-web start   # Start web UI at http://localhost:5000
curllm-web status  # Check status
curllm-web stop    # Stop server

Features:

  • 🎨 Modern responsive UI
  • 📝 19 pre-configured prompts
  • 📊 Real-time log viewer
  • 📤 File upload support

🔧 Configuration

Environment variables (.env):

CURLLM_MODEL=qwen2.5:7b          # LLM model
CURLLM_OLLAMA_HOST=http://localhost:11434
CURLLM_HEADLESS=true             # Run browser headlessly
CURLLM_STEALTH_MODE=false        # Anti-detection
CURLLM_LOCALE=en-US              # Browser locale
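
A sketch of reading these variables into a typed settings object with the defaults listed above. The `Settings` class and the truthy-string rule are assumptions for illustration; curllm's own loader may differ, and this sketch reads only the process environment, not the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _env_bool(name: str, default: bool, env: dict) -> bool:
    """Treat "1", "true", "yes", "on" (any case) as True."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    model: str
    ollama_host: str
    headless: bool
    stealth_mode: bool
    locale: str

    @classmethod
    def from_env(cls, env: Optional[dict] = None) -> "Settings":
        """Build settings from `env` (defaults to os.environ)."""
        env = dict(os.environ) if env is None else env
        return cls(
            model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
            ollama_host=env.get("CURLLM_OLLAMA_HOST", "http://localhost:11434"),
            headless=_env_bool("CURLLM_HEADLESS", True, env),
            stealth_mode=_env_bool("CURLLM_STEALTH_MODE", False, env),
            locale=env.get("CURLLM_LOCALE", "en-US"),
        )
```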

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         curllm CLI                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌────────────────┐    ┌────────────────┐    ┌───────────────┐  │
│  │  DSL Executor  │───▶│ Knowledge Base │───▶│ Strategy YAML │  │
│  │  (Orchestrator)│    │   (SQLite)     │    │    Files      │  │
│  └────────────────┘    └────────────────┘    └───────────────┘  │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    DOM Toolkit (Pure JS)                   │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐  │ │
│  │  │Structure │  │ Patterns │  │Selectors │  │   Prices   │  │ │
│  │  │ Analyzer │  │ Detector │  │Generator │  │  Detector  │  │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └────────────┘  │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              Playwright Browser Engine                     │ │
│  │         (Chromium with Stealth & Anti-Detection)           │ │
│  └────────────────────────────────────────────────────────────┘ │
│          │                                                      │
│          ▼                                                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                 Ollama / LiteLLM                           │ │
│  │   (Local: Qwen 2.5, Llama 3, Mistral; cloud: GPT, etc.)    │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Key Components

| Component | Description | LLM calls |
|---|---|---|
| DSL Executor | Orchestrates extraction with fallback algorithms | 1-3 |
| Knowledge Base | Tracks algorithm success per domain (SQLite) | 0 |
| DOM Toolkit | Pure JavaScript atomic queries | 0 |
| Strategy Files | Reusable YAML extraction recipes | 0 |
| Result Validator | Validates output, with an optional LLM check | 0-1 |
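
The table describes an executor that tries algorithms with fallback and a validator that checks the result. A minimal sketch of that control flow, with plain callables standing in for the algorithms; `run_with_fallback`, the "non-empty list of dicts" validity rule, and the try-in-order policy are assumptions, not the DSL Executor's actual logic.

```python
from typing import Callable, Optional

Extractor = Callable[[str], list]  # takes page HTML, returns extracted records


def is_valid(records: list) -> bool:
    """Toy validator: accept a non-empty list of dicts."""
    return bool(records) and all(isinstance(r, dict) for r in records)


def run_with_fallback(
    html: str, algorithms: list[tuple[str, Extractor]]
) -> tuple[Optional[str], list]:
    """Try each (name, extractor) in order; return the first valid result."""
    for name, extract in algorithms:
        try:
            records = extract(html)
        except Exception:
            continue  # a failing algorithm falls through to the next one
        if is_valid(records):
            return name, records
    return None, []
```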

📖 Full Architecture Documentation →

🧬 DSL System (Strategy-Based Extraction)

curllm automatically learns and saves successful extraction strategies as YAML files:

# dsl/ceneo_products.yaml - Auto-generated from successful extraction
url_pattern: "*.ceneo.pl/*"
task: extract_products
algorithm: statistical_containers

selector: div.product-card
fields:
  name: h3.title
  price: span.price
  url: a[href]

metadata:
  success_rate: 0.95
  use_count: 42
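
Matching a saved strategy to a new URL could look like the sketch below, which treats `url_pattern` as a shell-style glob via `fnmatch`. That interpretation is an assumption: under it, `https://www.ceneo.pl/...` matches `*.ceneo.pl/*` but `https://ceneo.pl/...` does not. The dict mirrors the YAML above as it might look after loading; `find_strategy` is a hypothetical name.

```python
from fnmatch import fnmatch
from typing import Optional

# The YAML strategy above, as it might look after loading (e.g. with yaml.safe_load)
STRATEGIES = [
    {
        "url_pattern": "*.ceneo.pl/*",
        "task": "extract_products",
        "algorithm": "statistical_containers",
        "selector": "div.product-card",
        "fields": {"name": "h3.title", "price": "span.price", "url": "a[href]"},
        "metadata": {"success_rate": 0.95, "use_count": 42},
    },
]


def find_strategy(url: str, task: str, strategies: list[dict]) -> Optional[dict]:
    """Return the matching strategy with the highest success rate, if any."""
    candidates = [
        s for s in strategies
        if s["task"] == task and fnmatch(url, s["url_pattern"])
    ]
    return max(candidates, key=lambda s: s["metadata"]["success_rate"], default=None)
```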

How It Works

  1. First visit: the DOM Toolkit finds containers and extracts the data.
  2. On success: the strategy is saved to dsl/*.yaml and the result is recorded in the Knowledge Base.
  3. Next visit: the Knowledge Base suggests the best algorithm based on past results.
  4. Reuse: the strategy is loaded from YAML, so no discovery step is needed.
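
Steps 2 and 3 amount to a per-domain success counter. A minimal sketch using Python's built-in `sqlite3`; the table schema, the ranking by success rate (ties broken by number of attempts), and the function names are assumptions rather than the Knowledge Base's real schema. The upsert syntax needs SQLite 3.24 or newer.

```python
import sqlite3
from typing import Optional


def open_kb(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table of per-domain algorithm outcomes."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS outcomes (
               domain TEXT NOT NULL,
               algorithm TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               successes INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (domain, algorithm)
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, domain: str, algorithm: str, success: bool) -> None:
    """Increment the attempt counter, and the success counter if it worked."""
    conn.execute(
        """INSERT INTO outcomes (domain, algorithm, attempts, successes)
           VALUES (?, ?, 1, ?)
           ON CONFLICT(domain, algorithm) DO UPDATE SET
               attempts = attempts + 1,
               successes = successes + excluded.successes""",
        (domain, algorithm, int(success)),
    )
    conn.commit()


def best_algorithm(conn: sqlite3.Connection, domain: str) -> Optional[str]:
    """Return the algorithm with the highest success rate for `domain`."""
    row = conn.execute(
        """SELECT algorithm FROM outcomes
           WHERE domain = ?
           ORDER BY CAST(successes AS REAL) / attempts DESC, attempts DESC
           LIMIT 1""",
        (domain,),
    ).fetchone()
    return row[0] if row else None
```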

Algorithms

| Algorithm | Best for | Speed |
|---|---|---|
| statistical_containers | Product grids | ⚡ Fast |
| pattern_detection | Lists, tables | ⚡ Fast |
| llm_guided | Complex layouts | 🐢 Slower |
| form_fill | Contact forms | ⚡ Fast |
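
As a rough illustration of the idea behind `statistical_containers`, the sketch below counts repeated (tag, first class) signatures and proposes the most frequent one as a container selector. This toy frequency heuristic is an assumption, not curllm's algorithm: it ignores nesting, so inner elements that repeat just as often tie with the container, and ties go to whichever signature appears first in the document.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Optional


class _SignatureCounter(HTMLParser):
    """Count (tag, first class) signatures of elements that have a class."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if classes:
            self.counts[(tag, classes[0])] += 1


def likely_container(html: str, min_repeats: int = 3) -> Optional[str]:
    """Return a CSS selector for the most repeated element signature."""
    parser = _SignatureCounter()
    parser.feed(html)
    if not parser.counts:
        return None
    # most_common keeps first-encountered order among equal counts
    (tag, cls), n = parser.counts.most_common(1)[0]
    return f"{tag}.{cls}" if n >= min_repeats else None
```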

📖 DSL System Documentation →

🤝 Multi-Provider LLM Support

curllm supports multiple LLM providers via LiteLLM:

from curllm_core import LLMConfig

# OpenAI
config = LLMConfig(provider="openai/gpt-4o-mini")

# Anthropic
config = LLMConfig(provider="anthropic/claude-3-haiku-20240307")

# Google Gemini
config = LLMConfig(provider="gemini/gemini-2.0-flash")

# Local Ollama (default)
config = LLMConfig(provider="ollama/qwen2.5:7b")

🧪 Development

# Clone and install
git clone https://github.com/wronai/curllm.git
cd curllm
make install

# Run tests
make test

# Run with Docker
docker compose up -d

📄 License

Apache License 2.0. See LICENSE.

⭐ Star this repo if you find it useful!

Made with ❤️ by wronai


Download files

Source Distribution

curllm-1.0.35.tar.gz (538.6 kB)

Uploaded Source

Built Distribution

curllm-1.0.35-py3-none-any.whl (612.3 kB)

Uploaded Python 3

File details

Details for the file curllm-1.0.35.tar.gz.

File metadata

  • Download URL: curllm-1.0.35.tar.gz
  • Size: 538.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for curllm-1.0.35.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | be8f9cd8c773120771c7337f165dddea811f07e5cd1b26fa0b56250e5315e210 |
| MD5 | 1f42d41831b794ef568bbb961bf5c5da |
| BLAKE2b-256 | 8f9cb67d73987c909f0cb1b362771e97c36b1103312df898a7bf51688a765575 |

File details

Details for the file curllm-1.0.35-py3-none-any.whl.

File metadata

  • Download URL: curllm-1.0.35-py3-none-any.whl
  • Size: 612.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for curllm-1.0.35-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fb0c29039e4e6643391c5c2091583d64b0b7763d3a98622b42c77492584fbbd |
| MD5 | fb74aec60cb4eed646deb7899f97e821 |
| BLAKE2b-256 | e7bd2bde628210745e83673544068434f17e042206ddad9415d2ab090987c934 |
