Browser Automation with Local LLM (8GB GPU compatible)

curllm - Browser Automation with Local LLM

🤖 Intelligent Browser Automation using 8GB GPU-Compatible Local LLMs

curllm combines the power of local LLMs with browser automation for intelligent web scraping, form filling, and workflow automation - all running on your local machine with complete privacy.

✨ Features

  • 🧠 Local LLM Integration: Run on 8GB GPUs with models like Qwen 2.5, Mistral, or Llama
  • 👁️ Visual Analysis: Computer vision for CAPTCHA detection and page understanding
  • 🥷 Stealth Mode: Advanced anti-bot detection bypass techniques
  • 🔍 BQL Support: Browser Query Language for structured data extraction
  • 🎯 Smart Navigation: AI-driven page interaction and form filling
  • 🔒 Privacy-First: Everything runs locally - no data leaves your machine
  • ⚡ GPU Optimized: Quantized models for efficient inference on consumer GPUs

📋 Requirements

Minimum Hardware

  • GPU: NVIDIA GPU with 6-8GB VRAM (RTX 3060, RTX 4060, etc.)
  • RAM: 16GB system memory
  • Storage: 10GB free space
  • CPU: Modern processor (Intel i5/AMD Ryzen 5 or better)

Software

  • Python 3.11+ (tested on 3.13)
  • Docker (optional, for Browserless features)
  • CUDA toolkit (for GPU acceleration)

🚀 Quick Start

make install

📚 More Documentation & Example Scripts

  • Full examples with commands and context: docs/EXAMPLES.md
  • Generate runnable scripts: make examples
    • Scripts are created in examples/ as executable files (curllm-*.sh)
    • Run with: ./examples/curllm-extract-links.sh

Example installer output:

Installing curllm dependencies...
╔════════════════════════════════════════════╗
║       curllm Installation Script           ║
║   Browser Automation with Local LLM        ║
╚════════════════════════════════════════════╝

[1/7] Checking system requirements...
✓ Python 3.13.5 found
✓ GPU detected: NVIDIA GeForce RTX 4060, 8188 MiB
✓ Docker is installed

[2/7] Installing Ollama...
✓ Ollama is already installed

...

1. Installation

# Clone the repository
git clone https://github.com/wronai/curllm.git
cd curllm

# Run automatic installer
chmod +x install.sh
./install.sh

# Or manual installation
pip install -r requirements.txt
ollama pull qwen2.5:7b

2. Start Services

Start all required services (auto-selects free ports and saves them to .env)

curllm --start-services

Check status (reads ports from .env)

curllm --status

output:

=== curllm Service Status ===
✓ Ollama is running
✓ curllm API is running
✓ Model qwen2.5:7b is available

GPU Status:
NVIDIA GeForce RTX 4060, 1190 MiB, 8188 MiB
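
The GPU line in the status output appears to follow the layout `name, used MiB, total MiB`. A minimal Python sketch, assuming that layout holds, for working out how much VRAM is still free before loading a model. `GpuStatus` and `parse_gpu_status` are hypothetical helpers for illustration, not part of curllm:

```python
from typing import NamedTuple


class GpuStatus(NamedTuple):
    name: str
    used_mib: int
    total_mib: int

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def parse_gpu_status(line: str) -> GpuStatus:
    """Parse a 'name, used MiB, total MiB' line (layout assumed from the sample above)."""
    # Split from the right so commas inside the GPU name would not break parsing.
    name, used, total = (part.strip() for part in line.rsplit(",", 2))
    return GpuStatus(name, int(used.split()[0]), int(total.split()[0]))


status = parse_gpu_status("NVIDIA GeForce RTX 4060, 1190 MiB, 8188 MiB")
print(status.name, status.free_mib)
```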

3. Basic Usage

# Simple extraction (ensure services are running)
curllm "https://example.com" -d "extract all links"

output:

{
  "links": [
    {
      "href": "https://iana.org/domains/example",
      "text": "Learn more"
    }
  ]
}
Run log: ./logs/run-20251123-113145.md

Form automation with authentication

curllm -X POST --visual --stealth \
  -d '{"instruction": "Login and download invoice", 
       "credentials": {"user": "john@example.com", "pass": "secret"}}' \
  https://app.example.com

BQL query for structured data

curllm --bql -d 'query {
  page(url: "https://news.ycombinator.com") {
    title
    links: select(css: "a.storylink, a.titlelink") { text url: attr(name: "href") }
  }
}'

🎯 Examples

For a comprehensive, curated set of examples and ready-to-run scripts, see:

  • docs/EXAMPLES.md
  • Generate scripts: make examples (scripts are created in examples/ as curllm-*.sh)

Validated examples (tested)

  • Extract links (basic)

curllm "https://example.com" -d "extract all links"

Expected output (truncated):

{
  "links": [
    { "href": "https://iana.org/domains/example", "text": "Learn more" }
  ]
}

  • Extract links (Polish site)

curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all links"

  • Extract emails

curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all email addresses"

output:

{
  "emails": [
    "info@prototypowanie.pl"
  ]
}

  • Extract emails (second site)

curllm "https://4coils.eu" -d "extract all email addresses"

output:

{
  "emails": [
    "office@4coils.eu",
    "sales@4coils.eu"
  ]
}

  • Visual mode / Stealth mode

curllm --visual "https://example.com" -d "extract all links"
curllm --stealth "https://example.com" -d "extract all links"
curllm --visual --stealth "https://example.com" -d "extract all email addresses"

Notes:

  • Results and step logs are saved to files in ./logs/run-*.md (path is printed in CLI output as run_log).
  • Ports and hosts are auto-managed; run curllm --start-services once, then curllm --status.
  • By default, the server uses a lightweight Ollama HTTP backend. To switch to LangChain's langchain_ollama, set CURLLM_LLM_BACKEND=langchain and ensure langchain-ollama is installed.

Extract Data from Dynamic Pages

curllm --visual "https://allegro.com" \
  -d "Find all products under 150 and extract names, prices and urls"

Save a Screenshot in a Folder Named After the Domain

command:

curllm "https://www.prototypowanie.pl"  -d "Create screenshot in folder name of domain"

output:

{"result":{"screenshot_saved":"screenshots/www.prototypowanie.pl/step_0_1763903516.803199.png"},"run_log":"logs/run-20251123-141151.md","screenshots":["screenshots/www.prototypowanie.pl/step_0_1763903516.803199.png"],"steps_taken":0,"success":true,"timestamp":"2025-11-23T14:11:57.025193"}

screenshot: step_0_1763903516.803199.png
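
In the sample output the screenshot lands in `screenshots/<host>/`, where `<host>` is the host part of the requested URL. A minimal Python sketch of that mapping, assuming the pattern seen above generalizes; `screenshot_dir_for` is a hypothetical helper and does not reflect curllm's internal code:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def screenshot_dir_for(url: str, root: str = "screenshots") -> PurePosixPath:
    """Map a page URL to '<root>/<host>', mirroring the path in the sample output."""
    host = urlparse(url).netloc
    if not host:
        raise ValueError(f"URL has no host part: {url!r}")
    return PurePosixPath(root) / host


target_dir = screenshot_dir_for("https://www.prototypowanie.pl")
print(target_dir)
```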

Handle 2FA Authentication

curllm --visual --captcha \
  -d '{"task": "login", "username": "user@example.com", 
       "password": "pass", "2fa_code": "123456"}' \
  https://secure-app.com

Automated Form Filling with Honeypot Detection

curllm --stealth --visual \
  -d "Fill contact form: name=John Doe, email=john@example.com, message=Hello" \
  https://www.prototypowanie.pl/kontakt/

Extract only email and phone links

curllm "https://www.prototypowanie.pl/kontakt/" -d "extract only email and phone links"

output:

{
  "emails": ["info@prototypowanie.pl"],
  "phones": ["+48503503761"]
}
Run log: ./logs/run-YYYYMMDD-HHMMSS.md

Extract all links

curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all links"

output:

{
  "links": [
    {
      "href": "https://www.prototypowanie.pl/kontakt/#content",
      "text": "Skip to content"
    },
    {
      "href": "https://www.prototypowanie.pl/",
      "text": "PROTOTYPOWANIE.PL"
    },
    {
      "href": "https://www.prototypowanie.pl/blog/",
      "text": "BLOG"
    },
    {
      "href": "https://www.prototypowanie.pl/",
      "text": "WYCENA"
    },
    {
      "href": "https://www.prototypowanie.pl/technologie/",
      "text": "TECHNOLOGIE"
    },
    {
      "href": "https://www.prototypowanie.pl/portfolio-open-source/",
      "text": "PORTFOLIO"
    },
    {
      "href": "https://www.prototypowanie.pl/marka/ondayrun/",
      "text": "USŁUGI"
    },
    {
      "href": "https://www.prototypowanie.pl/kontakt/",
      "text": "KONTAKT"
    },
    {
      "href": "https://www.prototypowanie.pl/blog/",
      "text": "blog"
    },
    {
      "href": "https://www.prototypowanie.pl/co-napisac-w-formularzu-zlecenia-praktyczny-przewodnik/",
      "text": "Co napisać w formularzu zlecenia?"
    },
    {
      "href": "https://www.prototypowanie.pl/uslugi/",
      "text": "Do usług"
    },
    {
      "href": "https://www.prototypowanie.pl/faq-wszystko-o-wspolpracy-z-prototypowanie-pl/",
      "text": "Jak zacząć z Prototypowanie?pl"
    },
    {
      "href": "https://www.prototypowanie.pl/konsultacja/",
      "text": "Konsultacja"
    },
    {
      "href": "https://www.prototypowanie.pl/kontakt/",
      "text": "Kontakt"
    },
    {
      "href": "https://www.prototypowanie.pl/polityka-prywatnosci/",
      "text": "Polityka prywatności"
    },
    {
      "href": "https://www.prototypowanie.pl/polityka-prywatnosci/cookie-policy-eu/",
      "text": "Cookie policy (EU)"
    },
    {
      "href": "https://www.prototypowanie.pl/polityka-prywatnosci/privacy-policy/",
      "text": "Privacy Policy"
    },
    {
      "href": "https://www.prototypowanie.pl/polityka-prywatnosci/privacy-tools/",
      "text": "Privacy Tools"
    },
    {
      "href": "https://www.prototypowanie.pl/portfolio-open-source/",
      "text": "Portfolio Open Source"
    },
    {
      "href": "https://www.prototypowanie.pl/technologie/",
      "text": "Technologie"
    },
    {
      "href": "https://www.prototypowanie.pl/terms-conditions/",
      "text": "Terms & conditions"
    },
    {
      "href": "https://www.prototypowanie.pl/tomasz-sapletta/",
      "text": "Tomasz Sapletta"
    },
    {
      "href": "https://www.prototypowanie.pl/",
      "text": "Twoje oprogramowanie gotowe w 24h?"
    },
    {
      "href": "https://www.prototypowanie.pl/wycena/",
      "text": "Wycena"
    },
    {
      "href": "mailto:info@prototypowanie.pl",
      "text": "info@prototypowanie.pl"
    },
    {
      "href": "tel:48503503761",
      "text": "+48 503 503 761"
    },
    {
      "href": "https://www.linkedin.com/company/prototypowanie-pl/",
      "text": "Linkedin"
    },
    {
      "href": "https://www.prototypowanie.pl/",
      "text": "rototypowanie.pl"
    },
    {
      "href": "https://wordpress.org/plugins/gdpr-cookie-compliance/",
      "text": "Powered by Zgodności ciasteczek z RODO"
    }
  ]
}
Run log: logs/run-20251123-115654.md
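
A full link dump like the one above mixes page links with `mailto:` and `tel:` entries. A minimal Python sketch for bucketing the hrefs by scheme after extraction. `group_by_scheme` is a hypothetical helper, and it uses a plain prefix split rather than a full URL parser:

```python
from collections import defaultdict


def group_by_scheme(links: list[dict]) -> dict[str, list[str]]:
    """Bucket link hrefs by the text before the first ':' (https, mailto, tel, ...)."""
    groups = defaultdict(list)
    for link in links:
        scheme, sep, _ = link["href"].partition(":")
        # Hrefs without a colon have no scheme; file them under 'relative'.
        groups[scheme.lower() if sep else "relative"].append(link["href"])
    return dict(groups)


# Three entries copied from the output above.
links = [
    {"href": "https://www.prototypowanie.pl/kontakt/", "text": "Kontakt"},
    {"href": "mailto:info@prototypowanie.pl", "text": "info@prototypowanie.pl"},
    {"href": "tel:48503503761", "text": "+48 503 503 761"},
]
groups = group_by_scheme(links)
print(groups["mailto"], groups["tel"])
```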

Complex Workflow Automation

curllm -X POST --visual --stealth --captcha \
  -d '{
    "workflow": [
      {"action": "navigate", "url": "https://portal.example.com"},
      {"action": "login", "username": "user", "password": "pass"},
      {"action": "click", "text": "Reports"},
      {"action": "download", "pattern": "*.pdf"},
      {"action": "extract_table", "format": "csv"}
    ]
  }'
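
Hand-writing a long workflow inside shell quotes is error-prone, so it can help to generate the JSON body in code. A minimal Python sketch, assuming the `workflow` / `action` structure shown above; `build_workflow` and `KNOWN_ACTIONS` are hypothetical, and the action list covers only the names that appear in this example:

```python
import json

# Action names are taken from the example above; curllm may accept others.
KNOWN_ACTIONS = {"navigate", "login", "click", "download", "extract_table"}


def build_workflow(steps: list[dict]) -> str:
    """Serialize workflow steps to a JSON body for -d, rejecting unknown actions."""
    for index, step in enumerate(steps):
        action = step.get("action")
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"step {index}: unknown action {action!r}")
    return json.dumps({"workflow": steps}, indent=2)


payload = build_workflow([
    {"action": "navigate", "url": "https://portal.example.com"},
    {"action": "click", "text": "Reports"},
    {"action": "extract_table", "format": "csv"},
])
print(payload)
```

The resulting string can be written to a file or passed as the `-d` argument.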

🔧 Configuration

Environment Variables (.env)

# The installer creates .env (from .env.example). Key variables:
# Ports and hosts (auto-maintained when starting services)
CURLLM_API_PORT=8000
CURLLM_API_HOST=http://localhost:8000
CURLLM_OLLAMA_PORT=11434
CURLLM_OLLAMA_HOST=http://localhost:11434

# Model and runtime
CURLLM_MODEL=qwen2.5:7b
CURLLM_MAX_STEPS=20
CURLLM_NUM_CTX=8192
CURLLM_NUM_PREDICT=512
CURLLM_TEMPERATURE=0.3
CURLLM_TOP_P=0.9
CURLLM_DEBUG=false

# Browserless (optional)
CURLLM_BROWSERLESS=false
BROWSERLESS_URL=ws://localhost:3000
BROWSERLESS_PORT=3000
REDIS_PORT=6379

# CAPTCHA (optional)
CAPTCHA_API_KEY=
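
Scripts that talk to the services need the ports that were saved to .env. A minimal Python sketch of reading such a file with the standard library only, assuming plain `KEY=VALUE` lines as shown above (no quoting or variable expansion); `parse_env` is a hypothetical helper:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# A few lines copied from the example .env above.
sample_env = """\
# Ports and hosts
CURLLM_API_PORT=8000
CURLLM_API_HOST=http://localhost:8000
CAPTCHA_API_KEY=
"""
env = parse_env(sample_env)
api_port = int(env["CURLLM_API_PORT"])
print(api_port, env["CURLLM_API_HOST"])
```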

Configuration File

Edit ~/.config/curllm/config.yml:

# Model settings
model: qwen2.5:7b
ollama_host: http://localhost:11434
temperature: 0.3
top_p: 0.9

# Browser settings
max_steps: 20
screenshot_dir: ./screenshots
headless: true

# Features
visual_mode: false
stealth_mode: false
captcha_solver: false
use_bql: false

# Performance
num_ctx: 8192
num_predict: 512
gpu_layers: 35

🐳 Docker Deployment

Using Docker Compose

# Start all services
docker-compose up -d

# Scale browserless instances
docker-compose up -d --scale browserless=3

# View logs
docker-compose logs -f curllm-api

Standalone Docker

# Build image
docker build -t curllm:latest .

# Run container
docker run -d \
  --name curllm \
  --gpus all \
  -p 8000:8000 \
  -v ~/.ollama:/root/.ollama \
  curllm:latest

🎮 Advanced Features

Visual Mode

Visual mode enables screenshot analysis for:

  • CAPTCHA detection
  • Dynamic content verification
  • Visual element interaction
  • Honeypot field detection

curllm --visual "https://example.com" -d "Click the red button"

Stealth Mode

Stealth mode targets common bot-detection checks:

  • Removes automation indicators
  • Randomizes behavior patterns
  • Mimics human interactions
  • Sets custom user agents and headers

curllm --stealth "https://pypi.org/project/curllm/" -d "Extract data"

BQL (Browser Query Language)

GraphQL-like syntax for structured extraction:

query {
  page(url: "https://example.com") {
    title
    meta: select(css: "meta[property^='og:']") {
      property: attr(name: "property")
      content: attr(name: "content")
    }
    links: select(css: "a[href^='http']") {
      text
      url: attr(name: "href")
    }
  }
}
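
Queries of this shape can also be generated from code. A minimal Python sketch that renders a one-selection query in the syntax shown above. `bql_select_query` is a hypothetical helper, not part of curllm, and it does no escaping, so the URL and selector must not contain double quotes:

```python
def bql_select_query(url: str, alias: str, css: str, attrs: dict[str, str]) -> str:
    """Render a one-selection BQL query; attrs maps output field -> HTML attribute name."""
    fields = ["      text"] + [
        f'      {field}: attr(name: "{attr}")' for field, attr in attrs.items()
    ]
    return "\n".join([
        "query {",
        f'  page(url: "{url}") {{',
        "    title",
        f'    {alias}: select(css: "{css}") {{',
        *fields,
        "    }",
        "  }",
        "}",
    ])


query = bql_select_query("https://example.com", "links", "a[href^='http']", {"url": "href"})
print(query)
```

The printed text can be passed to `curllm --bql -d '...'` in the same way as the hand-written query in Basic Usage.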

📊 Performance Benchmarks

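| Model        | VRAM Usage | Inference Speed | Tool-calling F1 | Avg Response Time |
|--------------|------------|-----------------|-----------------|-------------------|
| Qwen 2.5 7B  | 6.8GB      | 40 tok/sec      | 93.3%           | 8-12 sec          |
| Mistral 7B   | 6.5GB      | 45 tok/sec      | 89.1%           | 7-10 sec          |
| Llama 3.2 8B | 7.2GB      | 35 tok/sec      | 87.5%           | 10-15 sec         |
| Phi-3 Mini   | 3.8GB      | 60 tok/sec      | 82.3%           | 5-8 sec           |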

🛠️ API Reference

REST Endpoints

POST /api/execute
Content-Type: application/json

{
  "url": "https://example.com",
  "data": "instruction or query",
  "visual_mode": true,
  "stealth_mode": false,
  "captcha_solver": false,
  "use_bql": false
}
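
The endpoint can be called from any HTTP client. A minimal Python sketch using only the standard library, assuming the API listens on `http://localhost:8000` as in the example .env; `build_execute_request` is a hypothetical helper, and the response format is not documented here, so the sketch only builds the request:

```python
import json
import urllib.request

# Boolean fields and their defaults, mirroring the request body shown above.
DEFAULT_FLAGS = {"visual_mode": False, "stealth_mode": False,
                 "captcha_solver": False, "use_bql": False}


def build_execute_request(base_url: str, url: str, instruction: str,
                          **flags: bool) -> urllib.request.Request:
    """Build (but do not send) a POST /api/execute request with the documented fields."""
    unknown = set(flags) - set(DEFAULT_FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    body = {"url": url, "data": instruction, **DEFAULT_FLAGS, **flags}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request(
    "http://localhost:8000", "https://example.com", "extract all links", visual_mode=True
)
print(request.get_method(), request.full_url)
# To send it against a running server: urllib.request.urlopen(request, timeout=120)
```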

Python Client

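import asyncio

from curllm import CurllmClient


async def main():
    client = CurllmClient(
        model="qwen2.5:7b",
        visual_mode=True
    )

    # execute() is awaited, so it has to run inside a coroutine
    result = await client.execute(
        url="https://example.com",
        instruction="Extract all product prices"
    )

    print(result.data)


asyncio.run(main())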

🐛 Troubleshooting

Common Issues

Out of Memory (OOM)

# Reduce context length
export CURLLM_NUM_CTX=4096

# Use smaller model
ollama pull phi3:mini

Slow Response

# Check GPU utilization
nvidia-smi

# Use an explicitly quantized build (tag names vary; check the model's page in the Ollama library)
ollama pull qwen2.5:7b-instruct-q4_K_M

CAPTCHA Detection Issues

# Enable visual mode
curllm --visual --captcha ...

# Increase screenshot quality
export SCREENSHOT_QUALITY=100

🗺️ Roadmap

  • Multi-agent orchestration
  • Fine-tuning interface for domain-specific tasks
  • WebSocket support for real-time automation
  • Integration with Selenium Grid
  • Voice-guided automation
  • Mobile browser support
  • Distributed scraping with Ray
  • Custom model training pipeline

Files

$ tree -L 3 -I node_modules -I venv
.
├── bql_parser.py
├── CHANGELOG.md
├── curllm
├── curllm_server.py
├── docker-compose.yml
├── Dockerfile
├── docs
│   └── EXAMPLES.md
├── downloads
├── examples.py
├── install.sh
├── INSTRUKCJA.md
├── LICENSE
├── logs
│   └── run-20251123-141151.md
├── Makefile
├── __pycache__
│   └── curllm_server.cpython-313.pyc
├── pyproject.toml
├── QUICKSTART.sh
├── README.md
├── requirements.txt
├── screenshots
│   └── www.prototypowanie.pl
│       └── step_0_1763903516.803199.png
├── tests
│   └── e2e.sh
├── TODO.md
├── tools
│   └── generate_examples.sh
└── workspace

12 directories, 37 files

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/wronai/curllm.git
cd curllm
pip install -e .
pytest tests/

📄 License

Apache License - see LICENSE for details.

🙏 Acknowledgments

📞 Support

Built with ❤️ by Softreck

⭐ Star us on GitHub!

Download files

Source Distribution

curllm-1.0.3.tar.gz (62.8 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5
  • SHA256: 9207857ae199d769e45afb6bc080328b097557e55b76c7c96ba0839059746900
  • MD5: ff64ce4f6aca8f88c5d4121c240ea180
  • BLAKE2b-256: 1d29f3ee277342c6bc7e2087faf79475ca6afa844588ebe2e1b98c887e497592

Built Distribution

curllm-1.0.3-py3-none-any.whl (64.4 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5
  • SHA256: 77a3a384c4ca937b8aa924c7c3578be05bdf1770ce730a7082ea6fe3c4fc0aa6
  • MD5: 397f7ebada07fc10e2d2652ac052d182
  • BLAKE2b-256: 6d3dad7383990d1e1b3ea617d571cb6b5b8d137011085b7a67e6eb9cb491f582
