Browser Automation with Local LLM (8GB GPU compatible)
curllm - Browser Automation with Local LLM
Intelligent Browser Automation using 8GB GPU-Compatible Local LLMs
curllm combines local LLMs with browser automation for intelligent web scraping, form filling, and workflow automation, all running on your local machine with complete privacy.
Features
- Local LLM Integration: Run on 8GB GPUs with models like Qwen 2.5, Mistral, or Llama
- Hierarchical Planner: 87% token reduction with an intelligent 3-level decision tree
- Smart Form Filling: Automated form completion with error detection and remediation
- Visual Analysis: Computer vision for CAPTCHA detection and page understanding
- Stealth Mode: Advanced anti-bot detection bypass techniques
- BQL Support: Browser Query Language for structured data extraction
- Privacy-First: Everything runs locally; no data leaves your machine
- GPU Optimized: Quantized models for efficient inference on consumer GPUs
Requirements
Minimum Hardware
- GPU: NVIDIA GPU with 6-8GB VRAM (RTX 3060, RTX 4060, etc.)
- RAM: 16GB system memory
- Storage: 10GB free space
- CPU: Modern processor (Intel i5/AMD Ryzen 5 or better)
Software
- Python 3.11+ (tested on 3.13)
- Docker (optional, for Browserless features)
- CUDA toolkit (for GPU acceleration)
Documentation
Complete Documentation Index
Quick Links
- Installation Guide - Detailed installation instructions
- Examples & Tutorials - Practical use cases
- Hierarchical Planner - NEW! 87% token reduction
- Form Filling - NEW! Automated form completion
- API Reference - REST API endpoints
- Environment Config - Configuration guide
- Troubleshooting - Common issues
Quick Start
make install
Generate Example Scripts
Generate runnable example scripts:
make examples
# Scripts created in examples/ as curllm-*.sh
# Run with: ./examples/curllm-extract-links.sh
Installing curllm dependencies...
╔════════════════════════════════════════════╗
║         curllm Installation Script         ║
║     Browser Automation with Local LLM      ║
╚════════════════════════════════════════════╝
[1/7] Checking system requirements...
✓ Python 3.13.5 found
✓ GPU detected: NVIDIA GeForce RTX 4060, 8188 MiB
✓ Docker is installed
[2/7] Installing Ollama...
✓ Ollama is already installed
...
1. Installation
# Clone the repository
git clone https://github.com/wronai/curllm.git
cd curllm
# Run automatic installer
chmod +x install.sh
./install.sh
# Or manual installation
pip install -r requirements.txt
ollama pull qwen2.5:7b
2. Start Services
Start all required services (auto-selects free ports and saves them to .env)
curllm --start-services
Check status (reads ports from .env)
curllm --status
output:
=== curllm Service Status ===
✓ Ollama is running
✓ curllm API is running
✓ Model qwen2.5:7b is available
GPU Status:
NVIDIA GeForce RTX 4060, 1190 MiB, 8188 MiB
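The GPU Status line appears to follow the comma-separated "name, used, total" layout that `nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader` prints, so a script can check free VRAM before loading a model. A minimal sketch under that assumption; `GpuStatus` and `parse_gpu_status` are illustrative names, not part of curllm:

```python
from typing import NamedTuple


class GpuStatus(NamedTuple):
    name: str
    used_mib: int
    total_mib: int

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def parse_gpu_status(line: str) -> GpuStatus:
    """Parse a 'name, used MiB, total MiB' line (assumed layout)."""
    # Split from the right so a comma inside the GPU name is preserved.
    name, used, total = (part.strip() for part in line.rsplit(",", 2))
    return GpuStatus(name, int(used.split()[0]), int(total.split()[0]))


status = parse_gpu_status("NVIDIA GeForce RTX 4060, 1190 MiB, 8188 MiB")
print(status.name, status.free_mib)
```

With the sample line above, 8188 - 1190 leaves 6998 MiB free.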
3. Basic Usage
# Simple extraction (ensure services are running)
curllm "https://example.com" -d "extract all links"
output:
{
"links": [
{
"href": "https://iana.org/domains/example",
"text": "Learn more"
}
]
}
Run log: ./logs/run-20251123-113145.md
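When the CLI is called from a script, the JSON result and the trailing "Run log:" line need to be separated. A minimal sketch, assuming the output always has the shape shown above (one JSON object followed by an optional "Run log:" line); `split_cli_output` is a hypothetical helper, not a curllm function:

```python
import json
from typing import Optional


def split_cli_output(text: str) -> tuple[dict, Optional[str]]:
    """Split CLI text into (parsed JSON object, run-log path or None)."""
    start = text.index("{")
    # raw_decode stops at the end of the first JSON value and reports where.
    data, end = json.JSONDecoder().raw_decode(text, start)
    run_log = None
    for line in text[end:].splitlines():
        if line.startswith("Run log:"):
            run_log = line.split(":", 1)[1].strip()
    return data, run_log


sample = """{
  "links": [
    {"href": "https://iana.org/domains/example", "text": "Learn more"}
  ]
}
Run log: ./logs/run-20251123-113145.md
"""
data, run_log = split_cli_output(sample)
print(len(data["links"]), run_log)
```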
Form automation with authentication
curllm -X POST --visual --stealth \
-d '{"instruction": "Login and download invoice",
"credentials": {"user": "john@example.com", "pass": "secret"}}' \
https://app.example.com
BQL query for structured data
curllm --bql -d 'query {
page(url: "https://news.ycombinator.com") {
title
links: select(css: "a.storylink, a.titlelink") { text url: attr(name: "href") }
}
}'
Usage Recipes (quick)
- Extract all links
curllm "https://example.com" -d "extract all links"
- Screenshot
curllm "https://example.com" -d "screenshot"
- Export CSV/HTML/XLS
curllm "https://example.com" -d "extract all links" --csv -o links.csv
curllm "https://example.com" -d "extract all links" --html -o links.html
curllm "https://example.com" -d "extract all links" --xls -o links.xls
- Public proxy rotation (Ceneo under 150 zł)
export CURLLM_PUBLIC_PROXY_LIST="https://raw.githubusercontent.com/clarketm/proxy-list/master/proxy-list-raw.txt"
curllm "https://ceneo.pl" -d "Find all products under 150zł and extract names, prices and urls" \
--stealth --proxy rotate:public --csv -o products.csv
- Registry rotation (after registering via curlx)
# example proxy registration:
curlx register --host 203.0.113.10 --ports 3128,3129 --server http://localhost:8000
curllm "https://ceneo.pl" -d "Find all products under 150zł and extract names, prices and urls" \
--stealth --proxy rotate:registry --html -o products.html
- Session (persistent cookies)
curllm --session my-site "https://example.com" -d "screenshot"
- WordPress: create a post
curllm --session wp-s1 -d '{"wordpress_config":{"url":"https://example.wordpress.com","action":"create_post","title":"Hello","content":"Post body","status":"draft"}}'
- BQL: Hacker News links
curllm --bql -d 'query { page(url: "https://news.ycombinator.com") { title links: select(css: "a.storylink, a.titlelink") { text url: attr(name: "href") } }}'
- Monitoring (cron + e-mail)
# see monitoring/README.md
make -C monitoring setup
make -C monitoring run
make -C monitoring install-3h
Examples
For a comprehensive, curated set of examples and ready-to-run scripts, see:
- docs/EXAMPLES.md
- Generate scripts: make examples (scripts are created in examples/ as curllm-*.sh)
Playwright + BQL (Sync) Agent: captcha/playwright_bql_framework.py
This repository now includes a simple synchronous Playwright + BQL agent you can run directly, with built-in cookie-consent handling and CAPTCHA detection (no bypass). The agent expects your LLM to return a JSON array of BQL actions (fill, click, wait, select, submit, scroll, screenshot).
Install prerequisites (inside your virtualenv):
pip install -r requirements.txt
pip install playwright
python -m playwright install
Run a demo:
python captcha/playwright_bql_framework.py
Use your preferred LLM:
- Default: Ollama (env: CURLLM_OLLAMA_HOST, CURLLM_MODEL)
- OpenAI: set BQL_FRAMEWORK_LLM=openai and OPENAI_API_KEY, optionally OPENAI_MODEL
Examples (pseudo-code snippets):
from playwright.sync_api import sync_playwright
from captcha.playwright_bql_framework import BQLAgent, select_llm_caller

# WordPress login
with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.prototypowanie.pl/wp-login.php", wait_until="networkidle")
    agent = BQLAgent(page, call_llm=select_llm_caller())
    res = agent.run_instruction("Log in to WordPress. Username: admin, Password: test123.")
    print(res)
    browser.close()

# Contact form fill
with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://softreck.com/contact", wait_until="networkidle")
    agent = BQLAgent(page, call_llm=select_llm_caller())
    res = agent.run_instruction("Fill in the form: Name Jan, Email jan@example.com, Message 'Test submission'. Submit the form.")
    print(res)
    browser.close()

# Structured product search (example site)
with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://ceneo.pl", wait_until="networkidle")
    agent = BQLAgent(page, call_llm=select_llm_caller())
    res = agent.run_instruction("Find all products under 150 zł and return names, prices and URLs.")
    print(res)
    browser.close()
Notes:
- The agent detects CAPTCHA-like widgets and returns an interrupt; the core curllm executor can optionally solve widget CAPTCHAs using a 2captcha sitekey token injection if you enable captcha_solver and provide an API key.
- The agent clicks obvious cookie-consent buttons if found (configurable).
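Because the agent expects the LLM to return a JSON array of BQL actions, it can help to validate a reply before executing anything. The sketch below checks a reply against the action names listed in the agent description (fill, click, wait, select, submit, scroll, screenshot). The `"action"`, `"selector"` and `"value"` keys are assumptions made for illustration; the real schema is defined in captcha/playwright_bql_framework.py:

```python
import json

# Action names listed in the agent description.
ALLOWED_ACTIONS = {"fill", "click", "wait", "select", "submit", "scroll", "screenshot"}


def parse_actions(llm_reply: str) -> list[dict]:
    """Parse an LLM reply and accept it only if every step is a known action.

    The "action" key is a hypothetical field name used for illustration.
    """
    actions = json.loads(llm_reply)
    if not isinstance(actions, list):
        raise ValueError("expected a JSON array of actions")
    for step in actions:
        if not isinstance(step, dict) or step.get("action") not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported step: {step!r}")
    return actions


reply = '[{"action": "fill", "selector": "#email", "value": "jan@example.com"}, {"action": "submit"}]'
names = [step["action"] for step in parse_actions(reply)]
print(names)
```

Rejecting the whole reply on the first unknown step keeps a malformed plan from being partially executed.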
Product extractor for ceneo.pl
Find all products under 150 zł and extract names, prices and URLs
command:
curllm --visual -H "Accept-Language: pl-PL,pl;q=0.9" "https://ceneo.pl" -d '{
"instruction":"Find all products under 150zł and extract names, prices and urls",
"params": {
"include_dom_html": true,
"no_click": true,
"scroll_load": true,
"action_timeout_ms": 120000,
"use_external_slider_solver": true
}
}'
output:
{
"result": {
"products": [
{
"name": "Bestseller\n4,9\n414\n\nIbuvit D3 4000Iu 150kaps.\n\n1000+ kupionych ostatnio\nod41,18z\u0142",
"price": 41.18,
"url": "https://redirect.ceneo.pl/offers/164000259/9026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtD%2BafUImi8vsfWXXtr8DMq0X88TaxMLTJxeJJJaUCdmGyCBcPvOgXaqbCITGHMyX4V962NVaZY%2Bh2BDFZdla0ceoMYEHuJZOUGcft4g2WWpqA%3D%3D"
},
{
"name": "Bestseller\n4,1\n4\n\nPucio urz\u0105dza wigili\u0119, czyli \u015bwi\u0105teczne s\u0142owa i zadania dla przedszkolak\u00f3w\n\n1000+ kupionych ostatnio\nod59,99z\u0142",
"price": 59.99,
"url": "https://redirect.ceneo.pl/offers/188774853/22637?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtCwjwfoZvABCMIa%2FwfTTUpPf4CAMmDEyEScec%2B8scvyCzN7ApkOegh4hO7WcMD5rND9Me7F6rLgCr57%2FozD%2BnbXTw2P4a2bsfGGmyuUb5%2B3hg%3D%3D"
},
{
"name": "Popularny teraz\n4,9\n145\n\nPOLECANY Redmi Note 14 Pro 5G 8/256GB Czarny\n\n100+ kupionych ostatnio\nod1 149,00z\u0142",
"price": 149,
"url": "https://redirect.ceneo.pl/offers/179314788/16202?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtAvV6%2BS4wDNiNcN5EmHT8vDqqLH2IKSXS8KzGnDtg%2FbqsDHDbhVHROeNVvRqlGBtTeOdxI29gon%2BiwiDIya1tRUIgi%2BTBEfSGLmPlqqNWLwhYlxKTfZiD4Rmu5c76aN5UA%3D"
},
{
"name": "Bestseller\n4,9\n290\n\nMagne B6 Forte 180tabl.\n\n1000+ kupionych ostatnio\nod51,39z\u0142",
"price": 51.39,
"url": "https://redirect.ceneo.pl/offers/179038790/53026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtCmPDC7YdxN%2BF7%2Fy0ISj9ExuVkXTt2NudXZZU56TcEZ2uGAFeUdeZtQYVoyLrdjVtHkaMz4diwSkaxjcRkzBlz807saI8VD%2Fvb8scsPalmrQw%3D%3D"
},
{
"name": "Wysoko oceniany\n4,8\n1256\n\nArkada TC16 Serum Kolagenowe Do Paznokci Regeneracja Sk\u00f3ry i Paznokci 11ml\n\n1000+ kupionych ostatnio\nod55,52z\u0142",
"price": 55.52,
"url": "https://redirect.ceneo.pl/offers/59277115/25070?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtAhLmvnQSQBSHnxEyLKI8qxLQ1Ov3dEovwLdmEUpc8ANFic2hPQTFmMasy%2BMujUksXpCwuxpyFt9y19r9FK%2FoMsezPwMnrTTugS2BRrRFxiuA%3D%3D"
},
{
"name": "Wysoko oceniany\n4,8\n3677\n\nCalperos 1000mg 100 kaps.\n\n1000+ kupionych ostatnio\nod58,79z\u0142",
"price": 58.79,
"url": "https://redirect.ceneo.pl/offers/4775603/53026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtBSU6%2B7XF%2F5RnZJSpyYGBIKDR1obcne7UAlbggOIO%2BGj9n2ommNYSmAtgJ3D%2FN81i%2FXENXjBiYZcHX8qh%2FMidtTISBIy0RqMmaUoHscKQ9TQg%3D%3D"
},
{
"name": "Superceny do 100z\u0142\nWi\u0119cej supercen do 100z\u0142 \u2794",
"price": 100,
"url": "https://www.ceneo.pl/;n100;discount.htm#tag=insp-superceny-gfx"
}
]
},
"run_log": "logs/run-20251124-082625.md",
"screenshots": [],
"steps_taken": 0,
"success": true,
"timestamp": "2025-11-24T08:27:36.528472"
}
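In the sample output above, the Redmi Note entry reports "price": 149 although its name ends in "od1 149,00zł", which suggests the thousands group was lost somewhere during extraction. A possible post-processing check, not something curllm is documented to do, is to re-parse the price from the name text and flag disagreements. `price_from_name` and `suspicious` are hypothetical helpers, and the regular expression assumes the "od<price>zł" suffix seen in these samples:

```python
import re
from typing import Optional

# Matches the trailing "od<price>zł" fragment seen in the sample names.
PRICE_RE = re.compile(r"od\s*(\d[\d\s]*,\d{2})\s*zł")


def price_from_name(name: str) -> Optional[float]:
    """Return the price embedded in a product name, or None if absent."""
    match = PRICE_RE.search(name)
    if match is None:
        return None
    # Drop thousands separators (spaces) and switch to a decimal point.
    digits = re.sub(r"\s", "", match.group(1)).replace(",", ".")
    return float(digits)


def suspicious(products: list[dict]) -> list[dict]:
    """Return products whose 'price' disagrees with the price in 'name'."""
    flagged = []
    for product in products:
        parsed = price_from_name(product["name"])
        if parsed is not None and abs(parsed - product["price"]) > 0.005:
            flagged.append(product)
    return flagged


sample = [
    {"name": "Ibuvit D3 4000Iu 150kaps.\n\n1000+ kupionych ostatnio\nod41,18zł", "price": 41.18},
    {"name": "POLECANY Redmi Note 14 Pro 5G 8/256GB Czarny\n\n100+ kupionych ostatnio\nod1 149,00zł", "price": 149},
]
flagged = suspicious(sample)
print([p["price"] for p in flagged])
```

Filtering on the re-parsed value instead of the reported one would keep a 1 149,00 zł phone out of an "under 150 zł" result.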
Find all products under 50 zł and extract names, prices and URLs
command:
curllm --visual -H "Accept-Language: pl-PL,pl;q=0.9" "https://ceneo.pl" -d '{
"instruction":"Find all products under 50zł and extract names, prices and urls"
}'
output:
{
"result": {
"products": [
{
"name": "Bestseller\n4,9\n414\n\nIbuvit D3 4000Iu 150kaps.\n\n1000+ kupionych ostatnio\nod41,18z\u0142",
"price": 41.18,
"url": "https://redirect.ceneo.pl/offers/164000259/9026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtD%2BafUImi8vsfWXXtr8DMq0X88TaxMLTJxeJJJaUCdmGyCBcPvOgXaqbCITGHMyX4V962NVaZY%2Bh2BDFZdla0ceoMYEHuJZOUGcft4g2WWpqA%3D%3D"
}
]
},
"run_log": "logs/run-20251124-082824.md",
"screenshots": [
"screenshots/www.ceneo.pl/step_0_1763969385.938158.png"
],
"steps_taken": 1,
"success": true,
"timestamp": "2025-11-24T08:29:48.844432"
}
CSV
curllm "https://ceneo.pl" -d "Find all products under 150zł and extract names, prices and urls" --csv -o products.csv
output:
{"hints":[],"result":{"products":[{"name":"Bestseller\n4,6\n11\n\nSamsung Galaxy S25 FE 8/256GB Granatowy\n\n50+ kupionych ostatnio\nod3 099,00z\u0142","price":99,"url":"https://redirect.ceneo.pl/offers/187704844/16202?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtAuOtYKLbGQbrXSM4NmcY%2BRz6ALdXwLYgWshZKWkyTjn3XyjS3rJCH84VGyVBmSHkswTiDjQqmQayxvwqGRLSOJnBq6ot4ekF%2F%2BEsNayEBatIkNShIeXEzld0Ey6kuXAEM%3D"},{"name":"Wysoko oceniany\n4,9\n70\n\nSamsung Galaxy S24 FE 5G SM-S721 8/256GB Grafitowy\n\n100+ kupionych ostatnio\nod2 099,00z\u0142","price":99,"url":"https://redirect.ceneo.pl/offers/173803420/4614?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtBYUVKV5i3hyOO53nSTTXI879XpVF9JgtGZMVu8tB6dFEoljbAuHBlv8RiqctyP6FXFyMjZUqCYsEMrtqeHA1USJ1Yi4aDrGpaNOwvkZxAmAG39DS2I2EkFzNffhrBUS%2Fo%3D"},{"name":"Bestseller\n4,1\n4\n\nPucio urz\u0105dza wigili\u0119, czyli \u015bwi\u0105teczne s\u0142owa i zadania dla przedszkolak\u00f3w\n\n1000+ kupionych ostatnio\nod59,99z\u0142","price":59.99,"url":"https://redirect.ceneo.pl/offers/188774853/22637?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtCwjwfoZvABCMIa%2FwfTTUpPn2TGG1oHS0v7VgJyOuKc9raN6uQ6jBgR%2FPPW8ERW6J6QjUMmEeHb%2BNtDGsfshcdH70cBYtw1D2SE%2FN0F5zfVhg%3D%3D"},{"name":"Bestseller\n4,9\n73\n\nNiacynobaza 25% Skoncentrowane serum z niacynamidem 30g\n\n1000+ kupionych ostatnio\nod21,79z\u0142","price":21.79,"url":"https://redirect.ceneo.pl/offers/182039369/9026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtBdxjmWJ83EL%2FoWyjTi%2BWLsUME0Fg5%2FYHYj0LgQ%2BkjesBf55uhlTbj8S%2BHLlYURuseZ97XU%2BzkPDEyOpwWqmUKHILflawW7xGGsXiYRQnaBGA%3D%3D"},{"name":"Bestseller\n4,9\n18\n\nApple AirPods Pro 3\n\n500+ kupionych ostatnio\nod1 039,00z\u0142","price":39,"url":"https://redirect.ceneo.pl/offers/187595793/24403?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtAkD2hcmhNvx6EXlTCEYiO4k9XIBhExoCKlfHvs3zr2WFCGCs6TRUhrGe0Cw6AswYDDrsb1yxU%2F4OhoFhdqcMg9MLbGyIfKcrwsvZ0As7BdoNgzYl2rMwwdwB3bFPb8f58%3D"},{"name":"Bestseller\n4,9\n290\n\nMagne B6 Forte 180tabl.\n\n1000+ 
kupionych ostatnio\nod50,98z\u0142","price":50.98,"url":"https://redirect.ceneo.pl/offers/179038790/9026?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtDGOm1atwU81LhBgOjKTKO7NpOiWFioy7ean1a4wxT9kE3EWRUXGOyVbr3qk0xP649ruexm0e%2FSQ5EvODmj%2FH2sVWCqRVtnidxJykYq6fSjIg%3D%3D"},{"name":"4,9\n2304\n\nSamsung Galaxy S24 SM-S921 8/256GB Czarny\n\n100+ kupionych ostatnio\nod3 111,96z\u0142","price":111.96,"url":"https://redirect.ceneo.pl/offers/163090033/10772?e=EOpjbVPvmeU84eOvTVLW8x9ZNVMTIBt6BegLfpT4JtBdtqKfjUTK3TO2nvgxSCzNmsIXa5uveW0C%2BYGXclFTh7oeF0wt1XREvis059kiKl4MxfdWMfHPydpQRXZ58z4%2F56oCukbzagJ66HKzONrOZNifhOrShgpxPL0mh37ZobU%3D"},{"name":"Superceny do 100z\u0142\nWi\u0119cej supercen do 100z\u0142 \u2794","price":100,"url":"https://www.ceneo.pl/;n100;discount.htm#tag=insp-superceny-gfx"}]},"run_log":"logs/run-20251124-093441.md","screenshots":[],"steps_taken":0,"success":true,"suggested_commands":[],"timestamp":"2025-11-24T09:35:51.950884"}
✓ CSV exported to products.csv
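How `--csv` maps the JSON result to columns is not described here. As a rough equivalent for post-processing a saved JSON result yourself, the sketch below writes the `products` list with `name`, `price` and `url` columns and collapses the multi-line names to a single line. `products_to_csv` is a hypothetical helper and the column choice is an assumption:

```python
import csv
import io


def products_to_csv(result: dict) -> str:
    """Render result["products"] as CSV text with name, price, url columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price", "url"], extrasaction="ignore")
    writer.writeheader()
    for product in result.get("products", []):
        # Collapse the multi-line names seen in the sample output to one line.
        row = dict(product, name=" ".join(product.get("name", "").split()))
        writer.writerow(row)
    return buffer.getvalue()


result = {
    "products": [
        {"name": "Bestseller\n4,9\n\nMagne B6 Forte 180tabl.", "price": 50.98, "url": "https://example.com/offer"}
    ]
}
csv_text = products_to_csv(result)
print(csv_text)
```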
Examples .env and autoload
- The examples directory includes .env.examples and .env (generate or update with examples/setup_env.sh).
- Shell scripts in examples/curl_*.sh now auto-load examples/.env (or the project .env) at runtime.
- Node.js and PHP examples read examples/.env automatically.
- If your API picks a non-default port (e.g., 8002), ensure CURLLM_API_HOST is set in .env (handled by curllm --start-services).
Quick setup:
chmod +x examples/setup_env.sh
examples/setup_env.sh
# Optional: export CAPTCHA 2captcha key for widget solving in core curllm
export CAPTCHA_API_KEY=YOUR_2CAPTCHA_KEY
# Run any example
examples/curl_product_search.sh
If you saw { "detail": "Not Found" }, you likely hit the wrong port. Fix by either:
# 1) Let scripts auto-load the updated host from .env (recommended)
curllm --start-services # updates .env with the actual port
examples/curl_product_search.sh
# 2) Or export the API host manually
export CURLLM_API_HOST=http://localhost:8002
examples/curl_product_search.sh
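The same host resolution can be done from Python when a script does not go through the shell wrappers. A minimal sketch, assuming the .env file holds plain KEY=VALUE lines as in the configuration listing; `parse_env` and `api_host` are illustrative helpers, and the fallback is the default host from that listing:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_host(env_text: str, default: str = "http://localhost:8000") -> str:
    """Return CURLLM_API_HOST from .env text, or the default host."""
    return parse_env(env_text).get("CURLLM_API_HOST") or default


env_text = "CURLLM_API_PORT=8002\nCURLLM_API_HOST=http://localhost:8002\n"
print(api_host(env_text))
```

In practice `env_text` would come from reading examples/.env or the project .env.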
Node.js / PHP API examples
- Node.js: examples/node_api_example.js
node examples/node_api_example.js
# reads examples/.env, posts to ${CURLLM_API_HOST}/api/execute
- PHP: examples/php_api_example.php
php examples/php_api_example.php
# reads examples/.env, posts to ${CURLLM_API_HOST}/api/execute
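A Python counterpart to these two examples can be written with the standard library alone. The sketch below only builds the request object; the `url` and `data` body fields follow the API examples in this README, while `build_execute_request` itself is a hypothetical helper:

```python
import json
import os
import urllib.request
from typing import Optional


def build_execute_request(url: str, instruction: str, host: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for <host>/api/execute without sending it."""
    host = host or os.environ.get("CURLLM_API_HOST", "http://localhost:8000")
    body = json.dumps({"url": url, "data": instruction}).encode("utf-8")
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("https://example.com", "extract all links", host="http://localhost:8000")
print(request.get_method(), request.full_url)

# To send it (requires a running curllm API):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```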
CAPTCHA solver
- Core curllm can optionally solve widget CAPTCHAs (sitekey token via 2captcha). Set the environment variable and pass --captcha:
export CAPTCHA_API_KEY=YOUR_2CAPTCHA_KEY
curllm --visual --captcha "https://example.com" -d "fill form"
Docker devbox (venv) for testing installation and examples
A lightweight container for testing the installation and examples without touching the host.
# Build and start devbox + Ollama + API (optional)
docker compose up -d devbox ollama curllm-api
# Enter the devbox
docker compose exec devbox bash
# Inside devbox: create a venv and install deps
python3 -m venv venv
source venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install playwright && python -m playwright install chromium
# (optional, Linux) system deps for browsers
python -m playwright install-deps chromium || true
# Prepare examples env and point to services via docker network
examples/setup_env.sh
export CURLLM_API_HOST=http://curllm-api:8000
export CURLLM_OLLAMA_HOST=http://ollama:11434
# Run examples
bash examples/curl_product_search.sh
python examples/bql_product_search.py
Notes:
- For Accept-Language, set ACCEPT_LANGUAGE in examples/.env; shell scripts send it as a header automatically.
- The Playwright+BQL Python examples do not auto-load .env; load it via source examples/.env before python ... if needed.
Validated examples (tested)
- Extract links (basic)
curllm "https://example.com" -d "extract all links"
Expected output (truncated):
{
"links": [
{ "href": "https://iana.org/domains/example", "text": "Learn more" }
]
}
- Extract links (Polish site)
curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all links"
- Extract emails
curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all email addresses"
output:
{
"emails": [
"info@prototypowanie.pl"
]
}
- Extract emails
curllm "https://4coils.eu" -d "extract all email addresses"
output:
{
"emails": [
"office@4coils.eu",
"sales@4coils.eu"
]
}
- Visual mode / Stealth mode
curllm --visual "https://example.com" -d "extract all links"
curllm --stealth "https://example.com" -d "extract all links"
curllm --visual --stealth "https://example.com" -d "extract all email addresses"
Notes:
- Results and step logs are saved to files in ./logs/run-*.md (path is printed in CLI output as run_log).
- Ports and hosts are auto-managed; run curllm --start-services once, then curllm --status.
- By default, the server uses a lightweight Ollama HTTP backend. To switch to LangChain's langchain_ollama, set CURLLM_LLM_BACKEND=langchain and ensure langchain-ollama is installed.
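Since run logs are named run-YYYYMMDD-HHMMSS.md, the newest one can be found by sorting file names. A small sketch, assuming that naming holds for every log; `latest_run_log` is an illustrative helper, demonstrated here against a temporary directory:

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_run_log(log_dir: str = "./logs") -> Optional[Path]:
    """Return the newest run-*.md file, relying on the timestamped names."""
    logs = sorted(Path(log_dir).glob("run-*.md"))
    return logs[-1] if logs else None


# Demonstration with throwaway files instead of a real ./logs directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("run-20251123-113145.md", "run-20251124-082625.md", "notes.md"):
        (Path(tmp) / name).write_text("placeholder")
    newest = latest_run_log(tmp).name
    empty = latest_run_log(str(Path(tmp) / "missing"))
print(newest)
```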
Extract Data from Dynamic Pages
curllm --visual "https://allegro.com" \
-d "Find all products under 150 and extract names, prices and urls"
Create screenshot in folder name of domain
command:
curllm "https://www.prototypowanie.pl" -d "Create screenshot in folder name of domain"
output:
{"result":{"screenshot_saved":"screenshots/www.prototypowanie.pl/step_0_1763903516.803199.png"},"run_log":"logs/run-20251123-141151.md","screenshots":["screenshots/www.prototypowanie.pl/step_0_1763903516.803199.png"],"steps_taken":0,"success":true,"timestamp":"2025-11-23T14:11:57.025193"}
Handle 2FA Authentication
curllm --visual --captcha \
-d '{"task": "login", "username": "user@example.com",
"password": "pass", "2fa_code": "123456"}' \
https://secure-app.com
Automated Form Filling with Honeypot Detection
curllm --stealth --visual \
-d "Fill contact form: name=John Doe, email=john@example.com, message=Hello" \
https://www.prototypowanie.pl/kontakt/
Extract only email and phone links
curllm "https://www.prototypowanie.pl/kontakt/" -d "extract only email and phone links"
output:
{
"emails": ["info@prototypowanie.pl"],
"phones": ["+48503503761"]
}
Run log: ./logs/run-YYYYMMDD-HHMMSS.md
Extract all links
curllm "https://www.prototypowanie.pl/kontakt/" -d "extract all links"
output:
{
"links": [
{
"href": "https://www.prototypowanie.pl/kontakt/#content",
"text": "Skip to content"
},
{
"href": "https://www.prototypowanie.pl/",
"text": "PROTOTYPOWANIE.PL"
},
{
"href": "https://www.prototypowanie.pl/blog/",
"text": "BLOG"
},
{
"href": "https://www.prototypowanie.pl/",
"text": "WYCENA"
},
{
"href": "https://www.prototypowanie.pl/technologie/",
"text": "TECHNOLOGIE"
},
{
"href": "https://www.prototypowanie.pl/portfolio-open-source/",
"text": "PORTFOLIO"
},
{
"href": "https://www.prototypowanie.pl/marka/ondayrun/",
"text": "USŁUGI"
},
{
"href": "https://www.prototypowanie.pl/kontakt/",
"text": "KONTAKT"
},
{
"href": "https://www.prototypowanie.pl/blog/",
"text": "blog"
},
{
"href": "https://www.prototypowanie.pl/co-napisac-w-formularzu-zlecenia-praktyczny-przewodnik/",
"text": "Co napisać w formularzu zlecenia?"
},
{
"href": "https://www.prototypowanie.pl/uslugi/",
"text": "Do usług"
},
{
"href": "https://www.prototypowanie.pl/faq-wszystko-o-wspolpracy-z-prototypowanie-pl/",
"text": "Jak zacząć z Prototypowanie?pl"
},
{
"href": "https://www.prototypowanie.pl/konsultacja/",
"text": "Konsultacja"
},
{
"href": "https://www.prototypowanie.pl/kontakt/",
"text": "Kontakt"
},
{
"href": "https://www.prototypowanie.pl/polityka-prywatnosci/",
"text": "Polityka prywatności"
},
{
"href": "https://www.prototypowanie.pl/polityka-prywatnosci/cookie-policy-eu/",
"text": "Cookie policy (EU)"
},
{
"href": "https://www.prototypowanie.pl/polityka-prywatnosci/privacy-policy/",
"text": "Privacy Policy"
},
{
"href": "https://www.prototypowanie.pl/polityka-prywatnosci/privacy-tools/",
"text": "Privacy Tools"
},
{
"href": "https://www.prototypowanie.pl/portfolio-open-source/",
"text": "Portfolio Open Source"
},
{
"href": "https://www.prototypowanie.pl/technologie/",
"text": "Technologie"
},
{
"href": "https://www.prototypowanie.pl/terms-conditions/",
"text": "Terms & conditions"
},
{
"href": "https://www.prototypowanie.pl/tomasz-sapletta/",
"text": "Tomasz Sapletta"
},
{
"href": "https://www.prototypowanie.pl/",
"text": "Twoje oprogramowanie gotowe w 24h?"
},
{
"href": "https://www.prototypowanie.pl/wycena/",
"text": "Wycena"
},
{
"href": "mailto:info@prototypowanie.pl",
"text": "info@prototypowanie.pl"
},
{
"href": "tel:48503503761",
"text": "+48 503 503 761"
},
{
"href": "https://www.linkedin.com/company/prototypowanie-pl/",
"text": "Linkedin"
},
{
"href": "https://www.prototypowanie.pl/",
"text": "rototypowanie.pl"
},
{
"href": "https://wordpress.org/plugins/gdpr-cookie-compliance/",
"text": "Powered by Zgodności ciasteczek z RODO"
}
]
}
Run log: logs/run-20251123-115654.md
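A full link list like the one above can also be narrowed down after the fact. The sketch below groups `mailto:` and `tel:` hrefs from the `links` array; `contact_links` is a hypothetical helper, and it returns the raw href values, so the phone number stays as written in the link (the "extract only email and phone links" output shown earlier has a leading +, which this sketch does not add):

```python
def contact_links(links: list[dict]) -> dict[str, list[str]]:
    """Group mailto: and tel: hrefs from a list of {"href", "text"} items."""
    contacts: dict[str, list[str]] = {"emails": [], "phones": []}
    for link in links:
        href = link.get("href", "")
        if href.startswith("mailto:"):
            contacts["emails"].append(href[len("mailto:"):])
        elif href.startswith("tel:"):
            contacts["phones"].append(href[len("tel:"):])
    return contacts


links = [
    {"href": "https://www.prototypowanie.pl/wycena/", "text": "Wycena"},
    {"href": "mailto:info@prototypowanie.pl", "text": "info@prototypowanie.pl"},
    {"href": "tel:48503503761", "text": "+48 503 503 761"},
]
contacts = contact_links(links)
print(contacts)
```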
Complex Workflow Automation
curllm -X POST --visual --stealth --captcha \
-d '{
"workflow": [
{"action": "navigate", "url": "https://portal.example.com"},
{"action": "login", "username": "user", "password": "pass"},
{"action": "click", "text": "Reports"},
{"action": "download", "pattern": "*.pdf"},
{"action": "extract_table", "format": "csv"}
]
}'
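Hand-writing nested JSON inside shell quotes is error-prone, so a workflow like this can be assembled in Python and quoted automatically. A sketch that reuses the flags and step names from the command above; `workflow_command` is a hypothetical helper, and like the original command it passes no separate target URL:

```python
import json
import shlex

workflow = [
    {"action": "navigate", "url": "https://portal.example.com"},
    {"action": "login", "username": "user", "password": "pass"},
    {"action": "click", "text": "Reports"},
    {"action": "download", "pattern": "*.pdf"},
    {"action": "extract_table", "format": "csv"},
]


def workflow_command(steps: list[dict]) -> str:
    """Return a shell-safe curllm command line for the given workflow steps."""
    payload = json.dumps({"workflow": steps})
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(["curllm", "-X", "POST", "--visual", "--stealth", "--captcha", "-d", payload])


command = workflow_command(workflow)
print(command)
```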
Configuration
Environment Variables (.env)
# The installer creates .env (from .env.example). Key variables:
# Ports and hosts (auto-maintained when starting services)
CURLLM_API_PORT=8000
CURLLM_API_HOST=http://localhost:8000
CURLLM_OLLAMA_PORT=11434
CURLLM_OLLAMA_HOST=http://localhost:11434
# Model and runtime
CURLLM_MODEL=qwen2.5:7b
CURLLM_MAX_STEPS=20
CURLLM_NUM_CTX=8192
CURLLM_NUM_PREDICT=512
CURLLM_TEMPERATURE=0.3
CURLLM_TOP_P=0.9
CURLLM_DEBUG=false
# Browserless (optional)
CURLLM_BROWSERLESS=false
BROWSERLESS_URL=ws://localhost:3000
BROWSERLESS_PORT=3000
REDIS_PORT=6379
# CAPTCHA (optional)
CAPTCHA_API_KEY=
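Scripts that wrap curllm may want these values as typed settings. A minimal sketch that reads a few of the variables listed above with the listed defaults; the `Settings` class, `load_settings`, and the set of strings treated as true are assumptions for illustration, not curllm's own loader:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model: str
    max_steps: int
    num_ctx: int
    temperature: float
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read selected CURLLM_* variables, falling back to the listed defaults."""
    return Settings(
        model=env.get("CURLLM_MODEL", "qwen2.5:7b"),
        max_steps=int(env.get("CURLLM_MAX_STEPS", "20")),
        num_ctx=int(env.get("CURLLM_NUM_CTX", "8192")),
        temperature=float(env.get("CURLLM_TEMPERATURE", "0.3")),
        # Which strings count as true is an assumption of this sketch.
        debug=env.get("CURLLM_DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )


settings = load_settings({"CURLLM_NUM_CTX": "4096", "CURLLM_DEBUG": "true"})
print(settings)
```

Passing a plain dict instead of `os.environ` makes the loader easy to test.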
Configuration File
Edit ~/.config/curllm/config.yml:
# Model settings
model: qwen2.5:7b
ollama_host: http://localhost:11434
temperature: 0.3
top_p: 0.9
# Browser settings
max_steps: 20
screenshot_dir: ./screenshots
headless: true
# Features
visual_mode: false
stealth_mode: false
captcha_solver: false
use_bql: false
# Performance
num_ctx: 8192
num_predict: 512
gpu_layers: 35
Docker Deployment
Using Docker Compose
# Start all services
docker-compose up -d
# Scale browserless instances
docker-compose up -d --scale browserless=3
# View logs
docker-compose logs -f curllm-api
Standalone Docker
# Build image
docker build -t curllm:latest .
# Run container
docker run -d \
--name curllm \
--gpus all \
-p 8000:8000 \
-v ~/.ollama:/root/.ollama \
curllm:latest
Advanced Features
Visual Mode
Visual mode enables screenshot analysis for:
- CAPTCHA detection
- Dynamic content verification
- Visual element interaction
- Honeypot field detection
curllm --visual "https://example.com" -d "Click the red button"
Stealth Mode
Bypasses common bot detection:
- Removes automation indicators
- Randomizes behavior patterns
- Mimics human interactions
- Custom user agents and headers
curllm --stealth "https://pypi.org/project/curllm/" -d "Extract data"
Proxy rotation and sessions
- curllm can use proxies per request and rotate them automatically.
- You can pass proxy config via the API proxy field, and persist logins via session_id (cookies saved under ./workspace/sessions/<session_id>.json).
Examples (API):
# 1) Rotate through a provided list (round-robin per host)
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"data": "extract all links",
"proxy": {"rotate": true, "list": ["http://p1:8080","http://p2:8080","http://p3:8080"]},
"session_id": "mysession"
}'
# 2) Rotate using a file of proxies (one per line)
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"data": "extract all links",
"proxy": {"rotate": true, "file": "./workspace/proxy/public_proxies.txt"}
}'
# 3) Use public proxy list via URL or env (CURLLM_PUBLIC_PROXY_LIST)
export CURLLM_PUBLIC_PROXY_LIST="https://myhost/proxies.txt" # or file:///abs/path/list.txt or comma list
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"data": "extract all links",
"proxy": {"rotate": "public"}
}'
# 4) Single static proxy (string or dict)
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"data": "extract all links",
"proxy": "http://user:pass@proxy.example.com:8080"
}'
Notes:
- Rotation is stored per-target host (round-robin). State is saved in ./workspace/proxy/rotation_state.json.
- With session_id, cookies persist across requests. Use the same session_id to stay logged in.
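To make "round-robin per target host with saved state" concrete, here is an illustrative sketch of the idea. It is not curllm's implementation, and the state-file layout (a JSON object mapping host to next index) is an assumption rather than the actual format of rotation_state.json:

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def next_proxy(target_url: str, proxies: list[str], state_file: Path) -> str:
    """Pick the next proxy for the target host and persist the counter."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    host = urlsplit(target_url).hostname or ""
    index = state.get(host, 0) % len(proxies)
    state[host] = index + 1  # each host advances independently
    state_file.write_text(json.dumps(state))
    return proxies[index]


proxies = ["http://p1:8080", "http://p2:8080", "http://p3:8080"]
with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "rotation_state.json"
    picks = [next_proxy("https://example.com/a", proxies, state_path) for _ in range(4)]
    other = next_proxy("https://other.org/", proxies, state_path)
print(picks, other)
```

The fourth request to example.com wraps around to the first proxy, while the first request to a different host starts its own cycle.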
WordPress automation in one session (no proxy):
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"wordpress_config": {
"url": "https://example.wordpress.com",
"username": "admin",
"password": "secret123",
"action": "create_post",
"title": "New article",
"content": "# Title\n\nContent...",
"status": "publish",
"categories": ["Technologia"],
"tags": ["AI","Automation"]
},
"session_id": "wp-mysession"
}'
BQL (Browser Query Language)
GraphQL-like syntax for structured extraction:
query {
page(url: "https://example.com") {
title
meta: select(css: "meta[property^='og:']") {
property: attr(name: "property")
content: attr(name: "content")
}
links: select(css: "a[href^='http']") {
text
url: attr(name: "href")
}
}
}
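Queries like this can be generated instead of hand-written. The sketch below builds a links query for a given page and CSS selector and wraps it in a request body. Placing the query in `data` and setting `use_bql` to true is an assumption based on the field names in the API reference; `bql_links_query` and `bql_payload` are hypothetical helpers:

```python
import json


def bql_links_query(url: str, css: str = "a[href^='http']") -> str:
    """Build a BQL query that selects link text and href for a page."""
    # json.dumps produces a double-quoted, escaped string literal.
    return (
        "query {\n"
        f"  page(url: {json.dumps(url)}) {{\n"
        "    title\n"
        f"    links: select(css: {json.dumps(css)}) {{\n"
        "      text\n"
        '      url: attr(name: "href")\n'
        "    }\n"
        "  }\n"
        "}"
    )


def bql_payload(query: str) -> dict:
    """Wrap a query in an /api/execute-style body with BQL enabled (assumed shape)."""
    return {"data": query, "use_bql": True}


query = bql_links_query("https://example.com")
payload = bql_payload(query)
print(query)
```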
Performance Benchmarks
| Model | VRAM Usage | Inference Speed | Tool-calling F1 | Avg Response Time |
|---|---|---|---|---|
| Qwen 2.5 7B | 6.8GB | 40 tok/sec | 93.3% | 8-12 sec |
| Mistral 7B | 6.5GB | 45 tok/sec | 89.1% | 7-10 sec |
| Llama 3.2 8B | 7.2GB | 35 tok/sec | 87.5% | 10-15 sec |
| Phi-3 Mini | 3.8GB | 60 tok/sec | 82.3% | 5-8 sec |
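The inference-speed column gives a quick estimate of pure generation time: tokens divided by tokens per second. As a worked example, using the CURLLM_NUM_PREDICT default of 512 from the configuration section and assuming the full budget is generated, Qwen 2.5 7B at 40 tok/sec needs 512 / 40 = 12.8 seconds. This is only arithmetic on the table's numbers; it ignores prompt processing and browser time, and replies shorter than the budget finish sooner:

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding speed."""
    return num_tokens / tokens_per_second


# Speeds from the benchmark table; 512 is the CURLLM_NUM_PREDICT default.
speeds = {"Qwen 2.5 7B": 40, "Mistral 7B": 45, "Llama 3.2 8B": 35, "Phi-3 Mini": 60}
for model, speed in speeds.items():
    print(f"{model}: {generation_seconds(512, speed):.1f} s for 512 tokens")
```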
API Reference
REST Endpoints
POST /api/execute
Content-Type: application/json
{
"url": "https://example.com",
"data": "instruction or query",
"visual_mode": true,
"stealth_mode": false,
"captcha_solver": false,
"use_bql": false,
"proxy": "http://user:pass@proxy:8080" | {"server":"...","username":"...","password":"..."} | {"rotate":true, "list":["http://..."], "file":"path"} | {"rotate":"public"},
"session_id": "my-session-id",
"wordpress_config": {"url":"https://...","username":"...","password":"...","action":"create_post", "title":"...", "content":"...", "status":"draft|publish", "categories":["..."], "tags":["..."]}
}
CLI flag --proxy (planned): pass the same JSON (or shorthand like rotate:public). Until then, use the API proxy field as above or set CURLLM_PUBLIC_PROXY_LIST.
Python Client
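import asyncio

from curllm import CurllmClient


async def main():
    client = CurllmClient(
        model="qwen2.5:7b",
        visual_mode=True,
    )
    # execute() is awaited, so the call has to run inside a coroutine
    result = await client.execute(
        url="https://example.com",
        instruction="Extract all product prices",
    )
    print(result.data)


asyncio.run(main())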
Troubleshooting
Common Issues
Out of Memory (OOM)
# Reduce context length
export CURLLM_NUM_CTX=4096
# Use smaller model
ollama pull phi3:mini
Slow Response
# Check GPU utilization
nvidia-smi
# Use quantized model
ollama pull qwen2.5:7b-q4_K_M
CAPTCHA Detection Issues
# Enable visual mode
curllm --visual --captcha ...
# Increase screenshot quality
export SCREENSHOT_QUALITY=100
curlx: Proxy companion
curlx is a separate Python package that helps curllm manage proxies (registering, listing, and starting proxy servers on remote hosts over SSH).
Installation (dev):
pip install -e ./curlx_pkg
Usage:
# Register existing proxies (host:port) in the curllm registry
curlx register --host 203.0.113.10 --ports 3128,3129 --server http://localhost:8000
# List registered proxies
curlx list --server http://localhost:8000
# Start proxy.py on a remote host over SSH and register it with curllm
curlx spawn --host ubuntu@203.0.113.10 --ports 3128,3129 --server http://localhost:8000
Integration with curllm (rotation from the registry):
curllm --proxy rotate:registry "https://example.com" -d "extract links"
Environment variables:
- CURLLM_API_HOST: default curllm API host (e.g. http://localhost:8000)
- SSH_BIN: SSH command (default: ssh)
- PY_BIN_REMOTE: Python on the remote host (default: python3)
Proxy health check and pruning
Check that proxies work and remove dead entries from the registry:
# Check only (no removal)
curl -s -X POST "$CURLLM_API_HOST/api/proxy/health" -H 'Content-Type: application/json' \
-d '{"url":"http://example.com","timeout":4,"limit":20,"prune":false}' | jq .
# Auto-pruning (remove dead proxies)
curl -s -X POST "$CURLLM_API_HOST/api/proxy/health" -H 'Content-Type: application/json' \
-d '{"url":"http://example.com","timeout":4,"prune":true}' | jq .
WordPress + Sessions (EN)
Persist WordPress login with session_id and create posts:
curl -s -X POST "$CURLLM_API_HOST/api/execute" -H 'Content-Type: application/json' -d '{
"wordpress_config": {
"url": "https://example.wordpress.com",
"username": "admin",
"password": "secret123",
"action": "create_post",
"title": "New Post",
"content": "# Title\n\nContent...",
"status": "publish"
},
"session_id": "wp-s1"
}'
Next posts in the same session:
curllm --session wp-s1 -d '{"wordpress_config":{"url":"https://example.wordpress.com","action":"create_post","title":"Next","content":"Text","status":"draft"}}'
Publishing curlx (PyPI)
The curlx_pkg/ directory contains Makefile targets and a CI workflow for releasing curlx:
cd curlx_pkg
make release # build sdist/wheel into dist/
make publish-test # publish to TestPyPI (requires a TWINE_PASSWORD token)
make publish # publish to PyPI
The repo also contains the workflow .github/workflows/publish-curlx.yml, triggered by curlx-v* tags.
Roadmap
- CLI --proxy flag with rotation presets (public/list/file)
- curlx companion: remote proxy provisioning + registry API for curllm
- Multi-agent orchestration
- Fine-tuning interface for domain-specific tasks
- WebSocket support for real-time automation
- Integration with Selenium Grid
- Voice-guided automation
- Mobile browser support
- Distributed scraping with Ray
- Custom model training pipeline
Files
$ tree -L 3 -I node_modules -I venv
.
├── bql_parser.py
├── CHANGELOG.md
├── curllm
├── curllm_server.py
├── docker-compose.yml
├── Dockerfile
├── docs
│   └── EXAMPLES.md
├── downloads
├── examples.py
├── install.sh
├── INSTRUKCJA.md
├── LICENSE
├── logs
│   └── run-20251123-141151.md
├── Makefile
├── __pycache__
│   └── curllm_server.cpython-313.pyc
├── pyproject.toml
├── QUICKSTART.sh
├── README.md
├── requirements.txt
├── screenshots
│   └── www.prototypowanie.pl
│       └── step_0_1763903516.803199.png
├── tests
│   └── e2e.sh
├── TODO.md
├── tools
│   └── generate_examples.sh
└── workspace
12 directories, 37 files
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Development setup
git clone https://github.com/wronai/curllm.git
cd curllm
pip install -e .
pytest tests/
License
Apache License - see LICENSE for details.
Acknowledgments
- Ollama for local LLM serving
- Browser-Use for browser automation
- Playwright for browser control
- LangChain for LLM orchestration
- Browserless for headless browser infrastructure
Support
- Email: info@softreck.com
- Discord: Join our server
- Issues: GitHub Issues
- Docs: Documentation
Built with ❤️ by Softreck
Star us on GitHub!
File details
Details for the file curllm-1.0.15.tar.gz.
File metadata
- Download URL: curllm-1.0.15.tar.gz
- Upload date:
- Size: 157.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 285e1637520b0f52fc2a0654bf423441108e21a99b5860d147fe3fa8c7679f1f |
| MD5 | 41234359632227c63cba183f766bf04d |
| BLAKE2b-256 | 22cc56abc4a3d2f7591965532541fe25f093ebd65969d155b012c5d9c810d634 |
File details
Details for the file curllm-1.0.15-py3-none-any.whl.
File metadata
- Download URL: curllm-1.0.15-py3-none-any.whl
- Upload date:
- Size: 146.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11f1dcfea7f9036bcd5e30002165916d62fb56a639b28c0072f2a71ff832b5b8 |
| MD5 | 2614052d272bb14f491d6ced417a9d8f |
| BLAKE2b-256 | ff570745beff510ddf1ee7352f711336d812fe959e5385ef7fd756bcae1d50a3 |