captcha-bypass

Self-hosted async captcha bypass service with HTTP API. Tested on Cloudflare and Amazon challenges.

Current limitation: the target URL is always fetched with GET. POST/PUT requests with a body and custom headers are planned for a future release.

What's New in 0.3.0

  • Cloudflare Turnstile support: Turnstile checkbox challenges are detected and clicked automatically during validation polling. This only activates when a challenges.cloudflare.com iframe is detected.
  • cf_clearance cookie detection: When Cloudflare is detected, the solver also monitors cookies for cf_clearance as a parallel success signal. If the cookie appears, the task completes immediately, even without a selector match. This provides resilience against outdated or incorrect success selectors.
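
As a rough client-side illustration of the cookie signal, the sketch below checks a cookie list (in the shape returned under data.cookies) for a non-empty cf_clearance entry. The helper name has_cf_clearance is hypothetical, and this is not the solver's internal code.

```python
from typing import Any


def has_cf_clearance(cookies: list[dict[str, Any]]) -> bool:
    """Return True if the list contains a cf_clearance cookie with a non-empty value."""
    return any(
        cookie.get("name") == "cf_clearance" and bool(cookie.get("value"))
        for cookie in cookies
    )
```

A caller could use a check like this to confirm that a completed task actually carries the clearance cookie before reusing the session.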

Installation

Docker (recommended)

# default settings
docker-compose up -d

# with custom params
WORKERS=4 PORT=9000 RESULT_TTL=300 MAX_QUEUE_SIZE=500 docker-compose up -d

pip

pip install captcha-bypass

# run (browser is auto-downloaded on first run)
captcha-bypass

# with custom params
captcha-bypass --workers 4 --port 9000 --result-ttl 300 --max-queue-size 500

System dependencies (Linux only):

# Debian/Ubuntu
sudo apt-get install libgtk-3-0 libx11-xcb1 libasound2

# RHEL/CentOS/Fedora
sudo dnf install gtk3 libX11-xcb alsa-lib

macOS and Windows: dependencies are typically bundled with the browser.

Custom Docker Image

If you install via pip in your own Docker image, add these to avoid zombie processes from Camoufox:

docker-compose.yml:

services:
  your-service:
    init: true  # reaps zombie processes from browser
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8191/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

Or in Dockerfile (alternative to init: true):

RUN apt-get update && apt-get install -y tini curl
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["captcha-bypass"]

Python Client

Sync

from captcha_bypass.client import CaptchaBypassClient

with CaptchaBypassClient("http://localhost:8191") as client:
    result = client.solve_and_wait(
        url="https://example.com",
        timeout=60,
        success_texts=["Welcome"],
    )

    if result.data and result.data["status"] == "completed":
        data = result.data["data"]
        cookies = data["cookies"]
        headers = data["request_headers"]

Async

import asyncio
from captcha_bypass.client import AsyncCaptchaBypassClient

async def main():
    async with AsyncCaptchaBypassClient("http://localhost:8191") as client:
        result = await client.solve_and_wait(
            url="https://example.com",
            timeout=60,
            success_selectors=["#dashboard"],
        )

        if result.data and result.data["status"] == "completed":
            data = result.data["data"]
            cookies = data["cookies"]
            headers = data["request_headers"]

asyncio.run(main())

With Proxy (Sync)

from captcha_bypass.client import CaptchaBypassClient

proxy = {
    "server": "socks5://proxy.example.com:1080",
    "username": "user",      # optional
    "password": "pass",      # optional
}

with CaptchaBypassClient("http://localhost:8191") as client:
    result = client.solve_and_wait(
        url="https://example.com",
        timeout=60,
        proxy=proxy,
        success_texts=["Welcome"],
    )

With Proxy (Async)

import asyncio
from captcha_bypass.client import AsyncCaptchaBypassClient

proxy = {
    "server": "socks5://proxy.example.com:1080",
    "username": "user",      # optional
    "password": "pass",      # optional
}

async def main():
    async with AsyncCaptchaBypassClient("http://localhost:8191") as client:
        result = await client.solve_and_wait(
            url="https://example.com",
            timeout=60,
            proxy=proxy,
            success_selectors=["#dashboard"],
        )

asyncio.run(main())

See examples/ for complete usage.

Configuration

| Parameter | Default | Description |
|---|---|---|
| PORT | 8191 | HTTP server port |
| WORKERS | CPU cores | Number of browser workers (~500MB RAM each) |
| RESULT_TTL | 300 | Seconds to keep completed results before auto-delete |
| MAX_QUEUE_SIZE | 1000 | Maximum pending tasks in queue |
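
To make the defaults concrete, here is a minimal sketch of how these four settings could be resolved from environment variables. The function load_config and its dictionary keys are hypothetical; this is not the service's actual configuration loader.

```python
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, int]:
    """Resolve the documented settings, falling back to the documented defaults."""
    return {
        "port": int(env.get("PORT", 8191)),
        # WORKERS defaults to the number of CPU cores (1 if it cannot be determined)
        "workers": int(env.get("WORKERS", os.cpu_count() or 1)),
        "result_ttl": int(env.get("RESULT_TTL", 300)),
        "max_queue_size": int(env.get("MAX_QUEUE_SIZE", 1000)),
    }
```

Passing a plain dict instead of os.environ makes the resolution easy to check, for example load_config({"WORKERS": "4", "PORT": "9000"}).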

API Reference

GET /health — Service status and metrics

Use for health checks and monitoring.

curl http://localhost:8191/health

Response (HTTP 200):

{
  "status": "ok",
  "workers": 4,
  "active_workers": 1,
  "queue_size": 3
}

Response Fields

| Field | Type | Description |
|---|---|---|
| status | string | Service status. Always "ok" if server responds |
| workers | integer | Total configured workers (browser instances) |
| active_workers | integer | Workers currently processing tasks |
| queue_size | integer | Pending tasks waiting for a free worker |

Notes:

  • If active_workers == workers, all workers are busy; queue_size > 0 then means tasks are waiting for a free worker
  • If the server is down, the connection fails (no response)
  • Suitable for load balancer health checks and Kubernetes probes

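
The notes above can be turned into a small monitoring helper. The sketch below classifies a /health payload; the function name describe_load and the labels "idle", "busy", and "saturated" are hypothetical choices for illustration.

```python
from typing import Any


def describe_load(health: dict[str, Any]) -> str:
    """Classify a /health payload as 'idle', 'busy', or 'saturated'."""
    workers = health["workers"]
    active = health["active_workers"]
    queued = health["queue_size"]
    if active >= workers and queued > 0:
        return "saturated"  # every worker is busy and tasks are waiting
    if active > 0 or queued > 0:
        return "busy"
    return "idle"
```

A "saturated" result is a reasonable signal to slow down submissions or add workers before the queue reaches MAX_QUEUE_SIZE.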
POST /solve — Queue a captcha bypass task

Returns immediately with task_id.

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/protected",
    "timeout": 60,
    "success_texts": ["Welcome"],
    "success_selectors": ["#dashboard", ".user-profile"]
  }'

Parameters

| Parameter | Required | Type | Description |
|---|---|---|---|
| url | Yes | string | Target URL (max 2048 chars, must start with http:// or https://) |
| timeout | Yes | integer | Max wait time in seconds (1-300) |
| success_texts | No | array | Texts indicating successful bypass |
| success_selectors | No | array | CSS/XPath selectors indicating success |
| proxy | No | object | Proxy configuration |
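
Checking a payload against these constraints before sending it avoids a round trip that would end in a 400 response. The following is a client-side sketch only; validate_solve_payload is a hypothetical helper, and the server's own validation may be stricter or worded differently.

```python
from typing import Any

MAX_URL_LENGTH = 2048
MIN_TIMEOUT, MAX_TIMEOUT = 1, 300


def validate_solve_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    url = payload.get("url")
    if not isinstance(url, str):
        problems.append("url is required and must be a string")
    else:
        if len(url) > MAX_URL_LENGTH:
            problems.append("url exceeds 2048 characters")
        if not url.startswith(("http://", "https://")):
            problems.append("url must start with http:// or https://")
    timeout = payload.get("timeout")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(timeout, int) or isinstance(timeout, bool):
        problems.append("timeout is required and must be an integer")
    elif not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        problems.append("timeout must be between 1 and 300 seconds")
    for key in ("success_texts", "success_selectors"):
        if key in payload and not isinstance(payload[key], list):
            problems.append(f"{key} must be an array")
    return problems
```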

Success Conditions

The service polls the page every 2 seconds, checking for success conditions. Conditions are combined with OR logic: the task returns as soon as any one of them matches.

Important: If both success_texts and success_selectors are empty or omitted, the service waits the full timeout period before returning the result. Use this when you don't know what indicates success and just need to wait for the challenge to complete.

Text matching (success_texts):

  • Searches for substring in page body text
  • Case-sensitive
  • Example: ["Welcome back", "Dashboard"]

Selector matching (success_selectors):

  • CSS selectors — standard querySelector syntax
  • XPath selectors — start with // (search anywhere) or / (absolute path from root)
  • Example: ["#main-content", ".logged-in", "//div[@data-auth='true']"]
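
The matching rules above can be sketched in a few lines. This is an illustration of the documented behavior (a leading "/" marks XPath, text matching is a case-sensitive substring test); the helper names are hypothetical and the service's own detection may differ in detail.

```python
from typing import Optional


def selector_kind(selector: str) -> str:
    """Classify a selector: anything starting with '/' (including '//') is XPath."""
    return "xpath" if selector.startswith("/") else "css"


def first_text_match(body_text: str, success_texts: list[str]) -> Optional[str]:
    """Return the first success text found as a case-sensitive substring, else None."""
    for text in success_texts:
        if text in body_text:
            return text
    return None
```

Because text matching is case-sensitive, "welcome back" in the page body would not satisfy a success text of "Welcome back".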

Selector Syntax

CSS selectors (see MDN CSS Selectors):

#id                  — by ID
.class               — by class
div                  — by tag
[attr="value"]       — by attribute
div.class#id         — combined
div > p              — direct child
div p                — descendant

XPath selectors (see MDN XPath):

//div[@id="main"]           — div with id="main"
//button[text()="Submit"]   — button with exact text
//input[@type="email"]      — input with type="email"
//*[contains(@class,"btn")] — any element with "btn" in class

Proxy Configuration

{
  "proxy": {
    "server": "socks5://proxy.example.com:1080",
    "username": "user",
    "password": "pass"
  }
}

| Field | Required | Description |
|---|---|---|
| server | Yes | Proxy URL (max 2048 chars) |
| username | No | Proxy username |
| password | No | Proxy password |

Supported protocols: http://, https://, socks4://, socks5://

When proxy is configured, GeoIP-based fingerprint (timezone, language) is automatically applied.
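
A proxy object can be assembled and sanity-checked on the client before it is sent. The sketch below enforces the documented protocol list and length limit; build_proxy is a hypothetical helper, not part of the captcha_bypass package.

```python
from typing import Optional
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks4", "socks5"}
MAX_SERVER_LENGTH = 2048


def build_proxy(
    server: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict[str, str]:
    """Return a proxy dict for the 'proxy' field, or raise ValueError if invalid."""
    if len(server) > MAX_SERVER_LENGTH:
        raise ValueError("proxy server URL exceeds 2048 characters")
    scheme = urlsplit(server).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy protocol: {scheme!r}")
    proxy = {"server": server}
    # username and password are optional, so only include them when given
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return proxy
```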

Response

{
  "task_id": "550e8400-e29b-41d4-a716-446655440000"
}

Errors

All error responses follow this structure:

{
  "error": "<error_code>",
  "message": "<human-readable description>"
}

| HTTP Status | Code | Description |
|---|---|---|
| 400 | invalid_json | Request body is not valid JSON |
| 400 | missing_field | Required field missing |
| 400 | invalid_field | Field has invalid value |
| 503 | queue_full | Task queue at capacity, retry later |

Example error responses:

// 400 Bad Request - invalid JSON
{
  "error": "invalid_json",
  "message": "Request body must be valid JSON"
}

// 400 Bad Request - missing field
{
  "error": "missing_field",
  "message": "Field 'url' is required"
}

// 400 Bad Request - invalid field value
{
  "error": "invalid_field",
  "message": "Field 'timeout' must be a positive integer"
}

// 503 Service Unavailable - queue full
{
  "error": "queue_full",
  "message": "Task queue is full (max 1000). Try again later."
}
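
Of these errors, only queue_full is worth retrying; the 400 errors require fixing the request. A minimal sketch of that decision follows. The helper names and the exponential backoff schedule (base delay, cap) are assumptions for illustration, not values recommended by the service.

```python
from typing import Any

RETRYABLE_ERRORS = {"queue_full"}


def should_retry(http_status: int, body: dict[str, Any]) -> bool:
    """Retry only when the service reports a full queue (HTTP 503, queue_full)."""
    return http_status == 503 and body.get("error") in RETRYABLE_ERRORS


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

For example, backoff_delays(4) yields waits of 1, 2, 4, and 8 seconds between submission attempts.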
GET /result/{task_id} — Get task status and result

Poll this endpoint until status is completed or error.

curl http://localhost:8191/result/550e8400-e29b-41d4-a716-446655440000

Response Examples

Completed (success condition matched):

{
  "status": "completed",
  "error": null,
  "data": {
    "cookies": [
      {
        "name": "cf_clearance",
        "value": "...",
        "domain": ".example.com",
        "path": "/",
        "expires": 1234567890,
        "httpOnly": true,
        "secure": true,
        "sameSite": "None"
      }
    ],
    "request_headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
      "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
      "Accept-Language": "en-US,en;q=0.5",
      "Accept-Encoding": "gzip, deflate, br",
      "Sec-Fetch-Dest": "document",
      "Sec-Fetch-Mode": "navigate",
      "Sec-Fetch-Site": "none",
      "Sec-Fetch-User": "?1",
      "Upgrade-Insecure-Requests": "1"
    },
    "response_headers": {
      "content-type": "text/html; charset=utf-8",
      "set-cookie": "...",
      "cf-ray": "..."
    },
    "status_code": 200,
    "html": "<!DOCTYPE html>...",
    "url": "https://example.com/dashboard",
    "timeout_reached": false,
    "validation": {
      "matched": true,
      "match_type": "selector",
      "matched_condition": "#dashboard"
    }
  }
}

Pending/Running:

{
  "status": "pending",
  "error": null,
  "data": null
}

Error:

{
  "status": "error",
  "error": {
    "code": "browser_error",
    "message": "Timeout starting camoufox"
  },
  "data": null
}

Not Found (HTTP 200):

{
  "status": "not_found",
  "error": null,
  "data": null
}

Invalid Task ID (HTTP 400):

{
  "status": "not_found",
  "error": {
    "code": "invalid_task_id",
    "message": "Invalid task ID format"
  },
  "data": null
}

Status Values

| Status | Description |
|---|---|
| pending | Task waiting in queue |
| running | Browser is processing the task |
| completed | Task finished (check data.validation.matched for success) |
| error | Task failed (see error.code) |
| not_found | Task doesn't exist, was deleted, or expired |
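
A polling loop over these status values might look like the sketch below, which uses only the standard library. The function names are hypothetical, and treating not_found as a terminal state (since an expired or deleted task will never complete) is an assumption of this example.

```python
import json
import time
import urllib.request
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "error", "not_found"}


def fetch_result(base_url: str, task_id: str) -> dict[str, Any]:
    """GET /result/{task_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/result/{task_id}") as resp:
        return json.load(resp)


def wait_for_result(
    fetch: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    max_wait: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch() until the status is terminal or max_wait seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        result = fetch()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal status in time")
        sleep(interval)
```

Typical usage would be wait_for_result(lambda: fetch_result("http://localhost:8191", task_id)), with max_wait set somewhat above the timeout passed to /solve.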

Result Fields

| Field | Description |
|---|---|
| cookies | Array of cookies from browser context |
| request_headers | Browser request headers (User-Agent, Accept, etc.) for reuse in Python requests |
| response_headers | Response headers from initial navigation (Set-Cookie, Content-Type, etc.) |
| status_code | HTTP status code (may be null if navigation timed out) |
| html | Page HTML content |
| url | Final URL after all redirects |
| timeout_reached | true if task waited full timeout without validation match |
| validation.matched | true if any success condition was found |
| validation.match_type | "text" (body text matched), "selector" (CSS/XPath element found), or "cookie" (cf_clearance cookie detected, meaning the Cloudflare challenge was solved). null if not matched |
| validation.matched_condition | The specific text or selector that matched, null if not matched |
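
To reuse a solved session, the cookies and request_headers fields can be carried into follow-up requests. The sketch below builds (but does not send) such a request with the standard library; the helper names are hypothetical, and it ignores cookie domain and path scoping for brevity.

```python
import urllib.request
from typing import Any


def cookie_header(cookies: list[dict[str, Any]]) -> str:
    """Join cookies into a single Cookie header value, e.g. 'a=1; b=2'."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def build_followup_request(url: str, data: dict[str, Any]) -> urllib.request.Request:
    """Build a GET request that reuses the browser's headers and cookies."""
    headers = dict(data["request_headers"])
    # urllib does not decompress gzip/br bodies itself, so do not advertise them
    headers.pop("Accept-Encoding", None)
    headers["Cookie"] = cookie_header(data["cookies"])
    return urllib.request.Request(url, headers=headers)
```

Cloudflare generally ties cf_clearance to the User-Agent and IP address that solved the challenge, so it is safest to send follow-up requests with the same headers and through the same proxy.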

Error Codes

| Code | Description |
|---|---|
| invalid_task_id | Task ID format is invalid (HTTP 400) |
| cancelled | Task was cancelled via DELETE endpoint |
| browser_error | Browser crashed or failed to start |
| browser_closed | Browser/page closed unexpectedly |

DELETE /task/{task_id} — Cancel or delete a task
curl -X DELETE http://localhost:8191/task/550e8400-e29b-41d4-a716-446655440000

Response

Success (HTTP 200):

{
  "success": true,
  "message": "Task cancelled (was pending)"
}

Invalid task ID (HTTP 400):

{
  "success": false,
  "message": "Invalid task ID"
}

Response Fields

| Field | Type | Description |
|---|---|---|
| success | boolean | true if operation succeeded, false if task not found or invalid |
| message | string | Human-readable result description |

HTTP Status Codes

| Status | Condition |
|---|---|
| 200 | Operation performed (check success field for result) |
| 400 | Invalid task ID format |

Messages

| Message | success | HTTP | Description |
|---|---|---|---|
| Task cancelled (was pending) | true | 200 | Removed from queue before processing |
| Task marked for cancellation | true | 200 | Running task will stop at next check |
| Result deleted | true | 200 | Completed task result removed |
| Task not found | false | 200 | Task doesn't exist |
| Invalid task ID | false | 400 | Task ID format validation failed |
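
Because "Task not found" also returns HTTP 200, a client should check the success field rather than the status code alone. A minimal standard-library sketch follows; the helper names are hypothetical.

```python
import urllib.request
from typing import Any


def build_cancel_request(base_url: str, task_id: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE /task/{task_id} request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/task/{task_id}", method="DELETE"
    )


def cancel_succeeded(http_status: int, body: dict[str, Any]) -> bool:
    """True only for HTTP 200 with success == true."""
    return http_status == 200 and body.get("success") is True
```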

Usage Examples

Basic: Wait for text

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timeout": 60,
    "success_texts": ["Welcome"]
  }'

Wait for element to appear

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timeout": 90,
    "success_selectors": ["#content-loaded", "[data-ready=true]"]
  }'

Combined conditions (OR logic)

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timeout": 120,
    "success_texts": ["Dashboard", "Welcome back"],
    "success_selectors": ["#user-menu", ".authenticated"]
  }'

No conditions (wait full timeout)

Use this when you don't know what indicates success. The service waits the full timeout, then returns whatever state the page is in.

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timeout": 30
  }'

With proxy

curl -X POST http://localhost:8191/solve \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timeout": 60,
    "success_texts": ["Success"],
    "proxy": {
      "server": "socks5://proxy.example.com:1080",
      "username": "user",
      "password": "pass"
    }
  }'
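
The curl calls above translate directly to Python's standard library. The sketch below is an alternative to the bundled client for environments where only the HTTP API is available; the function names build_solve_request and submit are hypothetical.

```python
import json
import urllib.request
from typing import Any


def build_solve_request(base_url: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST /solve request with a JSON body."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/solve",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(base_url: str, payload: dict[str, Any]) -> str:
    """Send the request and return the task_id from the response."""
    with urllib.request.urlopen(build_solve_request(base_url, payload)) as resp:
        return json.load(resp)["task_id"]
```

Note that urlopen raises urllib.error.HTTPError for the 400 and 503 responses described under Errors, so callers of submit should be prepared to catch it.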

Resource Usage

  • ~500MB RAM per worker
  • Recommended: 1-2 workers per CPU core
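
These two figures suggest a simple sizing rule: take the smaller of the CPU-based and RAM-based limits. The sketch below encodes that rule; suggest_workers and its parameters are hypothetical, and the 500 MB figure is the approximate value quoted above.

```python
def suggest_workers(
    cpu_cores: int,
    free_ram_mb: int,
    per_core: int = 1,
    ram_per_worker_mb: int = 500,
) -> int:
    """Suggest a worker count bounded by both CPU cores and available RAM."""
    by_cpu = cpu_cores * per_core
    by_ram = free_ram_mb // ram_per_worker_mb
    return max(1, min(by_cpu, by_ram))
```

For instance, a 4-core machine with 1600 MB free is limited by memory to 3 workers, even though the CPU count would allow 4.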

Notes

  • Browser uses stealth mode (Camoufox) with WebRTC blocking

TODO

  • SSRF protection — validate URLs against internal addresses (169.254.169.254, localhost, private IPs)
  • Difficulty levels — headless="virtual" mode selection for different challenge complexity
  • HTTP methods — support POST, PUT, DELETE with request body and custom headers
  • Browser fingerprint options — OS, locale, screen size, timezone via Camoufox config
