A clean, reusable CAPTCHA-bypassing and scraping client

pylcaptcha

pylcaptcha is a high-performance, stealthy Python library designed to bypass modern anti-bot protections (such as Cloudflare Turnstile and Google reCAPTCHA v2/v3) while integrating with high-speed HTTP clients for scraping and API automation.

By combining browser-based kinematic solvers (Playwright/Camoufox) with asynchronous HTTP clients (curl-cffi), pylcaptcha allows you to solve interactive verification challenges natively and reuse those sessions for lightweight API requests.


🚀 Features

  • Double-Stage Challenge Solver: Automatically detects and resolves Cloudflare interstitial gateways and embedded form CAPTCHA widgets.
  • Human-like Kinematics: Uses physics-driven, organic mouse movements (Bézier curves via ShyMouse) to interact with verification challenges, defeating behavioral analysis.
  • Session & State Synchronization: Automatically extracts cookies, user-agents, and dynamically generated CSRF tokens to keep lightweight HTTP sessions authenticated.
  • Asynchronous Architecture: Built from the ground up on asyncio for maximum throughput.
  • Universal & Extensible: Provides generic token extraction hooks that allow integration with any web framework (e.g., Laravel, Django).

📦 Installation

You can install pylcaptcha directly from PyPI via pip:

pip install pylcaptcha

(Make sure a compatible Python environment, such as a virtual environment, and any required system dependencies are set up first.)

Quick Start Guide

The package exposes two classes, each in its own module:

from pylcaptcha.captcha import Captcha
from pylcaptcha.http import BrowserHTTP

The Captcha class is the raw CAPTCHA solver. To use it, pass in a Camoufox page and call the appropriate method.

The BrowserHTTP class is an abstraction over Captcha designed specifically for Cloudflare CAPTCHAs. It lets you solve the challenge once and then reuse the existing token.

import asyncio

from pylcaptcha.http import BrowserHTTP


async def main():
    # Initialize the engine with an asynchronous context manager.
    # This guarantees that the browser and connection pools are safely torn down.
    async with BrowserHTTP() as protocol:

        # Step 1: Open the landing page, solve the Turnstile challenge, and sync cookies.
        # This automatically solves both the Turnstile shown on page load
        # and any CAPTCHA embedded in the page.
        print("Solving captcha on loading...")
        await protocol.get('https://my_cf_page', browser=True)

        # Step 2: Extract the CSRF token from the DOM
        # and attach it to outgoing request headers.
        # NOTE: the selector and header name differ between backends.
        await protocol.sync_csrf_token(
            selector='meta[name="csrf-token"]',
            header_name='x-csrf-token'
        )

        # Step 3: Send an authenticated request.
        # Attach the single-use token obtained from the browser step.
        result = await protocol.post('https://my_cf_page/api/v1/check', data={
            'query': 'human_only_query',
            'token': await protocol.get_cf_token()
        })
        
        print("Response Data:")
        print(result.text)

if __name__ == '__main__':
    asyncio.run(main())

How It Works

Passive vs. Active Challenges: The library distinguishes Stage 1 (gateway interstitial blocks) from Stage 2 (embedded in-page widgets, such as an IMEI form check) and solves them sequentially when necessary.

Token Lifecycles:

  • Cloudflare clearance (cf_clearance) cookies are persistent and can be reused across requests for the life of the session.
  • CAPTCHA tokens (such as Turnstile challenge signatures) are strictly single-use and are generated just in time through browser DOM interaction.
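The difference between the two lifecycles can be modeled with two small value types. This is only an illustrative sketch: the class names `ClearanceCookie` and `SingleUseToken` are hypothetical and are not part of pylcaptcha's API. It shows why a cookie can be checked and reused many times, while a token has to be fetched again after every use.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClearanceCookie:
    """Hypothetical model of a long-lived cookie: reusable until it expires."""
    value: str
    expires_at: float  # Unix timestamp

    def is_valid(self, now: Optional[float] = None) -> bool:
        # Reading the cookie does not change its state, so it can be reused.
        current = time.time() if now is None else now
        return current < self.expires_at


@dataclass
class SingleUseToken:
    """Hypothetical model of a single-use token: consumable exactly once."""
    value: str
    _used: bool = field(default=False, init=False)

    def consume(self) -> str:
        # A second consume() is an error: the caller must obtain a fresh token.
        if self._used:
            raise RuntimeError("token already used; request a fresh one")
        self._used = True
        return self.value
```

In the Quick Start flow this corresponds to calling `get_cf_token()` again for every request that needs a token, rather than caching its result.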

CSRF Extraction: Generic evaluators locate CSRF meta tags or hidden inputs and map them to headers or cookies according to the target framework's requirements.
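As a rough illustration of this lookup, the sketch below uses only the standard library to find a token in either a meta tag or a hidden input. It is not pylcaptcha's implementation; `CsrfTokenParser` and `extract_csrf_token` are hypothetical names, and the defaults assume Laravel-style markup (`csrf-token` meta tag, `_token` hidden input). Django forms typically use a hidden input named `csrfmiddlewaretoken` instead.

```python
from html.parser import HTMLParser
from typing import Optional


class CsrfTokenParser(HTMLParser):
    """Hypothetical helper: records the first CSRF token found in a page."""

    def __init__(self, meta_name: str, input_name: str) -> None:
        super().__init__()
        self.meta_name = meta_name
        self.input_name = input_name
        self.token: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.token is not None:
            return  # keep the first match only
        attr = dict(attrs)
        if tag == "meta" and attr.get("name") == self.meta_name:
            self.token = attr.get("content")
        elif (tag == "input" and attr.get("type") == "hidden"
              and attr.get("name") == self.input_name):
            self.token = attr.get("value")


def extract_csrf_token(html: str,
                       meta_name: str = "csrf-token",
                       input_name: str = "_token") -> Optional[str]:
    """Return the CSRF token from a meta tag or hidden input, or None."""
    parser = CsrfTokenParser(meta_name, input_name)
    parser.feed(html)
    return parser.token


# Usage: map the extracted token to a header, as a backend might expect.
page = '<head><meta name="csrf-token" content="abc123"></head>'
headers = {"x-csrf-token": extract_csrf_token(page)}
```

Which header or cookie the token belongs in depends on the backend, which is why the Quick Start passes both `selector` and `header_name` explicitly.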

Contributing

Contributions are welcome! If you find bugs, encounter new anti-bot defense mechanisms, or want to add framework integrations, please open an issue or submit a pull request. Note that the gcaptcha solver is not yet production-stable, and help with AI development and solver improvements is especially welcome.
