A Hybrid Scraper Framework: cookie consent, auto-pagination, temp identities, session bridging, API discovery, stealth fingerprinting, and Turnstile solving.

  _____       _--_          _                   
 |  __ \     |  _ \        | |                  
 | |__) |   _| |_) |__ _ __| | ___ ___  _   _ _ __ 
 |  ___/ | | |  _ // _` | '__| |/ / _ \| | | | '__|
 | |   | |_| | | \ \ (_| | |  |   < (_) | |_| | |  
 |_|    \__, |_|  \_\__,_|_|  |_|\_\___/ \__,_|_|  
         __/ |                                     
        |___/                                      

Version: 3.0.0
Author: Zinzied (ziedboughdir@gmail.com) · GitHub

🏃 Py-Parkour: The Hybrid Scraper Framework

Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:

  1. 🍪 Cookie Consents: Detecting and destroying GDPR/modal popups.
  2. 🧭 Pagination: Auto-detecting "Next" buttons or infinite scroll.
  3. 🎭 Verification Gates: Generating temporary identities (Email/SMS) for signups.
  4. 👻 Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
  5. 📡 API Discovery: Automatically detect hidden JSON APIs.
  6. 🔐 Stealth Mode: Browser fingerprinting and bot evasion scripts.
  7. 🔄 Turnstile Solving: Built-in Cloudflare Turnstile bypass.

It turns your scraper into a workflow automation platform.


🆕 What's New in v3.0.0

  • 🎯 Gadget System: Pluggable modules via constructor
  • 🔐 Fingerprint Sync: Match browser fingerprint with TLS layer (TLS-Chameleon compatible)
  • ⚡ Context Pool: 10x faster challenge solving with reusable contexts
  • 🔄 Turnstile Auto-Solver: Built-in micro-interaction patterns (no external API needed)
  • 📤 Session Export: Export cookies, localStorage, sessionStorage for cloudscraper handoff
  • 👻 Stealth Injection: Comprehensive bot evasion scripts

📦 Installation

pip install py-parkour[full]

Or for development:

pip install -r requirements.txt
playwright install

🚀 Quick Start

Basic Usage

import asyncio
from py_parkour import ParkourBot

async def main():
    # Supports 'chromium' (default), 'firefox', or 'webkit'
    bot = ParkourBot(headless=False, browser="firefox")
    await bot.start()
    await bot.goto("https://target-website.com")
    # ... use gadgets here ...
    await bot.close()

asyncio.run(main())

With Fingerprint & Stealth (v3.0)

import asyncio
from py_parkour import ParkourBot, FingerprintGallery

async def main():
    # Create bot with Chrome 120 fingerprint
    bot = ParkourBot(
        headless=True,
        gadgets=['ghost', 'turnstile', 'shadow', 'crusher'],
        fingerprint=FingerprintGallery.CHROME_120_WIN11,
        stealth=True
    )
    await bot.start()

    # Solve Turnstile automatically
    await bot.goto("https://protected-site.com")
    await bot.solve_turnstile()

    # Export session for cloudscraper
    session = await bot.export_session()
    print(f"Cookies: {session['cookies']}")

    await bot.close()

asyncio.run(main())

For CloudScraper Integration

import asyncio
from py_parkour import ParkourBot
import cloudscraper

async def main():
    # Create bot optimized for cloudscraper
    bot = ParkourBot.for_cloudscraper(tls_profile="chrome_120_win11")
    await bot.start()

    # Bypass challenges with browser
    await bot.goto("https://protected-site.com")
    await bot.solve_turnstile()

    # Hand off to cloudscraper
    scraper = cloudscraper.create_scraper()
    await bot.import_to_cloudscraper(scraper)

    # Continue with fast requests
    response = scraper.get("https://protected-site.com/api/data")
    print(response.json())

    await bot.close()

asyncio.run(main())

☕ Support / Donate

If you found this library useful, buy me a coffee!

License

MIT


🎯 Gadgets

🍪 Crusher (Cookie Bypasser)

Don't write brittle selectors for every "Accept Cookies" button.

await bot.crush_cookies()

🧭 Compass (Auto-Pagination)

Stop guessing whether the site uses ?page=2 or a "Next >" button.

async for page_number in bot.crawl(max_pages=10):
    print(f"Scraping Page {page_number}: {bot.current_url}")
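
To make the first of those two pagination styles concrete, here is a minimal standard-library sketch of the query-string counter. It is not how Compass works internally; the helper name next_page_url and the default parameter name "page" are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(url: str, param: str = "page") -> str:
    """Return `url` with its page-number query parameter advanced by one."""
    parts = urlsplit(url)
    updated, found = [], False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == param:
            found = True
            # A non-numeric value is treated as "page 1", so the next page is 2.
            value = str(int(value) + 1) if value.isdigit() else "2"
        updated.append((key, value))
    if not found:
        # No page parameter yet: assume the current URL is page 1.
        updated.append((param, "2"))
    return urlunsplit(parts._replace(query=urlencode(updated)))


print(next_page_url("https://example.com/items?page=2&sort=asc"))
```

Sites that paginate with a "Next" button or infinite scroll need a browser-driven approach instead, which is the case bot.crawl() is described as covering.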

🎭 Disguises (Temp Identity)

Need to sign up to view data? Generate a burner identity.

identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")

code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)

👻 Shadow (Session Bridge)

Break in with the browser, then steal the session for high-speed API calls.

# 1. Login with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...

# 2. Export session state
session = await bot.export_session()
# {'cookies': {...}, 'local_storage': {...}, 'headers': {...}}

# 3. Transfer to aiohttp (named `http` so it does not shadow the exported `session` dict)
async with await bot.shadow.create_session() as http:
    async with http.get("https://target.com/api/data") as resp:
        print(await resp.json())
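
If you would rather not depend on aiohttp, the exported dictionary can be turned into a plain HTTP request by hand. The sketch below uses only the standard library and assumes that 'cookies' maps cookie names to values and 'headers' maps header names to values, which the shape comment above suggests but does not guarantee. The helper build_request and the sample values are hypothetical.

```python
from urllib.request import Request


def build_request(session: dict, url: str) -> Request:
    """Build a urllib Request that carries the exported cookies and headers."""
    headers = dict(session.get("headers", {}))
    cookies = session.get("cookies", {})
    if cookies:
        # Assumption: cookies is a flat {name: value} mapping.
        headers["Cookie"] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers=headers)


# Made-up sample data in the shape shown above, not real output.
exported = {
    "cookies": {"sid": "abc123", "cf_clearance": "xyz"},
    "local_storage": {},
    "headers": {"User-Agent": "Mozilla/5.0"},
}
req = build_request(exported, "https://target.com/api/data")
print(req.get_header("Cookie"))
```

Passing req to urllib.request.urlopen would send it. Reusing the browser's User-Agent matters because a clearance cookie is often tied to the client that earned it.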

📡 Radar (API Detector)

Why scrape HTML if there's a hidden API? Radar listens to background traffic.

await bot.goto("https://complex-spa-site.com")

print(f"Latest JSON found: {bot.radar.latest_json}")

for req in bot.radar.requests:
    if "api/v1/users" in req['url']:
        print(f"Found User API: {req['url']}")
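
Captured traffic tends to contain the same endpoint many times with different query strings. A small filter like the following sketch can reduce it to a list of unique endpoints. It assumes only that each captured request is a dict with a 'url' key, as in the loop above; find_endpoints and the sample data are hypothetical.

```python
import re


def find_endpoints(requests, pattern):
    """Return unique request URLs (query string removed) that match `pattern`."""
    matcher = re.compile(pattern)
    endpoints = []
    for req in requests:
        url = req["url"].split("?", 1)[0]  # drop the query string
        if matcher.search(url) and url not in endpoints:
            endpoints.append(url)
    return endpoints


# Made-up sample standing in for bot.radar.requests.
captured = [
    {"url": "https://complex-spa-site.com/api/v1/users?page=1"},
    {"url": "https://complex-spa-site.com/static/app.js"},
    {"url": "https://complex-spa-site.com/api/v1/users?page=2"},
    {"url": "https://complex-spa-site.com/api/v1/orders"},
]
print(find_endpoints(captured, r"/api/v\d+/"))
```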

🖱️ GhostCursor (Human Movement)

Move the mouse like a human with Bezier curves, overshoot, and variable speed.

await bot.ghost.click("#submit-btn")
await bot.ghost.hover("#menu-item", duration=0.5)
await bot.ghost.idle_movement(duration=2.0)  # Subtle jitter
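
For readers unfamiliar with the technique, the sketch below shows how a cubic Bezier curve can produce a curved mouse path between two points. It illustrates the general idea only and is not GhostCursor's implementation: the function bezier_path and its parameters are hypothetical, and overshoot and variable speed are left out.

```python
import random


def bezier_path(start, end, steps=20, spread=100.0, rng=None):
    """Return steps + 1 points along a cubic Bezier curve from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    # Two control points placed near the straight line, then randomly offset
    # so that every path bends differently.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points


path = bezier_path((0, 0), (300, 200), rng=random.Random(42))
print(len(path), path[0], path[-1])
```

Each point could then be fed to a mouse-move call with a short, varying delay between points.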

🔄 TurnstileSolver (Built-in)

Solve Cloudflare Turnstile without external APIs.

success = await bot.solve_turnstile(timeout=30)
if success:
    print("Turnstile bypassed!")

⌨️ ChaosTyper (Human Typing)

Type with realistic speed variations and occasional typos + corrections.

await bot.typer.type_human("#input", "Hello World")
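
The idea of "typos plus corrections" can be sketched as a keystroke plan: a list of keys with delays, where a wrong key is always followed by a backspace. This is an illustration under assumed names (plan_keystrokes, replay, typo_rate), not ChaosTyper's actual algorithm.

```python
import random


def plan_keystrokes(text, typo_rate=0.1, rng=None):
    """Return (key, delay_seconds) pairs, occasionally inserting a corrected typo."""
    rng = rng or random.Random()
    plan = []
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            wrong = rng.choice("abcdefghijklmnopqrstuvwxyz")
            plan.append((wrong, rng.uniform(0.05, 0.25)))       # the slip
            plan.append(("BACKSPACE", rng.uniform(0.10, 0.30)))  # the correction
        plan.append((ch, rng.uniform(0.05, 0.25)))
    return plan


def replay(plan):
    """Apply a keystroke plan to an empty field and return the resulting text."""
    field = []
    for key, _delay in plan:
        if key == "BACKSPACE":
            if field:
                field.pop()
        else:
            field.append(key)
    return "".join(field)


plan = plan_keystrokes("Hello World", typo_rate=0.5, rng=random.Random(1))
print(replay(plan))
```

Because every typo is immediately erased, replaying the plan always reproduces the intended text, whatever the random seed.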

⚖️ Solicitor (Captcha Solving)

Connect to external solvers (like 2Captcha) for ReCaptcha, hCaptcha, and Turnstile.

bot.solicitor.set_solver(TwoCaptchaSolver(api_key="KEY"))

await bot.solicitor.solve_recaptcha_v2()  # Auto-injects
await bot.solicitor.solve_turnstile()     # Auto-injects

🔐 Fingerprint Profiles

Match your browser fingerprint with your TLS layer:

from py_parkour import ParkourBot, FingerprintGallery, BrowserFingerprint

# Available profiles
profiles = FingerprintGallery.list_profiles()
# ['chrome_120_win11', 'chrome_120_macos', 'firefox_121_linux', 'safari_17_ios', ...]

# Use a profile
bot = ParkourBot.with_profile("chrome_120_win11")

# Or customize
fingerprint = BrowserFingerprint(
    user_agent="Mozilla/5.0...",
    viewport={"width": 1920, "height": 1080},
    timezone="America/New_York",
    locale="en-US",
)
bot = ParkourBot(fingerprint=fingerprint)

⚡ Context Pooling

For faster operation, maintain a pool of browser contexts:

bot = ParkourBot(pool_size=5)  # Maintain 5 contexts
await bot.start()

# Get context from pool (10x faster than creating new)
context = await bot.get_pooled_context()
try:
    page = await context.new_page()
    await page.goto("https://example.com")
finally:
    await bot.release_pooled_context(context)

# Check pool stats
print(bot.pool_stats())
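
The acquire/release pattern above is a standard object pool. The sketch below shows the mechanism with asyncio.Queue and plain strings standing in for browser contexts. It is not ParkourBot's implementation; SimplePool, make_context, and demo are hypothetical names.

```python
import asyncio


class SimplePool:
    """Pre-creates `size` items and hands them out for reuse."""

    def __init__(self, factory, size):
        self._factory = factory
        self._size = size
        self._idle = asyncio.Queue()
        self.created = 0

    async def start(self):
        # Pay the creation cost once, up front.
        for _ in range(self._size):
            await self._idle.put(await self._factory())
            self.created += 1

    async def acquire(self):
        # Waits if every item is currently in use.
        return await self._idle.get()

    async def release(self, item):
        await self._idle.put(item)


async def demo():
    counter = 0

    async def make_context():
        nonlocal counter
        counter += 1
        return f"context-{counter}"  # stand-in for an expensive browser context

    pool = SimplePool(make_context, size=2)
    await pool.start()

    used = []
    for _ in range(5):
        ctx = await pool.acquire()
        try:
            used.append(ctx)
        finally:
            await pool.release(ctx)
    return pool.created, used


created, used = asyncio.run(demo())
print(created, used)
```

Five tasks are served by only two created items, which is where the time saving comes from. Releasing in a finally block, as in the snippet above, keeps the pool from draining when a task raises.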

🏗 Architecture

  • Core: Async Playwright wrapper with stealth and fingerprinting
  • Gadgets: Modular tools attached to the bot
    • .crusher - Cookie consent handling
    • .compass - Pagination
    • .identity - Temp identity generation
    • .shadow - Session export
    • .radar - API discovery
    • .ghost - Human-like mouse movement
    • .spatial - Geometric element finding
    • .typer - Human-like typing
    • .solicitor - External captcha solving
    • .turnstile - Built-in Turnstile solver

🔗 Integration with CloudScraper

Py-Parkour is designed to work seamlessly with CloudScraper and TLS-Chameleon:

# Unified workflow
from py_parkour import ParkourBot
import cloudscraper

async def bypass_and_scrape(url):
    # Use the browser for the initial bypass
    bot = ParkourBot.for_cloudscraper(tls_profile="chrome_120_win11")
    await bot.start()
    try:
        await bot.goto(url)
        await bot.crush_cookies()
        await bot.solve_turnstile()

        # Hand off to cloudscraper
        scraper = cloudscraper.create_scraper()
        await bot.import_to_cloudscraper(scraper)
    finally:
        # Close the browser even if a step above fails
        await bot.close()

    # Continue with fast requests
    return scraper.get(url).text

Built with ❤️ for Scrapers who hate boilerplate.
