A Hybrid Scraper Framework: cookie consent, auto-pagination, temp identities, session bridging, API discovery, stealth fingerprinting, and Turnstile solving.
Version: 3.0.0
Author: Zinzied (ziedboughdir@gmail.com) · GitHub
🏃 Py-Parkour: The Hybrid Scraper Framework
Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:
- 🍪 Cookie Consents: Detecting and destroying GDPR/modal popups.
- 🧭 Pagination: Auto-detecting "Next" buttons or infinite scroll.
- 🎭 Verification Gates: Generating temporary identities (Email/SMS) for signups.
- 👻 Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
- 📡 API Discovery: Automatically detect hidden JSON APIs.
- 🔐 Stealth Mode: Browser fingerprinting and bot evasion scripts.
- ⚡ Turnstile Solving: Built-in Cloudflare Turnstile bypass.
It turns your scraper into a workflow automation platform.
🆕 What's New in v3.0.0
- 🎯 Gadget System: Pluggable modules selected via the `gadgets` constructor argument
- 🔐 Fingerprint Sync: Match browser fingerprint with TLS layer (TLS-Chameleon compatible)
- ⚡ Context Pool: 10x faster challenge solving with reusable contexts
- 🔄 Turnstile Auto-Solver: Built-in micro-interaction patterns (no external API needed)
- 📤 Session Export: Export cookies, localStorage, sessionStorage for cloudscraper handoff
- 👻 Stealth Injection: Comprehensive bot evasion scripts
📦 Installation
```bash
pip install "py-parkour[full]"
playwright install  # download the browser binaries Playwright drives
```
Or for development:
```bash
pip install -r requirements.txt
playwright install
```
🚀 Quick Start
Basic Usage
```python
import asyncio
from py_parkour import ParkourBot

async def main():
    # Supports 'chromium' (default), 'firefox', or 'webkit'
    bot = ParkourBot(headless=False, browser="firefox")
    await bot.start()
    await bot.goto("https://target-website.com")
    # ... use gadgets here ...
    await bot.close()

asyncio.run(main())
```
With Fingerprint & Stealth (v3.0)
```python
import asyncio
from py_parkour import ParkourBot, FingerprintGallery

async def main():
    # Create a bot with the Chrome 120 / Windows 11 fingerprint
    bot = ParkourBot(
        headless=True,
        gadgets=['ghost', 'turnstile', 'shadow', 'crusher'],
        fingerprint=FingerprintGallery.CHROME_120_WIN11,
        stealth=True,
    )
    await bot.start()

    # Solve Turnstile automatically
    await bot.goto("https://protected-site.com")
    await bot.solve_turnstile()

    # Export session for cloudscraper
    session = await bot.export_session()
    print(f"Cookies: {session['cookies']}")
    await bot.close()

asyncio.run(main())
```
For CloudScraper Integration
```python
import asyncio

import cloudscraper
from py_parkour import ParkourBot

async def main():
    # Create a bot optimized for cloudscraper
    bot = ParkourBot.for_cloudscraper(tls_profile="chrome_120_win11")
    await bot.start()

    # Bypass challenges with the browser
    await bot.goto("https://protected-site.com")
    await bot.solve_turnstile()

    # Hand off to cloudscraper
    scraper = cloudscraper.create_scraper()
    await bot.import_to_cloudscraper(scraper)

    # Continue with fast requests
    response = scraper.get("https://protected-site.com/api/data")
    print(response.json())
    await bot.close()

asyncio.run(main())
```
☕ Support / Donate
If you found this library useful, buy me a coffee!
License
MIT
🎯 Gadgets
🍪 Crusher (Cookie Bypasser)
Don't write brittle selectors for every "Accept Cookies" button.
```python
await bot.crush_cookies()
```
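This README does not document how Crusher finds the consent button. To illustrate label-based matching in general, the sketch below checks visible button texts against "accept" keywords and skips anything that looks like a reject or settings control. The keyword lists and `pick_consent_button` are hypothetical and are not part of py_parkour. With Playwright, the labels could come from something like `await page.locator("button").all_inner_texts()`.

```python
import re
from typing import Optional, Sequence

# Hypothetical keyword lists; Crusher's real heuristics are not documented.
ACCEPT_PATTERNS = [
    r"\baccept( all)?\b",
    r"\ballow( all)?\b",
    r"\bagree\b",
    r"\bi understand\b",
    r"\bok(ay)?\b",
    r"\btout accepter\b",     # French
    r"\balle akzeptieren\b",  # German
    r"\baceptar\b",           # Spanish
]
REJECT_PATTERNS = [r"\breject\b", r"\bdecline\b", r"\bmanage\b", r"\bsettings\b", r"\bpreferences\b"]


def pick_consent_button(labels: Sequence[str]) -> Optional[int]:
    """Return the index of the label most likely to dismiss a consent banner, or None."""
    for pattern in ACCEPT_PATTERNS:  # earlier patterns take priority
        for i, label in enumerate(labels):
            text = " ".join(label.lower().split())  # normalize case and whitespace
            if any(re.search(p, text) for p in REJECT_PATTERNS):
                continue  # never click reject/settings controls
            if re.search(pattern, text):
                return i
    return None
```

The patterns are checked in priority order, so an explicit "Accept all" wins over a generic "OK" when both buttons are present.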
🧭 Compass (Auto-Pagination)
Stop guessing if the site uses ?page=2 or a "Next >" button.
```python
async for page_number in bot.crawl(max_pages=10):
    print(f"Scraping Page {page_number}: {bot.current_url}")
```
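When a site paginates through the query string, you can derive the next page from the current URL. Below is a minimal sketch using only the standard library. `next_page_url` is a hypothetical helper, not Compass's implementation, and it does not handle "Next >" buttons or infinite scroll.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(url: str, param: str = "page", start: int = 1) -> str:
    """Return `url` with its page query parameter incremented by one.

    If the parameter is missing, the current page is assumed to be `start`.
    A non-numeric value raises ValueError.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    current = int(query.get(param, [str(start)])[0])
    query[param] = [str(current + 1)]
    # doseq=True re-expands the list values produced by parse_qs
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

A `ValueError` on a non-numeric page value is a natural signal to fall back to button-based detection.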
🎭 Disguises (Temp Identity)
Need to sign up to view data? Generate a burner identity.
```python
identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")

code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)
```
👻 Shadow (Session Bridge)
Break in with the browser, then steal the session for high-speed API calls.
```python
# 1. Log in with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...

# 2. Export session state
state = await bot.export_session()
# {'cookies': {...}, 'local_storage': {...}, 'headers': {...}}

# 3. Transfer to an aiohttp session
async with await bot.shadow.create_session() as http:
    async with http.get("https://target.com/api/data") as resp:
        print(await resp.json())
```
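A plain HTTP client usually needs the exported cookies flattened first. The sketch below accepts two shapes: a `{name: value}` mapping, as the `export_session()` comment suggests, or a Playwright-style list of cookie records like the one `BrowserContext.cookies()` returns. It then builds a `Cookie` header. Both helper names are hypothetical, and the exact shape Py-Parkour exports is an assumption.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def cookies_to_dict(cookies: Mapping[str, str] | Iterable[Mapping[str, object]], domain: str = "") -> dict[str, str]:
    """Normalize cookies to a flat {name: value} dict.

    Accepts a mapping, or Playwright-style records with 'name', 'value' and
    'domain' keys. When `domain` is given, records are kept only if the host
    equals the cookie domain or is a subdomain of it.
    """
    if isinstance(cookies, Mapping):
        return {str(k): str(v) for k, v in cookies.items()}
    result: dict[str, str] = {}
    for record in cookies:
        cookie_domain = str(record.get("domain", "")).lstrip(".")
        if domain and not (domain == cookie_domain or domain.endswith("." + cookie_domain)):
            continue  # cookie belongs to a different site
        result[str(record["name"])] = str(record["value"])
    return result


def cookie_header(cookies: Mapping[str, str]) -> str:
    """Build a Cookie request header value from a {name: value} dict."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```

You could also pass the resulting dict to `aiohttp.ClientSession(cookies=...)`. Cookies supplied that way are not tied to a domain, which is why the sketch filters by host first.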
📡 Radar (API Detector)
Why scrape HTML if there's a hidden API? Radar listens to background traffic.
```python
await bot.goto("https://complex-spa-site.com")
print(f"Latest JSON found: {bot.radar.latest_json}")

for req in bot.radar.requests:
    if "api/v1/users" in req['url']:
        print(f"Found User API: {req['url']}")
```
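This README does not show Radar's internals. One way to build something similar is to record every response whose `Content-Type` is JSON. With Playwright, a `page.on("response", ...)` handler could feed the recorder using `response.url`, `response.headers` and `await response.text()`. `JsonRecorder` below is a hypothetical, stdlib-only sketch of that bookkeeping. It mirrors the `latest_json` and `requests` attributes used in the example.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class JsonRecorder:
    """Hypothetical Radar-like recorder that keeps JSON responses seen in background traffic."""

    requests: list[dict[str, Any]] = field(default_factory=list)
    latest_json: Optional[Any] = None

    def record(self, url: str, content_type: str, body: str) -> bool:
        """Store the response if it is JSON; return True when it was kept."""
        mime = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
        if mime != "application/json" and not mime.endswith("+json"):
            return False
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return False  # labelled JSON but not parseable
        self.requests.append({"url": url, "content_type": mime, "json": data})
        self.latest_json = data
        return True

    def find(self, fragment: str) -> list[str]:
        """Return URLs of recorded JSON responses whose URL contains `fragment`."""
        return [r["url"] for r in self.requests if fragment in r["url"]]
```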
🖱️ GhostCursor (Human Movement)
Move the mouse like a human with Bezier curves, overshoot, and variable speed.
```python
await bot.ghost.click("#submit-btn")
await bot.ghost.hover("#menu-item", duration=0.5)
await bot.ghost.idle_movement(duration=2.0)  # Subtle jitter
```
🔄 TurnstileSolver (Built-in)
Solve Cloudflare Turnstile without external APIs.
```python
success = await bot.solve_turnstile(timeout=30)
if success:
    print("Turnstile bypassed!")
```
⌨️ ChaosTyper (Human Typing)
Type with realistic speed variations and occasional typos + corrections.
```python
await bot.typer.type_human("#input", "Hello World")
```
⚖️ Solicitor (Captcha Solving)
Connect to external solvers (like 2Captcha) for reCAPTCHA, hCaptcha, and Turnstile.
```python
# The import path for TwoCaptchaSolver is not shown in this README
bot.solicitor.set_solver(TwoCaptchaSolver(api_key="KEY"))
await bot.solicitor.solve_recaptcha_v2()  # Auto-injects
await bot.solicitor.solve_turnstile()  # Auto-injects
```
🔐 Fingerprint Profiles
Match your browser fingerprint with your TLS layer:
```python
from py_parkour import FingerprintGallery, ParkourBot

# Available profiles
profiles = FingerprintGallery.list_profiles()
# ['chrome_120_win11', 'chrome_120_macos', 'firefox_121_linux', 'safari_17_ios', ...]

# Use a profile
bot = ParkourBot.with_profile("chrome_120_win11")

# Or customize
from py_parkour import BrowserFingerprint

fingerprint = BrowserFingerprint(
    user_agent="Mozilla/5.0...",
    viewport={"width": 1920, "height": 1080},
    timezone="America/New_York",
    locale="en-US",
)
bot = ParkourBot(fingerprint=fingerprint)
```
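Playwright's `browser.new_context()` accepts `user_agent`, `viewport`, `locale` and `timezone_id`, so a profile like the one above maps onto it almost directly. One difference: Playwright names the timezone parameter `timezone_id`. The sketch below uses `FingerprintSketch`, a hypothetical stand-in for `BrowserFingerprint` that holds only the four fields shown. This README does not say whether Py-Parkour translates profiles this way internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FingerprintSketch:
    """Hypothetical stand-in for BrowserFingerprint, limited to the fields shown above."""

    user_agent: str
    viewport: dict[str, int] = field(default_factory=lambda: {"width": 1920, "height": 1080})
    timezone: str = "UTC"
    locale: str = "en-US"

    def to_context_kwargs(self) -> dict[str, Any]:
        """Translate to keyword arguments accepted by Playwright's browser.new_context()."""
        return {
            "user_agent": self.user_agent,
            "viewport": dict(self.viewport),  # copy so callers can't mutate the profile
            "timezone_id": self.timezone,     # Playwright's name for the timezone option
            "locale": self.locale,
        }
```

With Playwright, `await browser.new_context(**fp.to_context_kwargs())` would then create a context with those settings.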
⚡ Context Pooling
For faster operation, maintain a pool of browser contexts:
```python
bot = ParkourBot(pool_size=5)  # Maintain 5 contexts
await bot.start()

# Get a context from the pool (10x faster than creating a new one)
context = await bot.get_pooled_context()
try:
    page = await context.new_page()
    await page.goto("https://example.com")
finally:
    await bot.release_pooled_context(context)

# Check pool stats
print(bot.pool_stats())
```
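You can sketch the pooling idea with an `asyncio.Queue` of pre-created objects. The creation cost is paid once in `start()`, and callers then borrow and return items. `ContextPool` is a hypothetical illustration, not Py-Parkour's pool. In a real setup, the factory might be Playwright's `browser.new_context`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class ContextPool:
    """Hypothetical fixed-size pool of reusable objects (e.g. browser contexts)."""

    def __init__(self, factory: Callable[[], Awaitable[Any]], size: int) -> None:
        self._factory = factory
        self._size = size
        self._idle: asyncio.Queue = asyncio.Queue()
        self._created = 0
        self._acquired = 0

    async def start(self) -> None:
        """Pre-create `size` objects so later acquisitions skip the creation cost."""
        for _ in range(self._size):
            await self._idle.put(await self._factory())
            self._created += 1

    async def acquire(self) -> Any:
        """Wait until an object is idle, then hand it out."""
        item = await self._idle.get()
        self._acquired += 1
        return item

    async def release(self, item: Any) -> None:
        """Return an object to the back of the idle queue for reuse."""
        await self._idle.put(item)

    def stats(self) -> Dict[str, int]:
        return {
            "size": self._size,
            "created": self._created,
            "idle": self._idle.qsize(),
            "acquired_total": self._acquired,
        }
```

A reused context keeps the cookies and storage of its previous user. If pooled contexts serve unrelated tasks, consider clearing them before release, for example with Playwright's `await context.clear_cookies()`.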
🏗 Architecture
- Core: Async Playwright wrapper with stealth and fingerprinting
- Gadgets: Modular tools attached to the bot
  - `.crusher`: Cookie consent handling
  - `.compass`: Pagination
  - `.identity`: Temp identity generation
  - `.shadow`: Session export
  - `.radar`: API discovery
  - `.ghost`: Human-like mouse movement
  - `.spatial`: Geometric element finding
  - `.typer`: Human-like typing
  - `.solicitor`: External captcha solving
  - `.turnstile`: Built-in Turnstile solver
🔗 Integration with CloudScraper
Py-Parkour is designed to hand browser sessions off to CloudScraper and to match its fingerprints with TLS-Chameleon profiles:
```python
# Unified workflow
import asyncio

import cloudscraper
from py_parkour import ParkourBot

async def bypass_and_scrape(url):
    # Use the browser for the initial bypass
    bot = ParkourBot.for_cloudscraper("chrome_120_win11")
    await bot.start()
    await bot.goto(url)
    await bot.crush_cookies()
    await bot.solve_turnstile()

    # Hand off to cloudscraper
    scraper = cloudscraper.create_scraper()
    await bot.import_to_cloudscraper(scraper)
    await bot.close()

    # Continue with fast requests
    return scraper.get(url).text

html = asyncio.run(bypass_and_scrape("https://protected-site.com"))
```
📚 More Resources
- Gadgets Guide - Detailed examples for Compass and Radar
- Cookbook - Common recipes and patterns
- Anti-Bot Strategy - Understanding detection and evasion
Built with ❤️ for Scrapers who hate boilerplate.
File details
Details for the file py_parkour-3.1.0.tar.gz.
File metadata
- Download URL: py_parkour-3.1.0.tar.gz
- Size: 37.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 503785dbe85b01abe98e800f5269ab7efbfe26b98952ad27017b72ea156ac9a2 |
| MD5 | 4bfd9d8ebee49e189df0ff38ab1163ea |
| BLAKE2b-256 | 291a38c183ba401ef7314d69f35d2afb13d86cfbddec89fdc585a81b9c55833b |
File details
Details for the file py_parkour-3.1.0-py3-none-any.whl.
File metadata
- Download URL: py_parkour-3.1.0-py3-none-any.whl
- Size: 39.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 400e56b4fefe528a685e5e70e87fe2bc265c574e508f041226113689356d356b |
| MD5 | eb5f134b0bc18e2e5cd51e30c3266759 |
| BLAKE2b-256 | f1fe556ab5b3f063871a51753b7eafd02b6c6f308bb91eebf4d538c72bf712bc |