A blazing fast, async-first, undetectable web and mobile automation framework powered by CDP and ADB
🕷️ Chuscraper
Stealth-focused Web & Mobile automation framework powered by CDP and ADB
You Only Scrape Once — data extraction made smarter, faster, and more resilient.
🚀 What is Chuscraper?
Chuscraper is a Python web & mobile scraping library that uses CDP (Chrome DevTools Protocol) for web and ADB (Android Debug Bridge) for mobile apps. It extracts structured data, interacts with pages/screens, and automates workflows — with a heavy focus on Anti-Detection and Stealth.
It converts standard Chromium instances into undetectable agents that can bypass bot verification systems like Cloudflare, Akamai, and DataDome, while also allowing control of native Android apps for data extraction.
🌟 Key Features
🕷️ Universal Crawler (New!)
Turn entire websites into LLM-ready data with a single command.
- Sitemap & BFS: Supports both sitemap-based (fast) and BFS (deep) crawling strategies.
- Streaming: Stream extracted data directly to your database as pages are crawled, so results never have to be held in memory all at once.
- Multi-Format: Extract Markdown, HTML, and Text simultaneously.
- Robust: Handles redirects, SPA link discovery, and concurrency automatically.
- AI Extraction: Integrate OpenAI/LLMs to extract structured JSON data from any page using natural language prompts.
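As a rough illustration of the BFS strategy mentioned above, the sketch below walks a site breadth-first, resolves relative links, and stays on the starting host. It is not Chuscraper's crawler: `bfs_crawl`, `get_links`, and the in-memory `SITE` map are hypothetical stand-ins so the example runs without a browser.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def bfs_crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first, staying on the start URL's host."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments before deduplicating.
            absolute = urljoin(url, href).split("#", 1)[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# A tiny in-memory "site" stands in for real page fetches (hypothetical data).
SITE = {
    "https://example.com/": ["/a", "/b", "https://other.example/x"],
    "https://example.com/a": ["/c", "/"],
    "https://example.com/b": ["/c#top"],
    "https://example.com/c": [],
}

visited = bfs_crawl("https://example.com/", lambda url: SITE.get(url, []))
print(visited)
```

In a real crawl, `get_links` would load the page and return its discovered links; the queue guarantees that pages closer to the start URL are visited first.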
📱 Native Mobile App Scraping
Chuscraper now supports scraping native Android apps using ADB:
- UI Automation: Tap, swipe, and type on any connected Android device (Real or Emulator).
- XML Dumping: Extract the full UI hierarchy as XML to find elements by text, resource-id, or content-desc.
- Background Execution: Run scripts without touching the device.
- Zero-Setup: Just enable USB Debugging and connect. No Appium server required.
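For readers curious what ADB-driven UI automation looks like underneath, here is a minimal sketch that only builds the command lines for the standard `adb shell input tap`, `adb shell input swipe`, and `adb shell uiautomator dump` commands. The helper names are hypothetical and this is not how Chuscraper is necessarily implemented; nothing is executed, so it runs without a device.

```python
def adb_command(*shell_args, serial=None):
    """Build the argv for `adb [-s SERIAL] shell ...` without running it."""
    argv = ["adb"]
    if serial:
        argv += ["-s", serial]  # target a specific device or emulator
    return argv + ["shell", *map(str, shell_args)]


def tap(x, y, serial=None):
    return adb_command("input", "tap", x, y, serial=serial)


def swipe(x1, y1, x2, y2, duration_ms=300, serial=None):
    return adb_command("input", "swipe", x1, y1, x2, y2, duration_ms, serial=serial)


def dump_ui(remote_path="/sdcard/window_dump.xml", serial=None):
    return adb_command("uiautomator", "dump", remote_path, serial=serial)


print(tap(540, 960))
print(swipe(540, 1500, 540, 500, serial="emulator-5554"))
print(dump_ui())
```

Passing one of these lists to `subprocess.run(..., check=True)` would send the command to a connected device with USB Debugging enabled.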
🕵️‍♂️ Dynamic Stealth & Fingerprinting (New!)
Chuscraper now includes an advanced Auto-Update and Fingerprint Rotation engine:
- Auto-Update Chrome Version: Automatically detects your installed Chrome version and updates the User-Agent to match. No manual updates required!
- Fingerprint Rotation: Randomizes hardware fingerprints (RAM, CPU, Screen Resolution) per session while strictly adhering to your host OS (Windows, macOS, Linux) to prevent OS mismatch detection.
- Client Hints Sync: Automatically patches `navigator.userAgentData` to match the User-Agent string.
- Advanced Stealth Patches: 6 core JS bypasses for WebDriver, Chrome Runtime, Canvas/WebGL noise, and iFrame leaks.
- Modern Timezones: Automatically syncs browser timezone with IP location using modern IANA names.
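The idea behind OS-consistent fingerprint rotation can be sketched as follows: hardware values are randomized per session, while anything tied to the operating system is derived from the real host. The value pools and the `new_fingerprint` helper are assumptions made for illustration, not Chuscraper's actual data or API.

```python
import platform
import random

# Hypothetical value pools; a real engine would use larger, curated lists.
PROFILES = {
    "Windows": {"platform": "Win32", "screens": [(1920, 1080), (2560, 1440), (1366, 768)]},
    "Darwin": {"platform": "MacIntel", "screens": [(1440, 900), (1680, 1050), (2560, 1600)]},
    "Linux": {"platform": "Linux x86_64", "screens": [(1920, 1080), (1600, 900), (2560, 1440)]},
}
DEVICE_MEMORY_GB = [4, 8]    # navigator.deviceMemory never reports more than 8
CPU_CORES = [4, 8, 12, 16]   # navigator.hardwareConcurrency


def new_fingerprint(host_os=None, rng=None):
    """Pick random hardware values while keeping the OS-bound fields fixed."""
    rng = rng or random.Random()
    host_os = host_os or platform.system()  # "Windows", "Darwin", or "Linux"
    profile = PROFILES.get(host_os, PROFILES["Linux"])
    return {
        "platform": profile["platform"],
        "screen": rng.choice(profile["screens"]),
        "deviceMemory": rng.choice(DEVICE_MEMORY_GB),
        "hardwareConcurrency": rng.choice(CPU_CORES),
    }


for _ in range(3):
    print(new_fingerprint())
```

Because `platform` is never randomized, a macOS host cannot end up claiming `Win32`, which is the kind of mismatch detection scripts look for.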
⚡ Async + Fast
Built on async CDP, low overhead, no heavy browser bundles.
🔄 Advanced Selector & Extraction Engine (New!)
Chuscraper now includes a high-performance parsing engine:
- Adaptive Selectors: Save and automatically relocate elements even if the DOM structure changes.
- AI-Ready Extraction: One-click conversion of pages or elements to clean Markdown or normalized Text.
- CSS & XPath Support: Unified API for high-speed selection.
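One way to think about adaptive selectors is "save a signature now, find the closest match later". The sketch below scores candidate elements against a saved signature by tag, id, class overlap, and text similarity. Elements are plain dicts, and `signature`, `similarity`, and `relocate` are hypothetical names; Chuscraper's real matching logic may work quite differently.

```python
from difflib import SequenceMatcher


def signature(el):
    """Capture the attributes used to recognise an element later."""
    return {
        "tag": el["tag"],
        "id": el.get("id", ""),
        "classes": set(el.get("classes", [])),
        "text": el.get("text", ""),
    }


def similarity(saved, candidate):
    """Average of four 0..1 scores: tag, id, class overlap, text similarity."""
    tag = 1.0 if saved["tag"] == candidate["tag"] else 0.0
    same_id = 1.0 if saved["id"] and saved["id"] == candidate["id"] else 0.0
    union = saved["classes"] | candidate["classes"]
    classes = len(saved["classes"] & candidate["classes"]) / len(union) if union else 0.0
    text = SequenceMatcher(None, saved["text"], candidate["text"]).ratio()
    return (tag + same_id + classes + text) / 4


def relocate(saved, elements, threshold=0.4):
    """Return the best-matching element, or None if nothing is close enough."""
    best = max(elements, key=lambda el: similarity(saved, signature(el)), default=None)
    if best is None or similarity(saved, signature(best)) < threshold:
        return None
    return best


# Signature saved on a first visit (hypothetical element).
saved = signature({"tag": "a", "id": "buy", "classes": ["btn", "primary"], "text": "Buy now"})

# The page after a redesign: the id is gone and the classes changed.
new_dom = [
    {"tag": "a", "classes": ["link"], "text": "About us"},
    {"tag": "a", "classes": ["btn", "cta"], "text": "Buy now!"},
    {"tag": "div", "classes": ["btn"], "text": "Cancel"},
]

match = relocate(saved, new_dom)
print(match)
```

The threshold keeps the function from returning an unrelated element when the original has truly disappeared.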
🛠️ Hidden Gems (Undocumented Functions)
Chuscraper has several advanced functions that are often missed:
- `select_text(selector)`: Quickly get the inner text of an element in one line.
- `save_snapshot(filename)`: Save a full MHTML snapshot of the current page.
- `to_markdown()` / `to_text()`: Convert any live `Element` directly to Markdown or plain text.
- `wait_for_ready_state(state)`: Wait specifically for the `loading`, `interactive`, or `complete` document states.
- `mouse_drag(destination)`: Perform native drag-and-drop operations with human-like movement.
- `print_to_pdf(filename)`: Export the current page as a professional PDF.
- `get_all_urls()`: Extract every link, image, and asset URL from the page in one call.
- `scroll_down(amount=25)`: Smoothly scroll down by a percentage of the page height.
- `human_click(selector)` / `human_type(selector, text)`: High-level aliases for ultra-realistic human behavior.
- `submit(selector)`: One-click form submission for forms or individual buttons.
- `activate()` / `bring_to_front()`: Bring a background tab to the front for interaction.
🔄 Flexible Outputs
Supports JSON, CSV, Markdown, Excel, Pydantic, and more.
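To make the output formats concrete, the following sketch serializes the same scraped records to JSON, CSV, and a Markdown table using only the standard library. The `rows` data and helper names are hypothetical; Chuscraper's own export helpers are not shown here and may differ.

```python
import csv
import io
import json

# Hypothetical scraped records.
rows = [
    {"title": "Widget", "price": 9.99},
    {"title": "Gadget", "price": 24.5},
]


def to_json(records):
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_markdown_table(records):
    headers = list(records[0])
    lines = ["| " + " | ".join(headers) + " |", "|" + "---|" * len(headers)]
    lines += ["| " + " | ".join(str(r[h]) for h in headers) + " |" for r in records]
    return "\n".join(lines)


print(to_json(rows))
print(to_csv(rows))
print(to_markdown_table(rows))
```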
📦 Installation
pip install chuscraper
[!TIP] Use within a virtual environment to avoid conflicts.
Example: Advanced Mode (Elite Stealth + Human Interaction)
import asyncio
import chuscraper as zd

async def main():
    # 1. Launch with the all-in-one start() helper
    async with await zd.start(
        headless=False,
        stealth=True,
        lang="en-US",
        retry_enabled=True
    ) as browser:
        page = browser.main_tab
        await page.goto("https://github.com/login")

        # 2. Use ultra-realistic human interactions
        # Automatically retries if the element is loading/stale
        await page.human_type("#login_field", "jules_bot")
        await page.human_type("#password", "SecurePass123!")

        # 3. One-click form submission
        await page.submit("form")

        # 4. Extract with adaptive selectors
        # 'adaptive=True' saves element metadata for resilient relocation
        results = await page.select_all(".repository-item", adaptive=True)
        for item in results:
            # 5. Get clean Markdown for LLMs instantly
            print(await item.to_markdown())

if __name__ == "__main__":
    asyncio.run(main())
[!NOTE] chuscraper automatically handles Chrome process cleanup and Local Proxy lifecycle.
⚙️ Configuration Switches (Parameters)
Chuscraper gives you full control via zd.start(). Here are the powerful switches you can use:
🛠️ Core Switches
| Switch | Description | Default |
|---|---|---|
| `headless` | Run without a visible window (`True`/`False`) | `False` |
| `stealth` | Master Switch for advanced anti-detection (System Fingerprints + JS Bypasses) | `False` |
| `stealth_domain` | The domain used for cookie storage/loading in stealth mode | `""` |
| `user_data_dir` | Path to save/load browser profile (keep logins/cookies) | Temp |
| `proxy` | Proxy URL (e.g. `http://user:pass@host:port`) | `None` |
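Since the switches are passed as keyword arguments, a small guard can catch misspelled names before a browser is launched. The sketch below checks overrides against the five core switch names listed above; `build_start_options` is a hypothetical helper, not part of Chuscraper, and the proxy URL and profile path are placeholders.

```python
# Names taken from the Core Switches table.
CORE_SWITCHES = {"headless", "stealth", "stealth_domain", "user_data_dir", "proxy"}


def build_start_options(**overrides):
    """Return the overrides unchanged, rejecting names that are not core switches."""
    unknown = sorted(set(overrides) - CORE_SWITCHES)
    if unknown:
        raise TypeError(f"unknown switch(es): {', '.join(unknown)}")
    return dict(overrides)


start_options = build_start_options(
    headless=True,
    stealth=True,
    user_data_dir="./profiles/demo",     # placeholder path
    proxy="http://user:pass@host:port",  # placeholder proxy URL
)
print(start_options)

# Inside a coroutine, assuming `import chuscraper as zd`:
#     async with await zd.start(**start_options) as browser:
#         ...
```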
🚀 Advanced Switches
| Switch | Description | Default |
|---|---|---|
| `browser_executable_path` | Custom path to Chrome/Brave binary (auto-detect if omitted) | Auto |
| `browser` | Browser selection: `"auto"`, `"chrome"`, `"brave"` | `"auto"` |
| `browser_args` | Extra Chromium args list | `[]` |
| `sandbox` | Set `False` for Linux/Docker/root environments | `True` |
| `lang` | Browser locale/language (e.g., `en-US`, `hi-IN`) | `en-US` |
| `user_agent` | Manually override User-Agent (not recommended with `stealth=True`) | Auto |
| `disable_webrtc` | Prevent IP leaks via WebRTC | `True` |
| `disable_webgl` | Disable WebGL (can reduce detection surface in some setups) | `False` |
| `timezone` | Force timezone (IANA format, e.g. `Asia/Kolkata`) | Auto/`None` |
| `stealth_options` | Dict for fine-grained stealth patches | Built-in defaults |
| `retry_enabled` | Enable retry helpers for unstable workflows | `False` |
| `retry_timeout` | Retry timeout, in seconds | `10.0` |
| `retry_count` | Retry count | `3` |
| `browser_connection_timeout` | Wait between connection attempts | `0.25` |
| `browser_connection_max_tries` | Browser connection retries | `10` |
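The retry switches can be pictured with a small asyncio helper. This sketch assumes `retry_count` means the maximum number of attempts and `retry_timeout` is an overall time budget in seconds; that reading, the `with_retry` name, and the `delay` parameter are assumptions, and Chuscraper's built-in retry logic may behave differently.

```python
import asyncio


async def with_retry(action, retry_count=3, retry_timeout=10.0, delay=0.25):
    """Await `action()` until it succeeds, within an attempt and time budget."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    loop = asyncio.get_running_loop()
    deadline = loop.time() + retry_timeout
    last_error = None
    for attempt in range(1, retry_count + 1):
        try:
            return await action()
        except Exception as exc:
            last_error = exc
            # Stop when attempts are used up or the next wait would pass the deadline.
            if attempt == retry_count or loop.time() + delay > deadline:
                break
            await asyncio.sleep(delay)
    raise last_error


# Demo: an action that fails twice before succeeding (stands in for a stale element).
attempts = 0


async def flaky_click():
    global attempts
    attempts += 1
    if attempts < 3:
        raise LookupError("element not ready")
    return "clicked"


result = asyncio.run(with_retry(flaky_click, delay=0.01))
print(result, attempts)
```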
🕵️‍♂️ Granular Stealth Options
When stealth=True, you can fine-tune specific patches by passing a stealth_options dict:
await zd.start(stealth=True, stealth_options={
    "patch_webdriver": True,  # Hide WebDriver
    "patch_webgl": True,      # Spoof Graphics Card
    "patch_canvas": True,     # Add Canvas Noise
    "patch_audio": False      # Disable Audio Fingerprinting noise
})
📱 Mobile Scraping Example
Scrape data from any native Android app (e.g., Hotel/Flight apps):
import asyncio
from chuscraper.mobile import MobileDevice

async def main():
    # Connect to first available device
    device = await MobileDevice().connect()

    # Example: Searching for hotels
    city_input = await device.find_element(text="Enter destination")
    if city_input:
        await city_input.type("Goa")

    search_btn = await device.find_element(resource_id="com.hotel.app:id/search_btn")
    if search_btn:
        await search_btn.click()

    # Extract prices
    prices = await device.find_elements(resource_id="com.hotel.app:id/price_text")
    for price in prices:
        print(price.get_text())

if __name__ == "__main__":
    asyncio.run(main())
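Lookups by `text` or `resource_id` like the ones above ultimately rely on the UI hierarchy XML. The sketch below parses a hand-written sample in the `uiautomator dump` format, filters nodes by attribute, and computes a tap point from each node's `bounds`. The sample XML, `find_nodes`, and `center` are illustrative assumptions rather than Chuscraper internals.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the uiautomator dump format (hypothetical app screen).
SAMPLE_DUMP = """
<hierarchy rotation="0">
  <node index="0" text="" resource-id="" class="android.widget.FrameLayout" content-desc="" bounds="[0,0][1080,1920]">
    <node index="0" text="Enter destination" resource-id="com.hotel.app:id/search_box" class="android.widget.EditText" content-desc="" bounds="[40,200][1040,320]" />
    <node index="1" text="4,200" resource-id="com.hotel.app:id/price_text" class="android.widget.TextView" content-desc="" bounds="[40,400][500,460]" />
    <node index="2" text="5,150" resource-id="com.hotel.app:id/price_text" class="android.widget.TextView" content-desc="" bounds="[40,500][500,560]" />
  </node>
</hierarchy>
""".strip()

BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_nodes(root, text=None, resource_id=None, content_desc=None):
    """Return every <node> whose attributes equal all the given filters."""
    wanted = {"text": text, "resource-id": resource_id, "content-desc": content_desc}
    wanted = {key: value for key, value in wanted.items() if value is not None}
    return [
        node for node in root.iter("node")
        if all(node.get(key) == value for key, value in wanted.items())
    ]


def center(node):
    """Midpoint of the node's bounds, i.e. where a tap would land."""
    x1, y1, x2, y2 = map(int, BOUNDS.fullmatch(node.get("bounds")).groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


root = ET.fromstring(SAMPLE_DUMP)
search_box = find_nodes(root, text="Enter destination")[0]
prices = [n.get("text") for n in find_nodes(root, resource_id="com.hotel.app:id/price_text")]
print(center(search_box), prices)
```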
🛡️ Stealth & Anti-Detection Proof
We don't just claim to be stealthy; we prove it. Below are the results from top anti-bot detection suites, all passed with 100% "Human" status.
👉 View Full Visual Proofs & Screenshots Here
| Detection Suite | Result | Status |
|---|---|---|
| SannySoft | No WebDriver detected | ✅ Pass |
| BrowserScan | 100% Trust Score | ✅ Pass |
| PixelScan | Consistent Fingerprint | ✅ Pass |
| IPHey | Software Clean (Green) | ✅ Pass |
| CreepJS | 0% Stealth / 0% Headless | ✅ Pass |
| Fingerprint.com | No Bot Detected | ✅ Pass |
🌍 Real-World Protection Bypass
We tested chuscraper against live websites protected by major security providers:
| Provider | Target | Result |
|---|---|---|
| Cloudflare | Turnstile Demo | ✅ Solved Automatically |
| DataDome | Antoine Vastel Research | ✅ Accessed |
| Akamai | Nike Product Page | ✅ Bypassed |
📖 Documentation
Full technical guides are available in the `docs/` folder.
Translations (Chinese, Japanese, etc.) coming soon.
💖 Support & Sponsorship
chuscraper is an open-source project maintained by Toufiq Qureshi. If the library has helped you or your business, please consider supporting its development:
- GitHub Sponsors: Sponsor me on GitHub
- Corporate Sponsorship: If you are a Proxy Provider or Data Company, we offer featured placement in our documentation. Contact us for partnership opportunities.
- Custom Scraping Solutions: Need a private, high-performance scraper? We offer professional consulting.
🛠️ Contributing
Want to contribute? Open an issue or send a pull request — all levels welcome! Please follow the CONTRIBUTING.md guidelines.
📜 License
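Chuscraper is licensed under the AGPL-3.0 License. Its copyleft terms require that works which incorporate or modify Chuscraper, including modified versions offered to users over a network, make their source code available under the same license, protecting the community and your freedom.
Made with ❤️ by Toufiq Qureshi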