A fast, async-first, stealth-focused web scraping and web automation framework

🕷️ Chuscraper

LLM + CDP powered stealth-focused web scraping & automation framework
You Only Scrape Once — data extraction made smarter, faster, and more resilient.


🚀 What is Chuscraper?

Chuscraper is a Python web scraping & automation library that uses CDP (Chrome DevTools Protocol) and LLMs to extract structured data, interact with pages, and automate workflows — with a heavy focus on Anti-Detection and Stealth.

With AI-powered extraction, you tell it what to extract — it figures out how.


🌟 Features

🕵️‍♂️ Stealth & Anti-Detection

  • Hides navigator.webdriver and rotates user agents
  • Canvas/WebGL noise + hardware spoofing
  • Timezone & geolocation spoofing

🤖 AI-Driven Data Extraction

  • Semantic extraction using LLMs
  • Converts HTML into structured JSON/Pydantic
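
To make the "structured" part concrete, here is a minimal sketch of turning a JSON string, such as a model might return, into typed records. It uses the standard library's dataclasses rather than Pydantic so that it runs on its own, and it does not call Chuscraper. The `Hotel` schema, `parse_hotels`, and the sample data are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Hotel:
    """Hypothetical schema for one extracted record."""
    name: str
    price: float


def parse_hotels(raw_json: str) -> list[Hotel]:
    """Parse a JSON array of objects into typed records."""
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of records")
    hotels = []
    for row in rows:
        # Coerce the price so that "4500" and 4500 are both accepted.
        hotels.append(Hotel(name=str(row["name"]), price=float(row["price"])))
    return hotels


# Made-up sample standing in for model output.
sample = '[{"name": "Sea View", "price": "4500"}, {"name": "Palm Stay", "price": 3200}]'
hotels = parse_hotels(sample)
print(hotels[0])
```

A missing key raises `KeyError` and a non-numeric price raises `ValueError`, so malformed output fails loudly instead of flowing into downstream code.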

🧠 Autonomous Navigation

  • Intelligent pilot (ai_pilot) that clicks and types until the goal is achieved
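
The general shape of a "act until the goal is achieved" loop can be sketched as below. This is not Chuscraper's implementation, which the README does not describe; it is a toy under stated assumptions, where `decide` stands in for a model call, `act` stands in for the browser, and a step budget guards against running forever. All names here are hypothetical.

```python
from typing import Callable

State = dict[str, bool]


def decide(state: State) -> str:
    """Toy policy standing in for a model call: type first, then click."""
    return "click_search" if state["query_typed"] else "type_query"


def act(state: State, action: str) -> State:
    """Toy environment standing in for the browser."""
    if action == "type_query":
        return {**state, "query_typed": True}
    return {**state, "submitted": True}


def run_until_goal(state: State, goal: Callable[[State], bool], max_steps: int = 5) -> list[str]:
    """Choose and apply actions until goal(state) holds or the step budget runs out."""
    actions: list[str] = []
    for _ in range(max_steps):
        if goal(state):
            return actions
        action = decide(state)
        state = act(state, action)
        actions.append(action)
    if goal(state):
        return actions
    raise TimeoutError(f"goal not reached within {max_steps} steps")


steps = run_until_goal({"query_typed": False, "submitted": False}, goal=lambda s: s["submitted"])
print(steps)  # ['type_query', 'click_search']
```

The step budget is the important design choice: an open-ended agent loop should fail with a clear error rather than keep clicking indefinitely.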

⚡ Async + Fast

Built on async CDP, low overhead, no heavy browser bundles.
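
An async-first design pays off when several pages are processed concurrently. The sketch below shows the usual asyncio pattern for that: `asyncio.gather` with a semaphore that caps concurrency. It does not call Chuscraper; `fetch_title` is a hypothetical stand-in that sleeps instead of opening a browser tab, and `scrape_all` is an illustrative name.

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Stand-in for real page work; sleeps briefly instead of opening a browser."""
    await asyncio.sleep(0.01)
    return f"title of {url}"


async def scrape_all(urls: list[str], max_concurrency: int = 2) -> list[str]:
    """Run fetch_title over all URLs, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch_title(url)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
titles = asyncio.run(scrape_all(urls))
print(titles)
```

Capping concurrency keeps memory and request rate predictable; the right limit depends on the target site and the machine, so treat the default of 2 as a placeholder.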

🔄 Flexible Outputs

Supports JSON, CSV, Markdown, Excel, Pydantic, and more.
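
The README does not show Chuscraper's export API, so the following is only a standard-library sketch of what serializing the same records to three of the listed formats looks like. The helper names (`to_json`, `to_csv`, `to_markdown`) and the sample records are hypothetical.

```python
import csv
import io
import json

# Made-up records standing in for extracted data.
records = [
    {"name": "Sea View", "price": 4500},
    {"name": "Palm Stay", "price": 3200},
]


def to_json(rows: list[dict]) -> str:
    """Serialize records as pretty-printed JSON."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows: list[dict]) -> str:
    """Serialize records as CSV, using the first record's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown(rows: list[dict]) -> str:
    """Serialize records as a Markdown pipe table."""
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


print(to_csv(records))
print(to_markdown(records))
```

These helpers assume a non-empty list of flat dictionaries with identical keys; nested data would need flattening first.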

🌐 Integrations

  • LLM Providers: OpenAI, Gemini, Anthropic, Ollama
  • Frameworks: LangChain, LlamaIndex, Agno, Crew.ai

📦 Installation

pip install chuscraper

# For AI Capabilities
# Quotes keep shells such as zsh from expanding the brackets
pip install "chuscraper[ai]"

[!TIP] Use within a virtual environment to avoid conflicts.



💻 Quick Start (The "Easy" Way)

Chuscraper is designed for Zero Boilerplate. You don't need complex configuration objects just to start a stealthy session.

import asyncio
import chuscraper as zd

async def main():
    # DIRECT START: Specify stealth, proxy, or headless directly in start()
    async with await zd.start(headless=False, stealth=True) as browser:
        
        # 🟢 BROWSER-LEVEL SHORTCUT
        await browser.goto("https://www.makemytrip.com/")
        
        # 🟢 INTUITIVE ALIASES (goto, title, select_text)
        page = browser.main_tab
        await page.goto("https://example.com")
        
        title = await page.title()
        header = await page.select_text("h1")
        
        print(f"Title: {title}")
        print(f"Header: {header}")

        # 🤖 AI-POWERED PILOT
        print("AI is navigating...")
        await page.ai_pilot("Search hotels in Goa for next weekend")

        # EXTRACT structured data
        result = await page.ai_extract("Get the first 3 hotels with prices")
        print(result)

if __name__ == "__main__":
    asyncio.run(main())

[!NOTE] chuscraper automatically handles Chrome process cleanup and Local Proxy lifecycle.


🤖 AI Usage with Providers

Chuscraper supports multiple providers out-of-the-box.

1. Gemini (Native)

from chuscraper.ai.providers import GeminiProvider

provider = GeminiProvider(api_key="YOUR_GEMINI_API_KEY")
# Run inside an async function, with `page` obtained as in the Quick Start.
await page.ai_extract("Extract data", provider=provider)

2. OpenAI

from chuscraper.ai.providers import OpenAIProvider

provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
# Run inside an async function, with `page` obtained as in the Quick Start.
await page.ai_extract("Extract data", provider=provider)
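
Rather than pasting keys into source files, a common practice is to read them from environment variables and fail early when one is missing. The sketch below uses only the standard library. The helper `load_api_key` and the variable name `GEMINI_API_KEY` are assumptions for illustration, not something Chuscraper is documented to read on its own.

```python
import os


def load_api_key(env_var: str) -> str:
    """Return the key stored in env_var, or raise if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before creating a provider")
    return key


# Demo value so the sketch runs on its own; real keys belong in your shell
# profile or a secrets manager, never in source control.
os.environ.setdefault("GEMINI_API_KEY", "demo-key")

api_key = load_api_key("GEMINI_API_KEY")
# The key would then be passed to the provider as in the examples above:
# provider = GeminiProvider(api_key=api_key)
```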

🛡️ Stealth & Anti-Detection Proof

We don't just claim to be stealthy; we prove it. Below are the results from top anti-bot detection suites, all passed with 100% "Human" status.

👉 View Full Visual Proofs & Screenshots Here

| Detection Suite | Result | Status |
| --- | --- | --- |
| SannySoft | No WebDriver detected | ✅ Pass |
| BrowserScan | 100% Trust Score | ✅ Pass |
| PixelScan | Consistent Fingerprint | ✅ Pass |
| IPHey | Software Clean (Green) | ✅ Pass |
| CreepJS | 0% Stealth / 0% Headless | ✅ Pass |
| Fingerprint.com | No Bot Detected | ✅ Pass |

🌍 Real-World Protection Bypass

We tested chuscraper against live websites protected by major security providers:

| Provider | Target | Result |
| --- | --- | --- |
| Cloudflare | Turnstile Demo | ✅ Solved Automatically |
| DataDome | Antoine Vastel Research | ✅ Accessed |
| Akamai | Nike Product Page | ✅ Bypassed |

📖 Documentation

Full technical guides are available in the docs/ folder.

Translations (Chinese, Japanese, etc.) coming soon.

💖 Support & Sponsorship

chuscraper is an open-source project maintained by Toufiq Qureshi. If the library has helped you or your business, please consider supporting its development:

  • GitHub Sponsors: Sponsor me on GitHub
  • Corporate Sponsorship: If you are a Proxy Provider or Data Company, we offer featured placement in our documentation. Contact us for partnership opportunities.
  • Custom Scraping Solutions: Need a private, high-performance scraper? We offer professional consulting.

🛠️ Contributing

Want to contribute? Open an issue or send a pull request — all levels welcome! Please follow the CONTRIBUTING.md guidelines.


📜 License

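Chuscraper is licensed under the AGPL-3.0 License. Under the AGPL, software that incorporates Chuscraper and is distributed, or offered to users over a network, must make its source code available under the same license, which protects the community and your freedom.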

Made with ❤️ by Toufiq Qureshi
