
A fast, async-first, stealth-focused web scraping and web automation framework

🕷️ Chuscraper

LLM + CDP powered stealth-focused web scraping & automation framework
You Only Scrape Once — data extraction made smarter, faster, and more resilient.


🚀 What is Chuscraper?

Chuscraper is a Python web scraping and automation library that uses CDP (Chrome DevTools Protocol) and LLMs to extract structured data, interact with pages, and automate workflows, with a strong focus on anti-detection and stealth.

With AI-powered extraction, you describe what to extract and the library works out how.


🌟 Features

🕵️‍♂️ Stealth & Anti-Detection

  • Hides navigator.webdriver and rotates user agents
  • Canvas/WebGL noise + hardware spoofing
  • Timezone & geolocation spoofing
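
As a rough illustration of how spoofing like this can be expressed over CDP, the sketch below builds the JSON commands a client would send over the DevTools WebSocket. Page.addScriptToEvaluateOnNewDocument and Emulation.setTimezoneOverride are real CDP methods, but the helper name build_cdp_command, the injected script, and the example time zone are assumptions made for illustration. Whether Chuscraper uses these exact commands internally is not stated in this README.

```python
import json

# JavaScript evaluated before any page script runs; it makes
# navigator.webdriver read as undefined instead of true.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', "
    "{get: () => undefined});"
)


def build_cdp_command(command_id: int, method: str, params: dict) -> str:
    """Serialize a CDP command as the JSON text sent over the DevTools WebSocket.

    Hypothetical helper for illustration; not part of Chuscraper's API.
    """
    return json.dumps({"id": command_id, "method": method, "params": params})


# Inject the script into every new document before its own scripts execute.
hide_webdriver = build_cdp_command(
    1, "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
)

# Override the time zone reported to the page (example value only).
set_timezone = build_cdp_command(
    2, "Emulation.setTimezoneOverride", {"timezoneId": "Asia/Kolkata"}
)

print(hide_webdriver)
print(set_timezone)
```

The commands are only built here, not sent, so the sketch runs without a browser.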

🤖 AI-Driven Data Extraction

  • Semantic extraction using LLMs
  • Converts HTML into structured JSON/Pydantic
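
To make "structured" concrete, here is a minimal sketch of turning a model's JSON reply into typed records. It uses a standard-library dataclass as a stand-in for a Pydantic model so that it runs without extra dependencies. The Hotel schema, the parse_hotels helper, and the sample reply are all made up for illustration and are not part of Chuscraper.

```python
import json
from dataclasses import dataclass


@dataclass
class Hotel:
    """Hypothetical target schema; a Pydantic model would play the same role."""
    name: str
    price: float


def parse_hotels(raw_json: str) -> list[Hotel]:
    """Turn a JSON array from an LLM reply into typed records."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of hotels")
    hotels = []
    for item in items:
        # A missing key raises KeyError; a non-numeric price raises ValueError.
        hotels.append(Hotel(name=str(item["name"]), price=float(item["price"])))
    return hotels


# Invented sample reply, used only to exercise the parser.
sample_reply = (
    '[{"name": "Sea View Inn", "price": "4200"},'
    ' {"name": "Palm Stay", "price": 3150.5}]'
)
hotels = parse_hotels(sample_reply)
print(hotels)
```

Coercing each field at the boundary means a malformed reply fails immediately instead of surfacing later in downstream code.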

🧠 Autonomous Navigation

  • Intelligent pilot (ai_pilot) that clicks and types until the goal is achieved
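
The "act until the goal is achieved" idea can be pictured as an observe, decide, act loop with a step limit. The sketch below is a generic version of that pattern with stub callbacks. The names run_until_goal, observe, decide, and act are hypothetical, and this is not a description of how ai_pilot is implemented.

```python
from typing import Callable


def run_until_goal(
    observe: Callable[[], str],
    decide: Callable[[str, str], str],
    act: Callable[[str], None],
    goal: str,
    max_steps: int = 10,
) -> bool:
    """Repeat observe -> decide -> act; stop on 'done' or when steps run out."""
    for _ in range(max_steps):
        state = observe()
        action = decide(goal, state)
        if action == "done":
            return True
        act(action)
    return False


# Stub demo: a fake "page" that needs one type and one click to reach the goal.
history: list[str] = []


def fake_observe() -> str:
    return ",".join(history)


def fake_decide(goal: str, state: str) -> str:
    # A real pilot would ask an LLM here; this stub follows a fixed script.
    if "type" not in state:
        return "type"
    if "click" not in state:
        return "click"
    return "done"


reached = run_until_goal(fake_observe, fake_decide, history.append, "search hotels")
print(reached, history)
```

The max_steps cap matters in practice: without it, a model that never reports completion would loop forever.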

⚡ Async + Fast

Built on async CDP, with low overhead and no heavy browser bundles.
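
One practical benefit of an async design is that several pages can be processed concurrently from a single event loop. The sketch below shows the general pattern using only asyncio. fetch_title is a stand-in that sleeps instead of opening a browser tab, and the URLs and concurrency limit are illustrative assumptions.

```python
import asyncio


async def fetch_title(url: str, limit: asyncio.Semaphore) -> str:
    """Stand-in for a real page visit; sleeps instead of driving a browser."""
    async with limit:
        await asyncio.sleep(0.01)
        return f"title of {url}"


async def scrape_all(urls: list[str], max_concurrent: int = 2) -> list[str]:
    # The semaphore caps how many visits are in flight at once.
    limit = asyncio.Semaphore(max_concurrent)
    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_title(u, limit) for u in urls))


urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]
titles = asyncio.run(scrape_all(urls))
print(titles)
```

In a real scraper the body of fetch_title would open a page and extract data; the semaphore and gather structure stays the same.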

🔄 Flexible Outputs

Supports JSON, CSV, Markdown, Excel, Pydantic, and more.
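
If you want to post-process results yourself, the same list of records can be rendered in several formats with the standard library alone. The helpers to_csv and to_markdown and the sample rows below are hypothetical and independent of Chuscraper's built-in exporters, whose API is not shown in this README.

```python
import csv
import io
import json


def to_csv(rows: list[dict]) -> str:
    """Render records as CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_markdown(rows: list[dict]) -> str:
    """Render records as a Markdown table."""
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


# Invented sample records, for illustration only.
rows = [
    {"name": "Sea View Inn", "price": 4200},
    {"name": "Palm Stay", "price": 3150},
]

print(json.dumps(rows, indent=2))
print(to_csv(rows))
print(to_markdown(rows))
```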

🌐 Integrations

  • LLM Providers: OpenAI, Gemini, Anthropic, Ollama
  • Frameworks: LangChain, LlamaIndex, Agno, Crew.ai

📦 Installation

pip install chuscraper

# For AI capabilities (the quotes stop shells such as zsh from expanding the brackets)
pip install "chuscraper[ai]"

Tip: Install inside a virtual environment to avoid dependency conflicts.


💻 Quick Start (Async)

import asyncio
import json

from chuscraper import start

async def main():
    # Launch a visible browser window
    browser = await start(headless=False)
    try:
        page = await browser.get("https://www.makemytrip.com/")

        # Let the AI pilot navigate toward the goal
        print("AI is navigating...")
        await page.ai_pilot("Search hotels in Goa for next weekend")

        # Extract structured data from the resulting page
        result = await page.ai_extract("Get the first 3 hotels with prices")
        print(json.dumps(result, indent=2))
    finally:
        # Close the browser even if a step above fails
        await browser.stop()

if __name__ == "__main__":
    asyncio.run(main())

🤖 AI Usage with Providers

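Chuscraper supports multiple providers out of the box. The snippets in this section are fragments: they must run inside an async function, with page obtained as in the Quick Start.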

1. Gemini (Native)

from chuscraper.ai.providers import GeminiProvider
provider = GeminiProvider(api_key="YOUR_GEMINI_API_KEY")
await page.ai_extract("Extract data", provider=provider)

2. OpenAI

from chuscraper.ai.providers import OpenAIProvider
provider = OpenAIProvider(api_key="YOUR_OPENAI_API_KEY")
await page.ai_extract("Extract data", provider=provider)

3. Local LLMs (via Ollama)

from chuscraper.ai.providers import OllamaProvider
# Uses Ollama's OpenAI-compatible API (default: localhost:11434)
provider = OllamaProvider(model_name="llama3")
await page.ai_extract("Extract data", provider=provider)
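
Rather than hard-coding keys in source files, a common pattern is to read them from environment variables and fail early with a clear message. The helper below is a minimal sketch. require_api_key and the variable name GEMINI_API_KEY are assumptions for illustration, not names defined by Chuscraper.

```python
import os


def require_api_key(env_var: str) -> str:
    """Return the key stored in env_var, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key


# Usage sketch with a provider class from the examples above
# (the environment variable name is an assumption):
# provider = GeminiProvider(api_key=require_api_key("GEMINI_API_KEY"))
```

Keeping keys out of the code also keeps them out of version control.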

📖 Documentation

Full technical guides are available in the docs/ folder.

Translations (Chinese, Japanese, etc.) coming soon.


🛠️ Contributing

Want to contribute? Open an issue or send a pull request — all levels welcome! Please follow the CONTRIBUTING.md guidelines.


📜 License

Chuscraper is licensed under the MIT License.

Made with ❤️ by Toufiq Qureshi

File details

Details for the file chuscraper-0.16.3.tar.gz.

File metadata

  • Download URL: chuscraper-0.16.3.tar.gz
  • Upload date:
  • Size: 444.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for chuscraper-0.16.3.tar.gz

  • SHA256: 6c231b436bfaa22fdf02e102c93c53a1ff2d73a6d81b93d2090470590278c432
  • MD5: bc7e72158912bf86598773983587330e
  • BLAKE2b-256: 18aeaaa207e446f05f399fb88ec8378bb8c36a4ff56642bb682a350d9bb5ec3c


File details

Details for the file chuscraper-0.16.3-py3-none-any.whl.

File metadata

  • Download URL: chuscraper-0.16.3-py3-none-any.whl
  • Upload date:
  • Size: 371.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for chuscraper-0.16.3-py3-none-any.whl

  • SHA256: f7ef6d29121025878fa39b19dd6b9c040bf24ed1f83769a103ef2c46d27ad0a5
  • MD5: da6d8e55c421a660e4dc44795624d9d5
  • BLAKE2b-256: a9b6e6f973bd653332ccdef29626302783831b5e11bfe062a6e778c113f0e7e1

