Next-gen Anti-Detection Scraper with AI & Browser Fallback

TitanScraper V2

The Ultimate Anti-Bot Scraper for Python.

TitanScraper is a high-performance scraping library designed to get past demanding anti-bot protections (Cloudflare, Akamai, DataDome, etc.). It uses a tiered approach: it starts with lightweight requests and escalates to full browser automation with AI solvers only when necessary.

Features

  • Tier 1: Intelligent Requests: Handles headers, TLS fingerprinting (simulated), and simple redirects.
  • Tier 2: JSD Solver: Native Go-based solver for Cloudflare JavaScript challenges.
  • Tier 3: Browser Fallback: Auto-launches a stealth Playwright browser for 403/503 bypass (G2, CoinList, etc.).
  • Tier 4: Captcha Solving:
    • Cloudflare Turnstile: Auto-detects and human-clicks or uses external solvers.
    • reCAPTCHA v2/v3: Native audio solving, plus support for 2Captcha, CapMonster, and Anti-Captcha.
    • AI Custom Model: PyTorch-based CNN for text captchas.
  • Deep Fingerprint Spoofing: Injects noise into Canvas, WebGL, and AudioContext to defeat device tracking.
  • Session Persistence: Save/Load cookies to build "Trust Scores" across sessions.
  • Smart Auto-Detection: Automatically identifies protection (Cloudflare, Akamai, AWS WAF) and selects the best bypass strategy (TLS Rotation, Browser, etc.).
  • Proxies & Stealth: Built-in support for rotating proxies and fingerprint randomization (User-Agent, Viewport, Locale).
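
The tiered approach described above can be pictured as a chain of handlers tried in order, moving to the next tier only when the current one fails or is blocked. The following is a minimal sketch of that control flow, not TitanScraper's actual implementation: `TierResult`, `fetch_with_escalation`, and the stub handlers are hypothetical names, and the handlers return fixed status codes instead of making real requests.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class TierResult:
    tier: str         # name of the tier that produced the response
    status_code: int  # HTTP status code returned by that tier


# Hypothetical handler type: takes a URL, returns a status code or None on failure.
TierHandler = Callable[[str], Optional[int]]


def fetch_with_escalation(
    url: str,
    tiers: Sequence[Tuple[str, TierHandler]],
    blocked: Tuple[int, ...] = (403, 503),
) -> Optional[TierResult]:
    """Try each tier in order and return the first result that is not blocked."""
    for name, handler in tiers:
        status = handler(url)
        if status is not None and status not in blocked:
            return TierResult(name, status)
    return None  # every tier failed or was blocked


# Stub handlers standing in for real tiers (assumptions for illustration only).
def plain_request(url: str) -> Optional[int]:
    return 403  # pretend the lightweight request was blocked


def browser_fallback(url: str) -> Optional[int]:
    return 200  # pretend the browser tier succeeded


result = fetch_with_escalation(
    "https://example.com",
    [("requests", plain_request), ("browser", browser_fallback)],
)
print(result)
```

Ordering the tiers from cheapest to most expensive keeps the common case fast, since a browser is launched only after the lightweight tiers have been tried.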

Installation

# 1. Install Python packages
pip install .

# 2. Install Playwright Browsers
playwright install chromium

# 3. Setup JSD Solver (Go required)
python setup_jsd.py

# 4. System Requirements
# Install ffmpeg for Audio Captcha solving

Usage

1. One-Click Bypass (Recommended)

Automatically handles challenges, captchas, and fallbacks.

from titan import TitanScraper

# Optional: Add Proxies
proxies = {
    "http": "http://user:pass@host:port",
    "https": "http://user:pass@host:port"
}

scraper = TitanScraper(proxies=proxies)

# Just provide the URL
response = scraper.bypass("https://nowsecure.nl")

print(response.status_code)
print(response.content)

2. Advanced / Manual Control

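from titan import TitanScraper

scraper = TitanScraper()
url = "https://nowsecure.nl"

# Access specific modules directly
scraper.jsd_solver.solve(url)
cookies = scraper.browser_manager.get_cookies(url)

# Activate the Disguise System so all fingerprint signals stay consistent
scraper.set_disguise("modern_mac")  # or "modern_windows"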

3. Disguise System (Consistency Engine)

For targets detecting "Mismatched Fingerprints" (Cloudflare V3/Enterprise):

# 1. Masquerade as a Mac user (Safari + MacIntel + Apple GPU)
scraper.set_disguise("modern_mac")

# 2. Masquerade as a Windows user (Chrome + Win32 + NVIDIA)
scraper.set_disguise("modern_windows")

# Subsequent requests present this identity consistently.
scraper.bypass("https://strict-site.com")

Why use this? Advanced anti-bot systems check whether your User-Agent matches your TLS fingerprint and your GPU renderer.

  • If you only change the User-Agent to "iPhone" but keep Python's default TLS fingerprint, the mismatch is easy to detect and the request is likely to be blocked.
  • The Disguise System syncs everything to match the chosen profile.

Available Profiles & Parameters

Parameter             modern_windows               modern_mac
--------------------  ---------------------------  ------------------
User-Agent            Chrome 120 (Win)             Safari 15.3 (Mac)
TLS Handshake         chrome120                    safari15_3
Navigator Platform    Win32                        MacIntel
WebGL Vendor          Google Inc. (NVIDIA)         Apple Inc.
WebGL Renderer        NVIDIA GeForce RTX 3060...   Apple GPU
Hardware Core Count   16                           8
Device Memory (GB)    8                            8
Default Viewport      1920x1080                    1440x900
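
To make the idea of a consistent profile concrete, the parameters above can be modeled as a small record and checked for agreement between fields. This is an illustrative sketch only: `DisguiseProfile`, `PROFILES`, and `find_mismatches` are hypothetical names, not part of TitanScraper's API. The values are copied from the table, and the WebGL renderer is left out because the source truncates it.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class DisguiseProfile:
    user_agent: str            # e.g. "Chrome 120 (Win)"
    tls_handshake: str         # e.g. "chrome120"
    navigator_platform: str    # e.g. "Win32"
    webgl_vendor: str
    hardware_cores: int
    device_memory_gb: int
    viewport: Tuple[int, int]


PROFILES = {
    "modern_windows": DisguiseProfile(
        "Chrome 120 (Win)", "chrome120", "Win32",
        "Google Inc. (NVIDIA)", 16, 8, (1920, 1080),
    ),
    "modern_mac": DisguiseProfile(
        "Safari 15.3 (Mac)", "safari15_3", "MacIntel",
        "Apple Inc.", 8, 8, (1440, 900),
    ),
}

# Assumed mapping from the OS tag in the User-Agent label to navigator.platform.
EXPECTED_PLATFORM = {"Win": "Win32", "Mac": "MacIntel"}


def find_mismatches(profile: DisguiseProfile) -> List[str]:
    """Return descriptions of fields that disagree about browser or OS."""
    problems = []
    browser = profile.user_agent.split()[0].lower()          # "chrome" or "safari"
    os_tag = profile.user_agent.split("(")[-1].rstrip(")")   # "Win" or "Mac"
    if not profile.tls_handshake.startswith(browser):
        problems.append("TLS handshake does not match the User-Agent browser")
    if EXPECTED_PLATFORM.get(os_tag) != profile.navigator_platform:
        problems.append("navigator platform does not match the User-Agent OS")
    return problems


# A deliberately inconsistent profile: Safari User-Agent with a Chrome TLS handshake.
mixed = replace(PROFILES["modern_mac"], tls_handshake="chrome120")
print(find_mismatches(PROFILES["modern_mac"]))
print(find_mismatches(mixed))
```

Keeping each profile as a single immutable record means every signal is read from one place, which makes a partial change harder to introduce by accident.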

4. External Captcha Providers

Use a professional solving service for better reliability on hard targets.

captcha_config = {
    "provider": "2captcha", # '2captcha', 'capmonster', 'anticaptcha'
    "api_key": "YOUR_API_KEY"
}

scraper = TitanScraper(captcha_config=captcha_config)
scraper.bypass("https://protected-site.com")

5. Training the AI Captcha Solver

  1. Collect Data:
    scraper.browser_manager.save_element_screenshot(url, "#captcha-img", "data/label.png")
    
  2. Train:
    python train_captcha.py --mode train --data_dir ./data --epochs 20
    
  3. Predict:
    python train_captcha.py --mode predict --image ./test.png
    

Roadmap & Suggestions

To defeat even more advanced systems:

  1. Residential Proxy Rotation: Integrate with providers (BrightData, Smartproxy) to rotate IPs per request.
  2. Machine Learning Behavior: Train a model on real user mouse movements instead of just Bezier curves.
