🦎 Chameleon Engine

Advanced stealth web scraping framework with cutting-edge browser fingerprinting and network obfuscation capabilities.

Chameleon Engine is a comprehensive microservices-based solution designed to bypass modern anti-bot detection systems through sophisticated browser fingerprinting, TLS fingerprint masking, and human behavior simulation.

✨ Key Features

🎭 Advanced Browser Fingerprinting

  • Dynamic Profile Generation: Create realistic browser profiles based on real-world data
  • TLS Fingerprint Masking: JA3/JA4 hash manipulation with uTLS integration
  • HTTP/2 Header Rewriting: Sophisticated header manipulation for advanced stealth
  • Multi-Browser Support: Chrome, Firefox, Safari, Edge fingerprint profiles

🚀 Microservices Architecture

  • Fingerprint Service: FastAPI-based profile management (Python)
  • Proxy Service: High-performance proxy with TLS fingerprinting (Go)
  • Data Collection Pipeline: Automated real-world fingerprint gathering
  • Real-time Monitoring: WebSocket-based dashboard and metrics

🎯 Human Behavior Simulation

  • Mouse Movement Patterns: Bezier curve-based natural movements
  • Typing Simulation: Realistic typing with variable speed and errors
  • Scrolling Behavior: Natural scroll patterns and pauses
  • Timing Obfuscation: Human-like delays and interaction patterns

🛡️ Network Obfuscation

  • Advanced Proxy Management: Multi-format proxy loading (TXT, CSV, JSON) with automatic rotation
  • Proxy Generation: Dynamic generation of residential, datacenter, and geo-targeted proxies
  • Request Obfuscation: Timing and header randomization
  • TLS Certificate Generation: Dynamic cert creation per profile
  • HTTP/2 Settings Manipulation: Protocol-level fingerprinting

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Python App    │    │  Fingerprint     │    │   Data Source   │
│                 │◄──►│   Service        │◄──►│   Collection    │
│  Chameleon      │    │   (FastAPI)      │    │     Pipeline    │
│     Engine      │    │                  │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Browser       │    │     Proxy        │    │    Database     │
│  Management     │    │    Service       │    │   PostgreSQL    │
│   (Playwright)  │◄──►│     (Go)         │◄──►│   + Redis       │
│                 │    │   uTLS + HTTP2   │    │                 │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🚀 Quick Start

🎯 Automated Installation (Recommended)

Linux/macOS:

# Clone and install with one command
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
./install.sh

# Start services
docker-compose -f examples/docker_compose_example.yaml up -d

# Run your first scrape
python examples/simple_scrape.py https://example.com

Windows:

# Clone and install
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
.\install.ps1

# Start services
docker-compose -f examples/docker_compose_example.yaml up -d

# Run your first scrape
python examples/simple_scrape.py https://example.com

📋 Prerequisites

  • Python 3.8+
  • Go 1.21+ (for proxy service)
  • Docker & Docker Compose (optional, for easy deployment)
  • PostgreSQL (optional, for persistent storage)
  • Redis (optional, for caching)

🔧 Manual Installation

# Clone the repository
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine

# Install Python package in development mode
pip install -e .

# Install Playwright browsers
playwright install

# Install Go dependencies (proxy service)
cd proxy_service
go mod tidy
cd ..

Basic Usage

import asyncio
from chameleon_engine import ChameleonEngine

async def main():
    # Initialize Chameleon Engine
    engine = ChameleonEngine(
        fingerprint_service_url="http://localhost:8000",
        proxy_service_url="http://localhost:8080"
    )

    await engine.initialize()

    # Create stealth browser session
    browser = await engine.create_browser(
        profile_type="chrome_windows",
        stealth_mode=True
    )

    # Perform scraping
    page = await browser.new_page()
    await page.goto("https://example.com")

    content = await page.content()
    print(f"Scraped content length: {len(content)}")

    # Cleanup
    await browser.close()
    await engine.cleanup()

asyncio.run(main())

📚 Services Setup

Option 1: Manual Setup

  1. Start Fingerprint Service:

    python -m chameleon_engine.fingerprint.main
    
  2. Start Proxy Service:

    cd proxy_service
    make run
    
  3. Run Your Application:

    python your_scraping_script.py
    

Option 2: Docker Deployment

# Start all services
docker-compose -f examples/docker_compose_example.yaml up -d

# Check service status
docker-compose ps

🎯 Use Cases

E-commerce Data Collection

# Scrape product pages while avoiding bot detection
await engine.scrape_ecommerce(
    target_urls=["https://shop.example.com/products/*"],
    rotate_fingerprints=True,
    human_behavior=True,
    rate_limit="1-3 requests per minute"
)

Market Research

# Collect competitive intelligence
await engine.market_research(
    competitors=["competitor1.com", "competitor2.com"],
    data_types=["pricing", "products", "reviews"],
    stealth_level="high"
)

SEO Monitoring

# Monitor search engine rankings
await engine.seo_monitoring(
    keywords=["python web scraping"],
    search_engines=["google", "bing"],
    geo_locations=["US", "UK", "DE"]
)

Academic Research

# Collect data for research purposes
await engine.academic_research(
    target_sites=["scholar.google.com", "arxiv.org"],
    data_types=["papers", "citations", "metadata"],
    ethical_scraping=True
)

🔧 Configuration

Environment Variables

# Fingerprint Service
export DATABASE_URL="postgresql://user:pass@localhost/chameleon"
export REDIS_URL="redis://localhost:6379"
export LOG_LEVEL="info"

# Proxy Service
export FINGERPRINT_SERVICE_URL="http://localhost:8000"
export TLS_ENABLED="false"
export PROXY_TARGET_HOST=""

Configuration File

Create chameleon_config.yaml:

fingerprint:
  service_url: "http://localhost:8000"
  cache_size: 1000
  rotation_interval: 300

proxy:
  service_url: "http://localhost:8080"
  upstream_proxies:
    - url: "http://proxy1.example.com:8080"
      auth:
        username: "user"
        password: "pass"
        type: "basic"
    - url: "http://proxy2.example.com:8080"
      weight: 2
      auth: null
  rotation_settings:
    strategy: "round_robin"
    interval: 300
    request_count: 100
  health_check:
    enabled: true
    interval: 60

behavior:
  mouse_movements: true
  typing_patterns: true
  human_delays: true

logging:
  level: "info"
  format: "json"
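
A minimal sketch of wiring this file into your script, assuming you load it yourself with PyYAML and pass the two service URLs to the ChameleonEngine constructor shown in Basic Usage (the key names come from the example file above; the engine may also offer its own config loader):

# Sketch: load chameleon_config.yaml and hand the service URLs to the engine.
# Assumes PyYAML is installed (pip install pyyaml).
import asyncio
import yaml

from chameleon_engine import ChameleonEngine

async def main():
    with open("chameleon_config.yaml") as f:
        config = yaml.safe_load(f)

    engine = ChameleonEngine(
        fingerprint_service_url=config["fingerprint"]["service_url"],
        proxy_service_url=config["proxy"]["service_url"],
    )
    await engine.initialize()
    # ... create a browser and scrape as in the Basic Usage example ...
    await engine.cleanup()

asyncio.run(main())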

Proxy Configuration Details

The Go proxy service manages upstream proxies in two ways:

  1. No Upstream Proxies (Default):

    proxy:
      service_url: "http://localhost:8080"
      upstream_proxies: []
    

    Flow: Your App → Go Proxy Service → Target Website

  2. With Upstream Proxies:

    proxy:
      service_url: "http://localhost:8080"
      upstream_proxies:
        - url: "http://proxy1.example.com:8080"
          auth:
            username: "user"
            password: "pass"
            type: "basic"
        - url: "http://proxy2.example.com:8080"
          weight: 2
    

    Flow: Your App → Go Proxy Service → External Proxy → Target Website

See Proxy Management Guide for detailed configuration.
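
For a quick sanity check of the first flow, you can point an ordinary HTTP client at the Go proxy service. This is only a sketch and assumes the service accepts standard forward-proxy traffic on port 8080; in normal use the engine routes browser traffic through it for you:

# Sketch: send one request through the local Go proxy service,
# assuming it behaves as a standard HTTP forward proxy on port 8080.
import requests

proxies = {
    "http": "http://localhost:8080",
    "https": "http://localhost:8080",
}

response = requests.get("https://example.com", proxies=proxies, timeout=30)
print(response.status_code, len(response.text))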

Advanced Proxy Loading

Chameleon Engine supports multiple proxy loading methods:

from chameleon_engine.proxy_loader import ProxyLoader

loader = ProxyLoader()

# Load from text files
proxies = loader.load_from_txt("proxies.txt", format_type="mixed")

# Load from CSV
proxies = loader.load_from_csv("proxies.csv")

# Generate dynamic proxies
residential_proxies = loader.generate_proxies(
    count=10,
    pattern="residential",
    geolocations=["US", "EU", "AS"]
)

# Filter proxies
http_proxies = loader.filter_proxies(proxies, protocol="http")
auth_proxies = loader.filter_proxies(proxies, has_auth=True)

See Proxy Usage Guide for comprehensive examples.

📦 Installation Options

📖 Detailed Installation Guide

See INSTALL.md for comprehensive installation instructions including:

  • System-specific setup (Linux, macOS, Windows)
  • Docker installation
  • Database configuration
  • Troubleshooting common issues

🚀 Quick Start Guide

See QUICK_START.md for a streamlined getting started experience.

📊 Monitoring & Debugging

Health Checks

# Check fingerprint service
curl http://localhost:8000/health

# Check proxy service
curl http://localhost:8080/api/v1/health
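
The same checks from Python, for example as a startup gate before launching a scrape (endpoints as documented above; the requests package is assumed to be installed):

# Poll both documented health endpoints before starting a scrape.
import requests

services = {
    "fingerprint": "http://localhost:8000/health",
    "proxy": "http://localhost:8080/api/v1/health",
}

for name, url in services.items():
    try:
        response = requests.get(url, timeout=5)
        print(f"{name}: {response.status_code} {response.text[:100]}")
    except requests.RequestException as exc:
        print(f"{name}: unreachable ({exc})")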

Real-time Monitoring

# Get live statistics (run inside an async function, as in the Basic Usage example)
stats = await engine.get_proxy_stats()
print(f"Active connections: {stats['active_connections']}")
print(f"Total requests: {stats['total_requests']}")

# WebSocket monitoring (requires the websocket-client package)
import websocket
ws = websocket.WebSocketApp("ws://localhost:8080/ws")
ws.on_message = lambda ws, msg: print(f"Update: {msg}")
ws.run_forever()

API Documentation

The fingerprint service is built on FastAPI, so while it is running its interactive OpenAPI docs should be available at http://localhost:8000/docs (with the raw schema at http://localhost:8000/openapi.json).

🧪 Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=chameleon_engine --cov-report=html

# Run specific test suite
pytest tests/test_fingerprint.py -v

📖 Examples

Quick Start Example

python examples/quick_start.py

Advanced Scraping Demo

python examples/advanced_scraping_example.py

Direct API Usage

python examples/api_client_example.py

Proxy Management Examples

# Test proxy loading functionality
python examples/test_proxy_standalone.py

# Run proxy configuration examples
python examples/proxy_loader_examples.py

For more examples, see the examples directory.

🔍 Advanced Features

Custom Fingerprint Profiles

# Create custom browser profile
custom_profile = {
    "browser_type": "chrome",
    "os": "windows",
    "version": "120.0.0.0",
    "screen_resolution": "1920x1080",
    "timezone": "America/New_York",
    "language": "en-US",
    "custom_headers": {
        "X-Custom-Header": "MyValue"
    }
}

profile = await fingerprint_client.create_profile(custom_profile)

Behavior Simulation

# Simulate human mouse movements
mouse_path = behavior_simulator.generate_mouse_path(
    start=(100, 100),
    end=(500, 300),
    duration=2.0,
    curve_type="bezier"
)

# Simulate typing with natural patterns
typing_pattern = behavior_simulator.generate_typing_pattern(
    text="Hello, World!",
    wpm=80,
    error_rate=0.02
)

Network Obfuscation

# Obfuscate request timing
original_delay = 1.0
obfuscated_delay = network_obfuscator.obfuscate_timing(original_delay)

# Obfuscate headers
headers = {"User-Agent": "Mozilla/5.0..."}
obfuscated_headers = network_obfuscator.obfuscate_headers(headers)

🛠️ Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

Code Quality

# Format code
black chameleon_engine/
isort chameleon_engine/

# Lint code
flake8 chameleon_engine/
mypy chameleon_engine/

# Run security checks
bandit -r chameleon_engine/

Building Documentation

# Install documentation dependencies
pip install -r requirements-docs.txt

# Build docs
mkdocs build

# Serve docs locally
mkdocs serve

📈 Performance

Benchmarks

  • Request Processing: < 10ms average latency
  • Profile Generation: < 50ms for complex profiles
  • Memory Usage: ~50MB base + ~5MB per concurrent session
  • Concurrent Sessions: 1000+ simultaneous connections

Optimization Tips

  1. Enable Redis caching for fingerprint profiles
  2. Use connection pooling for database connections
  3. Configure appropriate timeouts for target websites (see the sketch below)
  4. Monitor resource usage with built-in metrics
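
For tip 3, one client-side sketch that caps concurrency and applies a per-navigation timeout, reusing the browser object from the Basic Usage example (Playwright's goto timeout is given in milliseconds):

# Sketch: bound concurrent pages and apply a 30 s navigation timeout.
import asyncio

async def fetch(browser, url, sem):
    async with sem:
        page = await browser.new_page()
        try:
            await page.goto(url, timeout=30_000)  # milliseconds
            return await page.content()
        finally:
            await page.close()

async def scrape_all(browser, urls, max_concurrency=5):
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(fetch(browser, url, sem) for url in urls))

# usage (inside an async function): results = await scrape_all(browser, urls)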

🔒 Security Considerations

Ethical Usage

  • ✅ Respect robots.txt files
  • ✅ Implement rate limiting for target websites
  • ✅ Check terms of service before scraping
  • ✅ Identify your bot when required
  • ❌ Don't overload target servers
  • ❌ Don't scrape personal data without consent
  • ❌ Don't bypass security measures illegally

Best Practices

# Ethical scraping configuration
ethical_config = {
    "rate_limit": "1 request per second",
    "respect_robots_txt": True,
    "user_agent": "MyBot/1.0 (+http://mywebsite.com/bot-info)",
    "timeout": 30,
    "max_retries": 3,
    "retry_delay": 5
}
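
If you enforce respect_robots_txt on the client side, Python's standard library already covers the check; this snippet is independent of Chameleon Engine:

# Check robots.txt with the standard library before fetching a URL.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

user_agent = "MyBot/1.0 (+http://mywebsite.com/bot-info)"
if robots.can_fetch(user_agent, "https://example.com/some/page"):
    print("Allowed by robots.txt")
else:
    print("Disallowed by robots.txt")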

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite: pytest
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • uTLS for TLS fingerprinting
  • Playwright for browser automation
  • FastAPI for the API framework
  • Gin for the Go web framework

🗺️ Roadmap

Version 2.0

  • Machine learning-based behavior optimization
  • Advanced CAPTCHA solving integration
  • Cloud deployment templates
  • Web-based management dashboard

Version 1.5

  • Enhanced mobile browser fingerprinting
  • WebGL and Canvas fingerprinting
  • Audio fingerprinting capabilities
  • Advanced proxy pool management
  • Multi-format proxy loading (TXT, CSV, JSON)
  • Dynamic proxy generation (residential, datacenter, geo-targeted)
  • Comprehensive proxy filtering and validation

Version 1.2

  • Microservices architecture
  • Go-based proxy service
  • Real-time monitoring
  • Docker deployment support

Made with ❤️ for the ethical web scraping community

If you find this project useful, please consider giving it a ⭐ on GitHub!
