Chameleon Engine
Advanced stealth web scraping framework with cutting-edge browser fingerprinting and network obfuscation capabilities.
Chameleon Engine is a comprehensive microservices-based solution designed to bypass modern anti-bot detection systems through sophisticated browser fingerprinting, TLS fingerprint masking, and human behavior simulation.
Key Features
Advanced Browser Fingerprinting
- Dynamic Profile Generation: Create realistic browser profiles based on real-world data
- TLS Fingerprint Masking: JA3/JA4 hash manipulation with uTLS integration
- HTTP/2 Header Rewriting: Sophisticated header manipulation for advanced stealth
- Multi-Browser Support: Chrome, Firefox, Safari, Edge fingerprint profiles
Microservices Architecture
- Fingerprint Service: FastAPI-based profile management (Python)
- Proxy Service: High-performance proxy with TLS fingerprinting (Go)
- Data Collection Pipeline: Automated real-world fingerprint gathering
- Real-time Monitoring: WebSocket-based dashboard and metrics
Human Behavior Simulation
- Mouse Movement Patterns: Bezier curve-based natural movements
- Typing Simulation: Realistic typing with variable speed and errors
- Scrolling Behavior: Natural scroll patterns and pauses
- Timing Obfuscation: Human-like delays and interaction patterns
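The Bezier-curve idea behind natural mouse movement can be illustrated with a short, self-contained sketch. The function below is hypothetical (it is not Chameleon Engine's actual API); it samples a cubic Bezier curve whose two control points are randomly offset from the straight line, so the path bows like a hand-drawn motion:

```python
import random

def bezier_mouse_path(start, end, steps=50):
    """Sample points along a cubic Bezier curve from start to end.

    Random control points bow the path so it looks hand-drawn
    rather than perfectly straight.
    """
    (x0, y0), (x3, y3) = start, end
    # Control points: 30% and 70% along the line, with random offsets
    x1 = x0 + (x3 - x0) * 0.3 + random.uniform(-50, 50)
    y1 = y0 + (y3 - y0) * 0.3 + random.uniform(-50, 50)
    x2 = x0 + (x3 - x0) * 0.7 + random.uniform(-50, 50)
    y2 = y0 + (y3 - y0) * 0.7 + random.uniform(-50, 50)

    path = []
    for i in range(steps + 1):
        t = i / steps
        # Cubic Bezier: B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3
        x = (1 - t) ** 3 * x0 + 3 * (1 - t) ** 2 * t * x1 + 3 * (1 - t) * t ** 2 * x2 + t ** 3 * x3
        y = (1 - t) ** 3 * y0 + 3 * (1 - t) ** 2 * t * y1 + 3 * (1 - t) * t ** 2 * y2 + t ** 3 * y3
        path.append((x, y))
    return path

path = bezier_mouse_path((100, 100), (500, 300))
```

The curve always starts and ends exactly at the given coordinates; only the interior of the path varies between runs.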
Network Obfuscation
- Advanced Proxy Management: Multi-format proxy loading (TXT, CSV, JSON) with automatic rotation
- Proxy Generation: Dynamic generation of residential, datacenter, and geo-targeted proxies
- Request Obfuscation: Timing and header randomization
- TLS Certificate Generation: Dynamic cert creation per profile
- HTTP/2 Settings Manipulation: Protocol-level fingerprinting
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Python App    │     │   Fingerprint    │     │   Data Source   │
│                 │────►│     Service      │────►│   Collection    │
│   Chameleon     │     │    (FastAPI)     │     │    Pipeline     │
│    Engine       │     │                  │     │                 │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                       │                        │
         ▼                       ▼                        ▼
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│    Browser      │     │      Proxy       │     │    Database     │
│   Management    │     │     Service      │     │   PostgreSQL    │
│  (Playwright)   │────►│      (Go)        │────►│    + Redis      │
│                 │     │   uTLS + HTTP2   │     │                 │
└─────────────────┘     └──────────────────┘     └─────────────────┘
Quick Start
Automated Installation (Recommended)
Linux/macOS:
# Clone and install with one command
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
./install.sh
# Start services
docker-compose -f examples/docker_compose_example.yaml up -d
# Run your first scrape
python examples/simple_scrape.py https://example.com
Windows:
# Clone and install
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
.\install.ps1
# Start services
docker-compose -f examples/docker_compose_example.yaml up -d
# Run your first scrape
python examples/simple_scrape.py https://example.com
Prerequisites
- Python 3.8+
- Go 1.21+ (for proxy service)
- Docker & Docker Compose (optional, for easy deployment)
- PostgreSQL (optional, for persistent storage)
- Redis (optional, for caching)
Manual Installation
# Clone the repository
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
# Install Python package in development mode
pip install -e .
# Install Playwright browsers
playwright install
# Install Go dependencies (proxy service)
cd proxy_service
go mod tidy
cd ..
Basic Usage
import asyncio

from chameleon_engine import ChameleonEngine


async def main():
    # Initialize Chameleon Engine
    engine = ChameleonEngine(
        fingerprint_service_url="http://localhost:8000",
        proxy_service_url="http://localhost:8080"
    )
    await engine.initialize()

    # Create a stealth browser session
    browser = await engine.create_browser(
        profile_type="chrome_windows",
        stealth_mode=True
    )

    # Perform scraping
    page = await browser.new_page()
    await page.goto("https://example.com")
    content = await page.content()
    print(f"Scraped content length: {len(content)}")

    # Cleanup
    await browser.close()
    await engine.cleanup()

asyncio.run(main())
Services Setup
Option 1: Manual Setup
1. Start the Fingerprint Service:
   python -m chameleon_engine.fingerprint.main
2. Start the Proxy Service:
   cd proxy_service
   make run
3. Run your application:
   python your_scraping_script.py
Option 2: Docker Deployment
# Start all services
docker-compose -f examples/docker_compose_example.yaml up -d
# Check service status
docker-compose ps
Use Cases
E-commerce Data Collection
# Scrape product pages while avoiding bot detection
await engine.scrape_ecommerce(
    target_urls=["https://shop.example.com/products/*"],
    rotate_fingerprints=True,
    human_behavior=True,
    rate_limit="1-3 requests per minute"
)
Market Research
# Collect competitive intelligence
await engine.market_research(
    competitors=["competitor1.com", "competitor2.com"],
    data_types=["pricing", "products", "reviews"],
    stealth_level="high"
)
SEO Monitoring
# Monitor search engine rankings
await engine.seo_monitoring(
    keywords=["python web scraping"],
    search_engines=["google", "bing"],
    geo_locations=["US", "UK", "DE"]
)
Academic Research
# Collect data for research purposes
await engine.academic_research(
    target_sites=["scholar.google.com", "arxiv.org"],
    data_types=["papers", "citations", "metadata"],
    ethical_scraping=True
)
Configuration
Environment Variables
# Fingerprint Service
export DATABASE_URL="postgresql://user:pass@localhost/chameleon"
export REDIS_URL="redis://localhost:6379"
export LOG_LEVEL="info"
# Proxy Service
export FINGERPRINT_SERVICE_URL="http://localhost:8000"
export TLS_ENABLED="false"
export PROXY_TARGET_HOST=""
Configuration File
Create chameleon_config.yaml:
fingerprint:
  service_url: "http://localhost:8000"
  cache_size: 1000
  rotation_interval: 300

proxy:
  service_url: "http://localhost:8080"
  upstream_proxies:
    - url: "http://proxy1.example.com:8080"
      auth:
        username: "user"
        password: "pass"
        type: "basic"
    - url: "http://proxy2.example.com:8080"
      weight: 2
      auth: null
  rotation_settings:
    strategy: "round_robin"
    interval: 300
    request_count: 100
  health_check:
    enabled: true
    interval: 60

behavior:
  mouse_movements: true
  typing_patterns: true
  human_delays: true

logging:
  level: "info"
  format: "json"
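The weighted round_robin strategy from the config can be sketched in a few lines: a proxy with `weight: 2` is handed out twice per cycle. The helper below is illustrative only (`build_rotation` is not part of Chameleon Engine's API):

```python
from itertools import cycle

def build_rotation(proxies):
    """Expand weighted proxy entries into a repeating round-robin sequence."""
    pool = []
    for p in proxies:
        # A missing weight defaults to 1 slot per cycle
        pool.extend([p["url"]] * p.get("weight", 1))
    return cycle(pool)

rotation = build_rotation([
    {"url": "http://proxy1.example.com:8080"},
    {"url": "http://proxy2.example.com:8080", "weight": 2},
])
```

Each cycle yields proxy1 once and proxy2 twice, then repeats.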
Proxy Configuration Details
The Go proxy service manages upstream proxies in two ways:
1. No upstream proxies (default):

   proxy:
     service_url: "http://localhost:8080"
     upstream_proxies: []

   Flow: Your App → Go Proxy Service → Target Website
2. With upstream proxies:

   proxy:
     service_url: "http://localhost:8080"
     upstream_proxies:
       - url: "http://proxy1.example.com:8080"
         auth:
           username: "user"
           password: "pass"
           type: "basic"
       - url: "http://proxy2.example.com:8080"
         weight: 2

   Flow: Your App → Go Proxy Service → External Proxy → Target Website
See Proxy Management Guide for detailed configuration.
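Pointing a standard-library HTTP client at the local proxy service can be configured as below. The port assumes the default `service_url` from the examples above; the `opener.open` call is left commented out because it requires the service to be running:

```python
import urllib.request

# Route all HTTP(S) traffic through the local Go proxy service
proxy_handler = urllib.request.ProxyHandler({
    "http": "http://localhost:8080",
    "https": "http://localhost:8080",
})
opener = urllib.request.build_opener(proxy_handler)
# opener.open("https://example.com")  # requests now flow via the proxy
```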
Advanced Proxy Loading
Chameleon Engine supports multiple proxy loading methods:
from chameleon_engine.proxy_loader import ProxyLoader
loader = ProxyLoader()
# Load from text files
proxies = loader.load_from_txt("proxies.txt", format_type="mixed")
# Load from CSV
proxies = loader.load_from_csv("proxies.csv")
# Generate dynamic proxies
residential_proxies = loader.generate_proxies(
    count=10,
    pattern="residential",
    geolocations=["US", "EU", "AS"]
)
# Filter proxies
http_proxies = loader.filter_proxies(proxies, protocol="http")
auth_proxies = loader.filter_proxies(proxies, has_auth=True)
See Proxy Usage Guide for comprehensive examples.
Installation Options
Detailed Installation Guide
See INSTALL.md for comprehensive installation instructions including:
- System-specific setup (Linux, macOS, Windows)
- Docker installation
- Database configuration
- Troubleshooting common issues
Quick Start Guide
See QUICK_START.md for a streamlined getting started experience.
Monitoring & Debugging
Health Checks
# Check fingerprint service
curl http://localhost:8000/health
# Check proxy service
curl http://localhost:8080/api/v1/health
Real-time Monitoring
# Get live statistics
stats = await engine.get_proxy_stats()
print(f"Active connections: {stats['active_connections']}")
print(f"Total requests: {stats['total_requests']}")
# WebSocket monitoring
import websocket
ws = websocket.WebSocketApp(
    "ws://localhost:8080/ws",
    on_message=lambda ws, msg: print(f"Update: {msg}")
)
ws.run_forever()
API Documentation
- Fingerprint Service: http://localhost:8000/docs
- Proxy Service: http://localhost:8080/api/v1/health
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=chameleon_engine --cov-report=html
# Run specific test suite
pytest tests/test_fingerprint.py -v
Examples
Quick Start Example
python examples/quick_start.py
Advanced Scraping Demo
python examples/advanced_scraping_example.py
Direct API Usage
python examples/api_client_example.py
Proxy Management Examples
# Test proxy loading functionality
python examples/test_proxy_standalone.py
# Run proxy configuration examples
python examples/proxy_loader_examples.py
For more examples, see the examples directory.
Advanced Features
Custom Fingerprint Profiles
# Create custom browser profile
custom_profile = {
    "browser_type": "chrome",
    "os": "windows",
    "version": "120.0.0.0",
    "screen_resolution": "1920x1080",
    "timezone": "America/New_York",
    "language": "en-US",
    "custom_headers": {
        "X-Custom-Header": "MyValue"
    }
}
profile = await fingerprint_client.create_profile(custom_profile)
Behavior Simulation
# Simulate human mouse movements
mouse_path = behavior_simulator.generate_mouse_path(
    start=(100, 100),
    end=(500, 300),
    duration=2.0,
    curve_type="bezier"
)

# Simulate typing with natural patterns
typing_pattern = behavior_simulator.generate_typing_pattern(
    text="Hello, World!",
    wpm=80,
    error_rate=0.02
)
Network Obfuscation
# Obfuscate request timing
original_delay = 1.0
obfuscated_delay = network_obfuscator.obfuscate_timing(original_delay)
# Obfuscate headers
headers = {"User-Agent": "Mozilla/5.0..."}
obfuscated_headers = network_obfuscator.obfuscate_headers(headers)
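One plausible implementation of `obfuscate_timing` (illustrative only, not the library's actual code) applies bounded random jitter so request intervals never form a machine-regular pattern:

```python
import random

def obfuscate_timing(base_delay, jitter=0.35):
    """Jitter a base delay by up to +/-35% and clamp to a small floor
    so the result is never a near-zero delay."""
    factor = 1.0 + random.uniform(-jitter, jitter)
    return max(0.05, base_delay * factor)

delay = obfuscate_timing(1.0)
```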
Development
Setting Up Development Environment
# Clone repository
git clone https://github.com/your-org/chameleon-engine.git
cd chameleon-engine
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
Code Quality
# Format code
black chameleon_engine/
isort chameleon_engine/
# Lint code
flake8 chameleon_engine/
mypy chameleon_engine/
# Run security checks
bandit -r chameleon_engine/
Building Documentation
# Install documentation dependencies
pip install -r requirements-docs.txt
# Build docs
mkdocs build
# Serve docs locally
mkdocs serve
Performance
Benchmarks
- Request Processing: < 10ms average latency
- Profile Generation: < 50ms for complex profiles
- Memory Usage: ~50MB base + ~5MB per concurrent session
- Concurrent Sessions: 1000+ simultaneous connections
Optimization Tips
- Enable Redis caching for fingerprint profiles
- Use connection pooling for database connections
- Configure appropriate timeouts for target websites
- Monitor resource usage with built-in metrics
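The caching tip can be illustrated with a minimal in-process TTL cache; Redis plays the same role across processes (with `SETEX`-style expiry), but the idea is identical. Names here are illustrative, not part of the Chameleon Engine API:

```python
import time

class TTLCache:
    """Tiny time-based cache, e.g. for generated fingerprint profiles."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Entry aged out: drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=300)
cache.set("chrome_windows", {"browser_type": "chrome", "os": "windows"})
```

A cache hit skips regenerating the profile; after the TTL (matching `rotation_interval` in the config) the profile is rebuilt, which also forces periodic fingerprint rotation.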
Security Considerations
Ethical Usage
- ✅ Respect robots.txt files
- ✅ Implement rate limiting for target websites
- ✅ Check terms of service before scraping
- ✅ Identify your bot when required
- ❌ Don't overload target servers
- ❌ Don't scrape personal data without consent
- ❌ Don't bypass security measures illegally
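Respecting robots.txt is straightforward with the standard library's `urllib.robotparser`. Here the rules are parsed from an inline string for illustration; in practice you would call `rp.read()` after `rp.set_url(...)` to fetch the site's real /robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt rules (inline for illustration)
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check individual URLs before scraping them
allowed = rp.can_fetch("MyBot/1.0", "https://example.com/products")
blocked = rp.can_fetch("MyBot/1.0", "https://example.com/private/data")
```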
Best Practices
# Ethical scraping configuration
ethical_config = {
    "rate_limit": "1 request per second",
    "respect_robots_txt": True,
    "user_agent": "MyBot/1.0 (+http://mywebsite.com/bot-info)",
    "timeout": 30,
    "max_retries": 3,
    "retry_delay": 5
}
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Workflow
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes
- Add tests for new functionality
- Run the test suite: pytest
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- uTLS for TLS fingerprinting
- Playwright for browser automation
- FastAPI for the API framework
- Gin for the Go web framework
Support
- Documentation
- Issue Tracker
- Discussions
- Email Support
Roadmap
Version 2.0
- Machine learning-based behavior optimization
- Advanced CAPTCHA solving integration
- Cloud deployment templates
- Web-based management dashboard
Version 1.5
- Enhanced mobile browser fingerprinting
- WebGL and Canvas fingerprinting
- Audio fingerprinting capabilities
- Advanced proxy pool management
- Multi-format proxy loading (TXT, CSV, JSON)
- Dynamic proxy generation (residential, datacenter, geo-targeted)
- Comprehensive proxy filtering and validation
Version 1.2
- Microservices architecture
- Go-based proxy service
- Real-time monitoring
- Docker deployment support
Made with ❤️ for the ethical web scraping community
If you find this project useful, please consider giving it a ⭐ on GitHub!
Download files
Source Distribution
Built Distribution
File details
Details for the file chameleon_engine-1.0.0.tar.gz.
File metadata
- Download URL: chameleon_engine-1.0.0.tar.gz
- Upload date:
- Size: 207.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `4144dae885baa7e3e8c1e858c8e714a68dcb5f84a5f479b3678b6869de8cd30b` |
| MD5 | `555e62548b1db630c743a7e5a616e05b` |
| BLAKE2b-256 | `c7d54b93d3d5a357424f7bddcd99075a96460183332545da8dcefd5398a4746d` |
File details
Details for the file chameleon_engine-1.0.0-py3-none-any.whl.
File metadata
- Download URL: chameleon_engine-1.0.0-py3-none-any.whl
- Upload date:
- Size: 179.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `880168bbb6d84969fec58bc721864f8d515c315144b8fa7f6adc1ef5bc53ca7a` |
| MD5 | `f3c3d1a04589ff558294336b4034668b` |
| BLAKE2b-256 | `47f84395f0ddaaa48db1b3510ffaa6d5603276ba12e442fe76767d5dc7259bcb` |