
Multi-Browser Crawler

A clean, focused browser automation package for web scraping and content extraction.

🎯 Ultra-Clean Architecture

This package provides 4 essential components for browser automation:

  • BrowserPoolManager: Browser pool management with undetected-chromedriver
  • ProxyManager: Simple proxy management with Chrome-ready format
  • DebugPortManager: Thread-safe debug port allocation (sketched just after this list)
  • BrowserConfig: Clean configuration management
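
Of these, DebugPortManager is the easiest to picture in isolation: the sketch below shows one way thread-safe port allocation can work. Names and behavior here are illustrative assumptions, not the package's actual internals.

import threading

class PortAllocator:
    """Illustrative thread-safe port allocator, in the spirit of
    DebugPortManager -- not the package's real implementation."""

    def __init__(self, start=9222, end=9322):
        self._lock = threading.Lock()
        self._free = set(range(start, end + 1))

    def acquire(self) -> int:
        # Hold the lock so two browsers can never claim the same port
        with self._lock:
            if not self._free:
                raise RuntimeError("No free debug ports in range")
            return self._free.pop()

    def release(self, port: int) -> None:
        # Return the port to the pool once its browser shuts down
        with self._lock:
            self._free.add(port)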

Key Features

  • Zero Redundancy: Every line serves a purpose
  • Built-in Features: Image download, API discovery, JS execution
  • Direct Usage: No unnecessary wrapper layers
  • Chrome Integration: Undetected-chromedriver for stealth browsing
  • Proxy Support: Single regex parsing, Chrome-ready format
  • Session Management: Persistent and non-persistent browsers

📦 Installation

pip install multi-browser-crawler

🚀 Quick Start

import asyncio
from multi_browser_crawler import BrowserPoolManager, BrowserConfig

async def main():
    # Create configuration
    config = BrowserConfig(
        headless=True,
        timeout=30,
        browser_data_dir="tmp/browser-data"
    )

    # Initialize browser pool
    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        # Fetch a webpage
        result = await browser_pool.fetch_html(
            url="https://example.com",
            session_id=None  # Non-persistent browser
        )

        print(f"✅ Success!")
        print(f"   Title: {result.get('title', 'N/A')}")
        print(f"   Load time: {result.get('load_time', 0):.2f}s")
        print(f"   HTML size: {len(result.get('html', ''))} characters")

    finally:
        await browser_pool.shutdown()

if __name__ == "__main__":
    asyncio.run(main())

⚙️ Configuration

Basic Configuration

from multi_browser_crawler import BrowserConfig

config = BrowserConfig(
    headless=True,                    # Run in headless mode
    timeout=30,                       # Page load timeout in seconds
    browser_data_dir="tmp/browsers",  # Browser data directory
    proxy_file_path="proxies.txt",    # Optional proxy file
    min_browsers=1,                   # Minimum browsers in pool
    max_browsers=5,                   # Maximum browsers in pool
    idle_timeout=300,                 # Browser idle timeout (seconds)
    debug_port_start=9222,            # Debug port range start
    debug_port_end=9322,              # Debug port range end
)

Environment Variables

export BROWSER_HEADLESS=true
export BROWSER_TIMEOUT=30
export BROWSER_DATA_DIR="/tmp/browsers"
export PROXY_FILE_PATH="/path/to/proxies.txt"
export MIN_BROWSERS=1
export MAX_BROWSERS=5
export DEBUG_PORT_START=9222
export DEBUG_PORT_END=9322
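
Whether BrowserConfig reads these variables on its own is not shown here, so treat the loader below as a fallback sketch that maps each variable onto the constructor arguments from the previous section.

import os
from multi_browser_crawler import BrowserConfig

def config_from_env() -> BrowserConfig:
    """Fallback sketch: build a BrowserConfig from the variables above.
    The package may already read these itself -- verify before relying on this."""
    return BrowserConfig(
        headless=os.getenv("BROWSER_HEADLESS", "true").lower() == "true",
        timeout=int(os.getenv("BROWSER_TIMEOUT", "30")),
        browser_data_dir=os.getenv("BROWSER_DATA_DIR", "tmp/browsers"),
        proxy_file_path=os.getenv("PROXY_FILE_PATH"),  # None if unset
        min_browsers=int(os.getenv("MIN_BROWSERS", "1")),
        max_browsers=int(os.getenv("MAX_BROWSERS", "5")),
        debug_port_start=int(os.getenv("DEBUG_PORT_START", "9222")),
        debug_port_end=int(os.getenv("DEBUG_PORT_END", "9322")),
    )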

📝 Proxy File Format

Create a proxies.txt file with one proxy per line:

# Basic proxies
127.0.0.1:8080
192.168.1.100:3128
proxy.example.com:8080

# Proxies with authentication
user:pass@192.168.1.1:3128
admin:secret@proxy.example.com:9999

# Complex passwords (supported)
user:complex@pass@host.com:8080
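
The "single regex parsing" mentioned under Key Features is straightforward to demonstrate: matching the password greedily up to the last @ is what lets embedded @ characters survive. The parser below is an illustrative reimplementation, not necessarily ProxyManager's own code.

import re

# One pattern covers all three forms above; the greedy password
# match backtracks to the final '@', so '@' inside passwords works.
# Illustrative only -- ProxyManager's actual parsing may differ.
PROXY_RE = re.compile(
    r"^(?:(?P<user>[^:]+):(?P<password>.+)@)?"
    r"(?P<host>[^@:\s]+):(?P<port>\d+)$"
)

def parse_proxy(line: str):
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and comments
    match = PROXY_RE.match(line)
    if not match:
        raise ValueError(f"Unparseable proxy line: {line!r}")
    return match.groupdict()

print(parse_proxy("user:complex@pass@host.com:8080"))
# {'user': 'user', 'password': 'complex@pass', 'host': 'host.com', 'port': '8080'}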

📚 Usage Examples

1. Basic Web Scraping

import asyncio
from multi_browser_crawler import BrowserPoolManager, BrowserConfig

async def basic_scraping():
    config = BrowserConfig(
        headless=True,
        browser_data_dir="tmp/browser-data"
    )

    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        result = await browser_pool.fetch_html(
            url="https://httpbin.org/html",
            session_id=None
        )

        print(f"Status: {result['status']}")
        print(f"HTML: {result['html'][:100]}...")

    finally:
        await browser_pool.shutdown()

asyncio.run(basic_scraping())

2. Using Proxies

async def proxy_scraping():
    config = BrowserConfig(
        headless=True,
        browser_data_dir="tmp/browser-data",
        proxy_file_path="proxies.txt"  # Use proxy file
    )

    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        result = await browser_pool.fetch_html(
            url="https://httpbin.org/ip",
            session_id=None,
            use_proxy=True  # Enable proxy usage
        )

        print(f"IP: {result['html']}")

    finally:
        await browser_pool.shutdown()

3. Persistent Sessions

async def persistent_session():
    config = BrowserConfig(browser_data_dir="tmp/browser-data")
    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        # First request - set cookie
        result1 = await browser_pool.fetch_html(
            url="https://httpbin.org/cookies/set/test/value123",
            session_id="my_session"  # Persistent session
        )

        # Second request - check cookie (same session)
        result2 = await browser_pool.fetch_html(
            url="https://httpbin.org/cookies",
            session_id="my_session"  # Same session
        )

        # Verify the cookie actually survived rather than assuming it did
        if "value123" in result2.get("html", ""):
            print("Cookie persisted between requests!")

    finally:
        await browser_pool.shutdown()

4. JavaScript Execution

async def javascript_execution():
    config = BrowserConfig(browser_data_dir="tmp/browser-data")
    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        result = await browser_pool.fetch_html(
            url="https://httpbin.org/html",
            session_id=None,
            js_action="document.title = 'Modified by JS';"
        )

        print(f"Modified title: {result.get('title')}")

    finally:
        await browser_pool.shutdown()

5. Image Downloading

async def download_images():
    config = BrowserConfig(
        browser_data_dir="tmp/browser-data",
        download_images_dir="tmp/images"
    )

    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        result = await browser_pool.fetch_html(
            url="https://example.com",
            session_id=None,
            download_images=True  # Enable image downloading
        )

        print(f"Downloaded images: {result.get('downloaded_images', [])}")

    finally:
        await browser_pool.shutdown()

6. API Discovery

async def api_discovery():
    config = BrowserConfig(browser_data_dir="tmp/browser-data")
    browser_pool = BrowserPoolManager(config.to_dict())

    try:
        result = await browser_pool.fetch_html(
            url="https://spa-app.example.com",
            session_id=None,
            capture_api_calls=True  # Enable API discovery
        )

        print(f"API calls: {result.get('api_calls', [])}")

    finally:
        await browser_pool.shutdown()
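
The structure of each captured call is not documented above, so treat the helper below as a sketch: it assumes every entry is a dict with at least a "url" key, which you should verify against the real return value of fetch_html.

def json_api_endpoints(api_calls):
    """Reduce captured calls to a sorted set of likely API endpoints.
    Assumes dict entries with a 'url' key -- an assumption, not a
    documented contract of capture_api_calls."""
    return sorted({
        call["url"]
        for call in api_calls
        if isinstance(call, dict) and "/api/" in call.get("url", "")
    })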

🖥️ CLI Usage

# Fetch a single URL
python -m multi_browser_crawler.browser_cli fetch https://example.com

# Fetch with proxy
python -m multi_browser_crawler.browser_cli fetch https://example.com --proxy-file proxies.txt

# Test proxies
python -m multi_browser_crawler.browser_cli test-proxies proxies.txt

🧪 Testing

Run the test suite:

# Run all tests
python tests/test_browser.py
python tests/test_proxy_manager.py
python tests/test_debug_port_manager.py

# Run usage examples
python examples/01_basic_usage.py
python examples/02_advanced_features.py

📄 License

MIT License - see LICENSE file for details.

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📞 Support

  • GitHub Issues: Report bugs and request features
  • Documentation: Coding Principles and Examples
  • Examples: Comprehensive usage patterns and principle demonstrations
