
PhantomFetch

Python 3.13+ · License: MIT · Code style: ruff

PhantomFetch is a high-performance, agentic web scraping library for Python. It seamlessly combines the speed of curl-cffi with the capabilities of Playwright, offering a unified API for all your data extraction needs.

Why PhantomFetch?

Most web scraping tools force a choice between speed (httpx, requests) and browser capabilities (Playwright, Selenium). PhantomFetch gives you both behind a unified interface:

Feature            | PhantomFetch          | requests/httpx | Playwright/Selenium
-------------------|-----------------------|----------------|--------------------
Speed              | ⚡ Fast (curl-cffi)   | ⚡ Fast        | 🐌 Slow
JavaScript Support | ✅ Yes (Playwright)   | ❌ No          | ✅ Yes
Anti-Detection     | ✅ Built-in           | ❌ No          | ⚠️ Manual
Smart Caching      | ✅ Configurable       | ❌ No          | ❌ No
Proxy Rotation     | ✅ Automatic          | ⚠️ Manual      | ⚠️ Manual
Async-First        | ✅ Yes                | ⚠️ Partial     | ✅ Yes
Unified API        | ✅ One interface      | N/A            | N/A
OpenTelemetry      | ✅ Built-in           | ❌ No          | ❌ No

Key Benefits:

  • 🎯 Start Fast, Scale Smart: Use curl for quick requests, switch to browser when needed
  • 🧠 Intelligent: Automatic retry logic, exponential backoff, fingerprint rotation
  • 🚀 Production-Ready: Built-in observability, caching, and error handling
  • 🛠️ Developer-Friendly: Intuitive API, comprehensive type hints, rich documentation

Features

  • 🚀 Unified API: Switch between curl (fast, lightweight) and browser (JavaScript-capable) engines with a single parameter
  • 🧠 Smart Caching: Configurable caching strategies (all, resources, conservative) to speed up development and save bandwidth
  • 🤖 Agentic Actions: Define browser interactions (click, scroll, input, wait) declaratively
  • 🛡️ Anti-Detection: Built-in support for proxy rotation and fingerprinting protection (via curl-cffi)
  • ⚡ Async-First: Built on asyncio for high concurrency
  • 🔄 Smart Retries: Configurable retry logic with exponential backoff
  • 🍪 Cookie Management: Automatic cookie handling across engines
  • 📊 Observability: OpenTelemetry integration out of the box

Installation

pip install phantomfetch
# or with uv (recommended)
uv pip install phantomfetch

After installation, install Playwright browsers:

playwright install chromium

Quick Start

Basic Fetch (Curl Engine)

import asyncio
from phantomfetch import Fetcher

async def main():
    async with Fetcher() as f:
        response = await f.fetch("https://httpbin.org/get")
        print(response.json())

if __name__ == "__main__":
    asyncio.run(main())

Browser Fetch with Caching

Use the resources strategy to cache static assets (images, CSS, scripts) while keeping the main page fresh.

import asyncio
from phantomfetch import Fetcher, FileSystemCache

async def main():
    # Cache sub-resources to speed up subsequent fetches
    cache = FileSystemCache(strategy="resources")

    async with Fetcher(browser_engine="cdp", cache=cache) as f:
        # First run: downloads everything
        resp = await f.fetch("https://example.com", engine="browser")

        # Second run: uses cached resources, only fetches main HTML
        resp = await f.fetch("https://example.com", engine="browser")
        print(resp.text)

asyncio.run(main())

Browser Actions

Perform interactions like clicking, scrolling, and taking screenshots:

from phantomfetch import Fetcher

actions = [
    {"action": "wait", "selector": "#search-input"},
    {"action": "input", "selector": "#search-input", "value": "phantomfetch"},
    {"action": "click", "selector": "#search-button"},
    {"action": "wait_for_load"},
    {"action": "screenshot", "value": "search_results.png"}
]

async with Fetcher(browser_engine="cdp") as f:
    resp = await f.fetch("https://example.com", actions=actions, engine="browser")
    # Screenshot saved to search_results.png

Advanced: Retry Configuration

Fine-tune retry behavior per request:

from phantomfetch import Fetcher

async with Fetcher() as f:
    # Custom retry logic for flaky endpoints
    resp = await f.fetch(
        "https://api.example.com/data",
        max_retries=5,  # Override default retries
        timeout=60.0,   # Longer timeout for slow APIs
    )
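Under the hood, a `max_retries` setting like this typically drives an exponential-backoff loop: each failed attempt doubles the wait before the next try, plus a little jitter to avoid thundering herds. As a rough mental model only — this is an illustrative stdlib sketch, not PhantomFetch's internals — the pattern looks like:

```python
import asyncio
import random

async def fetch_with_backoff(fetch, url, max_retries=5, base_delay=0.5):
    """Retry an async fetch callable with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: propagate the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)

# Demo with a flaky stub that fails twice, then succeeds
async def demo():
    calls = {"n": 0}

    async def flaky(url):
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient")
        return "ok"

    return await fetch_with_backoff(flaky, "https://example.com", base_delay=0.01)
```

In PhantomFetch itself you only set `max_retries`; the loop above is just what that knob buys you conceptually.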

Cookie Handling

Pass cookies to any engine and retrieve them from the response:

from phantomfetch import Fetcher

async with Fetcher() as f:
    # Set cookies
    resp = await f.fetch(
        "https://httpbin.org/cookies",
        cookies={"session_id": "secret_token"}
    )
    print(resp.json())

    # Get cookies (including from redirects)
    resp = await f.fetch("https://httpbin.org/cookies/set/foo/bar")
    for cookie in resp.cookies:
        print(f"{cookie.name}: {cookie.value}")

Configuration

Caching Strategies

  • all: Caches everything, including the main document. Good for offline development
  • resources (Default): Caches sub-resources (images, styles, scripts) but fetches the main document fresh. Best for scraping dynamic sites
  • conservative: Caches only heavy static assets like images and fonts

Example:

from phantomfetch import FileSystemCache, Fetcher

cache = FileSystemCache(
    cache_dir=".cache",
    strategy="resources"
)

async with Fetcher(cache=cache) as f:
    # Resources will be cached automatically
    resp = await f.fetch("https://example.com", engine="browser")
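Conceptually, each strategy is just a predicate over resource types: given a strategy name and the type of a fetched resource, decide whether to cache it. A minimal stand-in (illustrative only — the resource-type names and mapping are assumptions, not PhantomFetch's internals):

```python
# Which resource types each strategy caches (illustrative mapping only)
STRATEGY_RULES = {
    "all": None,  # None means: cache everything, including the document
    "resources": {"image", "stylesheet", "script", "font"},
    "conservative": {"image", "font"},
}

def should_cache(strategy: str, resource_type: str) -> bool:
    """Decide whether a resource of the given type is cacheable."""
    allowed = STRATEGY_RULES[strategy]
    return allowed is None or resource_type in allowed
```

This is why `resources` is a good default for dynamic sites: scripts and styles are served from cache, while `should_cache("resources", "document")` is false, so the main HTML is always fetched fresh.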

Proxy Rotation

Multiple proxy strategies available:

from phantomfetch import Fetcher, Proxy, ProxyPool

# 1. Define Typed Proxies
proxies = [
    Proxy(
        url="http://user:pass@residential-us.com:8080", 
        location="US", 
        vendor="BrightData",
        proxy_type="residential",
        weight=10
    ),
    Proxy(
        url="http://user:pass@datacenter-de.com:8080", 
        location="DE", 
        vendor="OxyLabs",
        proxy_type="datacenter",
        weight=1
    ),
]

# 2. Create a Smart Pool
pool = ProxyPool(proxies, strategy="geo_match")

async with Fetcher(proxies=pool) as f:
    # Uses US proxy from pool (geo-match)
    await f.fetch("https://google.com", location="US")

    # Uses any available proxy (fallback)
    await f.fetch("https://example.com")
    
    # 3. Explicit Override (Bypass Pool)
    # Useful for debugging or specific routing needs
    await f.fetch(
        "https://httpbin.org/ip", 
        proxy="http://user:pass@specific-proxy:8080"
    )
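A `geo_match` pool like the one above can be pictured as a two-step selection: filter the pool to proxies matching the request's target location, then fall back to the whole pool and pick by weight. This sketch is an illustrative stand-in (plain dicts instead of `Proxy` objects, and not the library's actual selection code):

```python
import random

def pick_proxy(proxies, location=None):
    """Geo-match first, then weighted-random fallback (illustrative only).

    `proxies` is a list of dicts with "url", "location", and "weight" keys.
    """
    # Prefer proxies whose location matches the request's target geo
    candidates = [p for p in proxies if location and p["location"] == location]
    if not candidates:
        candidates = proxies  # fallback: any proxy in the pool
    # Weighted random choice: higher weight means picked more often
    weights = [p["weight"] for p in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

pool = [
    {"url": "http://us-proxy:8080", "location": "US", "weight": 10},
    {"url": "http://de-proxy:8080", "location": "DE", "weight": 1},
]
```

With this model, a request tagged `location="US"` always lands on the US proxy, while untagged requests are distributed across the pool in proportion to weight.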

Observability (OpenTelemetry)

PhantomFetch is fully instrumented with OpenTelemetry:

from phantomfetch import Fetcher
from phantomfetch.telemetry import configure_telemetry

# Setup OTel with custom service name
configure_telemetry(service_name="my-scraper")

async with Fetcher() as f:
    await f.fetch("https://example.com")
    # Spans automatically created and exported

Or use standard OpenTelemetry environment variables:

export OTEL_SERVICE_NAME="my-scraper"
export OTEL_TRACES_EXPORTER="console"
python my_scraper.py

Troubleshooting

Playwright Installation Issues

If you encounter browser-related errors:

# Install all browsers
playwright install

# Or just chromium (recommended)
playwright install chromium

# Show installer options
playwright install --help

SSL Certificate Errors

SSL verification is handled by the underlying engines (curl-cffi for the curl engine, Playwright for the browser engine) and is enabled by default, so requests to hosts with invalid or self-signed certificates will fail:

async with Fetcher() as f:
    # Fails by default: the certificate is self-signed
    resp = await f.fetch("https://self-signed.badssl.com/")

If you need to relax verification for development or testing, use the engine-specific options documented for each engine — and never disable certificate checks in production.

Memory Issues with Caching

If cache grows too large:

from phantomfetch import FileSystemCache

cache = FileSystemCache(cache_dir=".cache")

# Manually clear expired entries
cache.clear_expired()

# Or just delete the cache directory
import shutil
shutil.rmtree(".cache", ignore_errors=True)

Browser Engine Not Working

Common issues:

  1. Playwright not installed: Run playwright install chromium
  2. Marimo notebook issues: Browser engines may not work in some notebook environments
  3. Port conflicts: CDP uses random ports, but firewall rules might block them

Debug with:

import logging

# Enable verbose logging before fetching
logging.basicConfig(level=logging.DEBUG)

async with Fetcher(browser_engine="cdp") as f:
    resp = await f.fetch("https://example.com", engine="browser")

Rate Limiting / 429 Errors

Use retry configuration and delays:

import asyncio
from phantomfetch import Fetcher

urls = ["https://example.com/page1", "https://example.com/page2"]

async with Fetcher(max_retries=5) as f:
    for url in urls:
        resp = await f.fetch(url)
        await asyncio.sleep(1)  # Be nice to servers
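If sequential fetching with a fixed delay is too slow, you can raise throughput while still capping load on the target by bounding concurrency with a semaphore. This is a generic asyncio pattern (shown with a stub in place of `f.fetch`, which you would swap in inside the `Fetcher` context):

```python
import asyncio

async def bounded_gather(fetch, urls, limit=5):
    """Fetch many URLs concurrently, with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(url):
        async with sem:  # blocks while `limit` fetches are already running
            return await fetch(url)

    # gather preserves input order in its results
    return await asyncio.gather(*(one(u) for u in urls))

# Demo with a stub fetch; replace fake_fetch with f.fetch in real use
async def demo():
    async def fake_fetch(url):
        await asyncio.sleep(0)
        return f"fetched {url}"

    return await bounded_gather(fake_fetch, ["a", "b", "c"], limit=2)
```

Tune `limit` to what the target server tolerates; combined with `max_retries`, this usually keeps you clear of 429 responses.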

Scrapeless Session Recording

When using Scrapeless's CDP endpoint for session recording, PhantomFetch automatically reuses existing browser windows:

async with Fetcher(
    browser_engine="cdp",
    browser_engine_config={
        "cdp_endpoint": "wss://YOUR_SESSION.scrapeless.com/chrome/cdp"
        # use_existing_page=True (default) ensures recording compatibility
    }
) as f:
    # Uses existing window - Scrapeless records this! ✓
    resp = await f.fetch("https://example.com", engine="browser")

Why this matters: Scrapeless can only record a single window. By default (use_existing_page=True), PhantomFetch detects and reuses the existing browser page in your Scrapeless session instead of creating new windows.

To disable (not recommended for recording): Set use_existing_page=False in browser_engine_config.

See examples/scrapeless_cdp_recording.py for a complete example.

Next Steps

Ready to dive deeper? Here's what to explore:

  1. Examples - See retry configuration and advanced patterns
  2. CHANGELOG - See what's new
  3. Contributing Guide - Help improve PhantomFetch

Community & Support

Contributing

We love contributions! PhantomFetch is built by developers, for developers. Whether you're:

  • 🐛 Fixing bugs
  • ✨ Adding features
  • 📝 Improving documentation
  • 🧪 Writing tests

Check out our Contributing Guide to get started!

Quick Start for Contributors

# Clone and setup
git clone https://github.com/iristech-systems/PhantomFetch.git
cd PhantomFetch
uv sync
uv run pre-commit install

# Run tests
uv run pytest

# Make changes and commit
git checkout -b feature/amazing-feature
# ... make changes ...
uv run pre-commit run --all-files
git commit -m "feat: add amazing feature"

License

MIT License - see LICENSE for details.

Acknowledgments

Built on the shoulders of giants: curl-cffi and Playwright.

Special thanks to all contributors who help make PhantomFetch better!


Made with ❤️ for the web scraping community

⭐ Star us on GitHub · 📦 Install from PyPI
