
Add custom proxy headers to HTTPS requests in Scrapy


Scrapy Proxy Headers


Send custom headers to proxies and receive proxy response headers in Scrapy.

The Problem

When making HTTPS requests through a proxy, Scrapy cannot send custom headers to the proxy itself. This is because HTTPS requests create an encrypted tunnel (via HTTP CONNECT): any headers you add to request.headers are encrypted and visible only to the destination server, never to the proxy.

┌──────────┐     CONNECT      ┌───────┐     Encrypted     ┌────────────┐
│  Scrapy  │ ───────────────► │ Proxy │ ════════════════► │ Target URL │
└──────────┘  (unencrypted)   └───────┘    (tunnel)       └────────────┘
                  │                              │
           Proxy headers             request.headers
           go HERE                   go here (encrypted)

This extension solves the problem by:

  1. Sending custom headers to the proxy during the CONNECT handshake
  2. Capturing response headers from the proxy's CONNECT response
  3. Making those headers available in your spider
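Conceptually, the custom headers ride along with the plaintext CONNECT request itself, before TLS begins, so the proxy can read them. The sketch below is illustrative only (it is not code from this library) and shows what such a handshake looks like on the wire:

```python
# Illustrative sketch of a CONNECT request carrying an extra proxy header.
# Not the library's internals; just what the unencrypted handshake contains.

def build_connect_request(host: str, port: int, proxy_headers: dict) -> bytes:
    """Build the plaintext CONNECT request sent to the proxy."""
    lines = [f"CONNECT {host}:{port} HTTP/1.1", f"Host: {host}:{port}"]
    # Custom proxy headers go here, before the TLS tunnel starts,
    # so the proxy can see and act on them.
    lines += [f"{name}: {value}" for name, value in proxy_headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_connect_request(
    "api.ipify.org", 443, {"X-ProxyMesh-Country": "US"}
)
print(request.decode())
```

Everything after the proxy's "200 Connection Established" reply flows through the encrypted tunnel, which is why request.headers alone cannot reach the proxy.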

Installation

pip install scrapy-proxy-headers

Quick Start

1. Configure the Download Handler

In your Scrapy settings.py:

DOWNLOAD_HANDLERS = {
    "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
}

Or in your spider's custom_settings:

class MySpider(scrapy.Spider):
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
        }
    }

2. Send Proxy Headers

Use request.meta["proxy_headers"] to send headers to the proxy:

import scrapy

class MySpider(scrapy.Spider):
    name = "example"
    
    def start_requests(self):
        yield scrapy.Request(
            url="https://api.ipify.org?format=json",
            meta={
                "proxy": "http://your-proxy:port",
                "proxy_headers": {"X-ProxyMesh-Country": "US"}
            }
        )
    
    def parse(self, response):
        # Proxy response headers are available in response.headers (as bytes)
        proxy_ip = response.headers.get("X-ProxyMesh-IP")
        if proxy_ip:
            self.logger.info(f"Proxy IP: {proxy_ip.decode()}")

3. Receive Proxy Response Headers

Headers from the proxy's CONNECT response are automatically merged into response.headers:

def parse(self, response):
    # Access headers sent by the proxy
    proxy_ip = response.headers.get(b"X-ProxyMesh-IP")
    if proxy_ip:
        print(f"Request made through IP: {proxy_ip.decode()}")

Complete Example

import scrapy

class ProxyHeadersSpider(scrapy.Spider):
    name = "proxy_headers_demo"
    
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
        }
    }
    
    def start_requests(self):
        yield scrapy.Request(
            url="https://api.ipify.org?format=json",
            meta={
                "proxy": "http://us.proxymesh.com:31280",
                "proxy_headers": {"X-ProxyMesh-Country": "US"}
            },
            callback=self.parse_ip
        )
    
    def parse_ip(self, response):
        data = response.json()
        proxy_ip = response.headers.get(b"X-ProxyMesh-IP")
        
        self.logger.info(f"Public IP: {data['ip']}")
        if proxy_ip:
            self.logger.info(f"Proxy IP: {proxy_ip.decode()}")
        
        yield {
            "public_ip": data["ip"],
            "proxy_ip": proxy_ip.decode() if proxy_ip else None
        }

How It Works

  1. HTTP11ProxyDownloadHandler - Custom download handler that manages proxy header caching
  2. ScrapyProxyHeadersAgent - Agent that reads proxy_headers from request meta
  3. TunnelingHeadersAgent - Sends custom headers in the CONNECT request
  4. TunnelingHeadersTCP4ClientEndpoint - Captures proxy response headers from CONNECT response

The handler also caches proxy response headers by proxy URL. This ensures headers remain available even when Scrapy reuses existing tunnel connections for subsequent requests.
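The caching idea can be pictured as a small map keyed by proxy URL: the first CONNECT response populates it, and reused tunnels (which produce no new CONNECT response) read from it. This is a hypothetical sketch of the concept, not the handler's actual implementation:

```python
# Hypothetical sketch of per-proxy response-header caching.
# When Scrapy reuses an existing tunnel, no new CONNECT response arrives,
# so cached headers from the first CONNECT are replayed instead.

class ProxyHeaderCache:
    def __init__(self):
        self._cache = {}  # proxy URL -> headers from its CONNECT response

    def store(self, proxy_url: str, headers: dict) -> None:
        self._cache[proxy_url] = dict(headers)

    def lookup(self, proxy_url: str) -> dict:
        return dict(self._cache.get(proxy_url, {}))

cache = ProxyHeaderCache()
# First request through the proxy: cache the CONNECT response headers.
cache.store("http://us.proxymesh.com:31280", {"X-ProxyMesh-IP": "10.0.0.1"})
# A later request over the reused tunnel: replay the cached headers.
print(cache.lookup("http://us.proxymesh.com:31280"))
```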

Test Harness

A test harness is included to verify proxy header functionality:

# Basic test
PROXY_URL=http://your-proxy:port TEST_URL=https://api.ipify.org python test_proxy_headers.py

# With custom proxy header
PROXY_URL=http://your-proxy:port \
PROXY_HEADER=X-ProxyMesh-IP \
SEND_PROXY_HEADER=X-ProxyMesh-Country \
SEND_PROXY_VALUE=US \
python test_proxy_headers.py

# Verbose output
python test_proxy_headers.py -v

Environment Variables

Variable            Description                            Default
PROXY_URL           Proxy URL (also checks HTTPS_PROXY)    Required
TEST_URL            URL to request                         https://api.ipify.org?format=json
PROXY_HEADER        Response header to check for           X-ProxyMesh-IP
SEND_PROXY_HEADER   Header name to send to the proxy       Optional
SEND_PROXY_VALUE    Value for the sent header              Optional

Documentation

Full documentation is available at scrapy-proxy-headers.readthedocs.io.

Use Cases

  • Geographic targeting: Send X-ProxyMesh-Country to route through specific countries
  • Session consistency: Request the same IP across multiple requests
  • Debugging: Capture proxy response headers to see which IP was assigned
  • Load balancing: Use proxy headers to control request distribution

Requirements

  • Python 3.8+
  • Scrapy 2.0+

License

BSD License - see LICENSE for details.
