# Scrapy Proxy Headers

Send custom headers to proxies and receive proxy response headers in Scrapy.
## The Problem

When making HTTPS requests through a proxy, Scrapy cannot send custom headers to the proxy itself. HTTPS requests go through an encrypted tunnel established with HTTP CONNECT, so any headers you add to `request.headers` are encrypted and visible only to the destination server, never to the proxy.
```
┌──────────┐     CONNECT      ┌───────┐     Encrypted     ┌────────────┐
│  Scrapy  │ ───────────────► │ Proxy │ ════════════════► │ Target URL │
└──────────┘  (unencrypted)   └───────┘     (tunnel)      └────────────┘
                   │                              │
             Proxy headers               request.headers
               go HERE                  go here (encrypted)
```
This extension solves the problem by:
- Sending custom headers to the proxy during the CONNECT handshake
- Capturing response headers from the proxy's CONNECT response
- Making those headers available in your spider
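To make the distinction concrete, here is a minimal, illustrative sketch of a raw CONNECT handshake (not part of this package; the proxy address is a placeholder): the proxy header travels in the plaintext CONNECT request, while everything after the tunnel opens is encrypted.

```python
# Illustrative sketch only: where proxy headers travel in a CONNECT handshake.
# The header is sent in the plaintext CONNECT request, so the proxy can read
# it; everything after the tunnel opens is encrypted end-to-end.
import socket

PROXY_HOST, PROXY_PORT = "proxy.example.com", 8080  # hypothetical proxy

connect_request = (
    "CONNECT api.ipify.org:443 HTTP/1.1\r\n"
    "Host: api.ipify.org:443\r\n"
    "X-ProxyMesh-Country: US\r\n"  # visible to the proxy, unlike request.headers
    "\r\n"
)

def open_tunnel():
    """Open a tunnel through the proxy and return the CONNECT response."""
    sock = socket.create_connection((PROXY_HOST, PROXY_PORT))
    sock.sendall(connect_request.encode("ascii"))
    # The proxy replies (e.g. "HTTP/1.1 200 Connection established") and may
    # include its own headers, such as X-ProxyMesh-IP, before the tunnel starts.
    reply = sock.recv(4096).decode("ascii", errors="replace")
    return sock, reply
```

This extension does the equivalent work inside Scrapy's download handler, so you never have to manage sockets yourself.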
## Installation

```bash
pip install scrapy-proxy-headers
```
## Quick Start

### 1. Configure the Download Handler

In your Scrapy `settings.py`:

```python
DOWNLOAD_HANDLERS = {
    "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
}
```
Or in your spider's `custom_settings`:

```python
class MySpider(scrapy.Spider):
    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
        }
    }
```
### 2. Send Proxy Headers

Use `request.meta["proxy_headers"]` to send headers to the proxy:

```python
import scrapy

class MySpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        yield scrapy.Request(
            url="https://api.ipify.org?format=json",
            meta={
                "proxy": "http://your-proxy:port",
                "proxy_headers": {"X-ProxyMesh-Country": "US"}
            }
        )

    def parse(self, response):
        # Proxy response headers are available in response.headers
        proxy_ip = response.headers.get("X-ProxyMesh-IP")
        self.logger.info(f"Proxy IP: {proxy_ip}")
```
### 3. Receive Proxy Response Headers

Headers from the proxy's CONNECT response are automatically merged into `response.headers`:

```python
def parse(self, response):
    # Access headers sent by the proxy
    proxy_ip = response.headers.get(b"X-ProxyMesh-IP")
    if proxy_ip:
        print(f"Request made through IP: {proxy_ip.decode()}")
```
## Complete Example

```python
import scrapy

class ProxyHeadersSpider(scrapy.Spider):
    name = "proxy_headers_demo"

    custom_settings = {
        "DOWNLOAD_HANDLERS": {
            "https": "scrapy_proxy_headers.HTTP11ProxyDownloadHandler"
        }
    }

    def start_requests(self):
        yield scrapy.Request(
            url="https://api.ipify.org?format=json",
            meta={
                "proxy": "http://us.proxymesh.com:31280",
                "proxy_headers": {"X-ProxyMesh-Country": "US"}
            },
            callback=self.parse_ip
        )

    def parse_ip(self, response):
        data = response.json()
        proxy_ip = response.headers.get(b"X-ProxyMesh-IP")
        self.logger.info(f"Public IP: {data['ip']}")
        if proxy_ip:
            self.logger.info(f"Proxy IP: {proxy_ip.decode()}")
        yield {
            "public_ip": data["ip"],
            "proxy_ip": proxy_ip.decode() if proxy_ip else None
        }
```
## How It Works

- `HTTP11ProxyDownloadHandler` - custom download handler that manages proxy header caching
- `ScrapyProxyHeadersAgent` - agent that reads `proxy_headers` from request meta
- `TunnelingHeadersAgent` - sends custom headers in the CONNECT request
- `TunnelingHeadersTCP4ClientEndpoint` - captures proxy response headers from the CONNECT response

The handler also caches proxy response headers by proxy URL. This ensures the headers remain available even when Scrapy reuses an existing tunnel connection for subsequent requests.
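The caching idea can be illustrated with a rough sketch (this is not the library's actual implementation): headers are remembered per proxy URL, so a request that reuses an existing tunnel still sees the headers captured from the original CONNECT response.

```python
# Rough sketch of per-proxy caching of CONNECT response headers.
class ProxyHeaderCache:
    def __init__(self):
        self._by_proxy = {}  # proxy URL -> headers from its CONNECT response

    def store(self, proxy_url, headers):
        self._by_proxy[proxy_url] = dict(headers)

    def merge_into(self, proxy_url, response_headers):
        # A reused connection produces no new CONNECT response, so fall back
        # to the cached headers for this proxy without overwriting anything.
        for name, value in self._by_proxy.get(proxy_url, {}).items():
            response_headers.setdefault(name, value)
        return response_headers

cache = ProxyHeaderCache()
cache.store("http://us.proxymesh.com:31280", {"X-ProxyMesh-IP": "203.0.113.7"})
merged = cache.merge_into(
    "http://us.proxymesh.com:31280", {"Content-Type": "application/json"}
)
# merged now holds both the real response headers and the cached proxy headers
```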
## Test Harness

A test harness is included to verify proxy header functionality:

```bash
# Basic test
PROXY_URL=http://your-proxy:port TEST_URL=https://api.ipify.org python test_proxy_headers.py

# With custom proxy header
PROXY_URL=http://your-proxy:port \
PROXY_HEADER=X-ProxyMesh-IP \
SEND_PROXY_HEADER=X-ProxyMesh-Country \
SEND_PROXY_VALUE=US \
python test_proxy_headers.py

# Verbose output
python test_proxy_headers.py -v
```
### Environment Variables

| Variable | Description | Default |
|---|---|---|
| `PROXY_URL` | Proxy URL (also checks `HTTPS_PROXY`) | Required |
| `TEST_URL` | URL to request | `https://api.ipify.org?format=json` |
| `PROXY_HEADER` | Response header to check for | `X-ProxyMesh-IP` |
| `SEND_PROXY_HEADER` | Header name to send to the proxy | Optional |
| `SEND_PROXY_VALUE` | Value for the sent header | Optional |
## Documentation

Full documentation is available at scrapy-proxy-headers.readthedocs.io.
## Use Cases

- Geographic targeting: Send `X-ProxyMesh-Country` to route through specific countries
- Session consistency: Request the same IP across multiple requests
- Debugging: Capture proxy response headers to see which IP was assigned
- Load balancing: Use proxy headers to control request distribution
## Requirements

- Python 3.8+
- Scrapy 2.0+
## License

BSD License - see LICENSE for details.