Fast HTTP/1.1 parser for Gunicorn using picohttpparser
gunicorn_h1c
Fast HTTP/1.1 parser for Gunicorn using picohttpparser.
Features
- SIMD-optimized parsing (SSE4.2 on x86, NEON on ARM)
- Zero-copy request parsing with lazy Python object creation
- Callback-based parser for asyncio integration (H1CProtocol)
- Common header extraction (Content-Length, Transfer-Encoding, Connection)
- Incremental parsing support
- Chunked transfer encoding support
- WSGI environ and ASGI scope generation
- Python 3.9+
Installation
pip install gunicorn_h1c
Usage
Basic Parsing
from gunicorn_h1c import parse_request
data = b"GET /path?query=1 HTTP/1.1\r\nHost: localhost\r\nContent-Length: 0\r\n\r\n"
result = parse_request(data)
print(result['method']) # b'GET'
print(result['path']) # b'/path?query=1'
print(result['minor_version']) # 1 (HTTP/1.1)
print(result['headers']) # [(b'Host', b'localhost'), (b'Content-Length', b'0')]
print(result['consumed']) # 66 (bytes consumed; equals len(data) here)
Fast Parsing (Zero-Copy)
from gunicorn_h1c import parse_request_fast
data = b"POST /api HTTP/1.1\r\nContent-Length: 100\r\nTransfer-Encoding: chunked\r\n\r\n"
req = parse_request_fast(data)
# Properties are created lazily - only when accessed
print(req.method) # b'POST'
print(req.path) # b'/api'
print(req.consumed) # bytes consumed
# Common headers extracted during parse (no Python overhead)
print(req.content_length) # 100
print(req.has_chunked) # True
print(req.connection_close) # -1 (not set), 0 (keep-alive), 1 (close)
# Header lookup (case-insensitive)
print(req.get_header("content-length")) # b'100'
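The tri-state `connection_close` value only reports what the Connection header said; whether the connection actually persists also depends on the HTTP version. A minimal sketch of that decision, assuming the usual HTTP/1.x defaults (1.1 persists unless `close` is sent, 1.0 closes unless `keep-alive` is sent). `should_keep_alive` here is a hypothetical helper written for illustration, not a gunicorn_h1c function:

```python
def should_keep_alive(minor_version: int, connection_close: int) -> bool:
    """Hypothetical helper: combine the HTTP minor version with the tri-state flag.

    connection_close uses the values documented above:
    -1 (header not set), 0 (keep-alive), 1 (close).
    """
    if connection_close == 1:
        return False
    if connection_close == 0:
        return True
    # Header absent: HTTP/1.1 defaults to persistent, HTTP/1.0 does not.
    return minor_version == 1


print(should_keep_alive(1, -1))  # HTTP/1.1, no Connection header
print(should_keep_alive(0, -1))  # HTTP/1.0, no Connection header
```

With a parsed request, the call would presumably look like `should_keep_alive(req.minor_version, req.connection_close)`.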
Response Parsing
from gunicorn_h1c import parse_response
data = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\n"
result = parse_response(data)
print(result['status']) # 200
print(result['message']) # b'OK'
print(result['minor_version']) # 1
print(result['headers']) # [(b'Content-Type', b'text/html'), ...]
print(result['consumed']) # bytes consumed
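The `consumed` value is what lets a caller find the body that follows the header block. The sketch below shows one way to use it, assuming `consumed` counts every byte up to and including the blank line that ends the headers. `split_body` is a hypothetical helper, and the offset is computed by hand rather than taken from `parse_response` so that the snippet runs on its own:

```python
def split_body(data: bytes, consumed: int, content_length: int):
    """Hypothetical helper: return (body, leftover), or None if the body is incomplete."""
    end = consumed + content_length
    if len(data) < end:
        return None  # Not enough bytes buffered yet
    return data[consumed:end], data[end:]


head = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 13\r\n\r\n"
data = head + b"Hello, world!" + b"HTTP/1.1 204"

# Stand-in for result['consumed'] (assumption: it equals the header block length).
consumed = len(head)

print(split_body(data, consumed, 13))
print(split_body(head + b"Hel", consumed, 13))
```

Any leftover bytes belong to the next message on the connection and can be fed back into the parser.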
Header-Only Parsing
from gunicorn_h1c import parse_headers
data = b"Content-Type: text/html\r\nContent-Length: 100\r\n\r\n"
headers = parse_headers(data)
print(headers) # [(b'Content-Type', b'text/html'), (b'Content-Length', b'100')]
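Because the result is a plain list of `(name, value)` byte pairs, repeated headers and mixed-case names are left for the caller to handle. One possible way to build a case-insensitive index over that list is sketched below; `index_headers` is a hypothetical helper, and the input is a hand-written list in the same shape as the output above:

```python
from collections import defaultdict


def index_headers(headers):
    """Group (name, value) byte pairs by lower-cased name, keeping value order."""
    index = defaultdict(list)
    for name, value in headers:
        index[name.lower()].append(value)
    return dict(index)


# Same shape as the parse_headers() output, plus a repeated header.
headers = [
    (b"Content-Type", b"text/html"),
    (b"Content-Length", b"100"),
    (b"Set-Cookie", b"a=1"),
    (b"set-cookie", b"b=2"),
]
index = index_headers(headers)
print(index[b"content-length"])
print(index[b"set-cookie"])
```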
WSGI Environ Creation
from gunicorn_h1c import parse_to_wsgi_environ
data = b"GET /path?foo=bar HTTP/1.1\r\nHost: example.com\r\nContent-Type: text/plain\r\n\r\n"
environ = parse_to_wsgi_environ(
    data,
    server=("example.com", 80),
    client=("192.168.1.1", 54321),
    url_scheme="https"
)
print(environ['REQUEST_METHOD']) # 'GET'
print(environ['PATH_INFO']) # '/path'
print(environ['QUERY_STRING']) # 'foo=bar'
print(environ['SERVER_NAME']) # 'example.com'
print(environ['SERVER_PORT']) # '80'
print(environ['REMOTE_ADDR']) # '192.168.1.1'
print(environ['HTTP_HOST']) # 'example.com'
print(environ['CONTENT_TYPE']) # 'text/plain'
print(environ['wsgi.url_scheme']) # 'https'
print(environ['_consumed']) # bytes consumed
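To show where such an environ ends up, here is a rough sketch of calling a WSGI application with it. The environ is a hand-built stand-in containing some of the keys printed above, so the snippet does not need gunicorn_h1c installed. Whether the library fills in `wsgi.input` and `wsgi.errors` is not stated here, so the sketch assumes the server adds them itself:

```python
import io


def app(environ, start_response):
    """Tiny WSGI app that echoes the request line pieces."""
    text = f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']}?{environ['QUERY_STRING']}"
    body = text.encode()
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


# Stand-in for the dict returned by parse_to_wsgi_environ().
environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/path",
    "QUERY_STRING": "foo=bar",
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "80",
    "wsgi.url_scheme": "https",
}
# Assumption: stream keys are supplied by the server, not by the parser.
environ.setdefault("wsgi.input", io.BytesIO(b""))
environ.setdefault("wsgi.errors", io.StringIO())

captured = {}


def start_response(status, response_headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = response_headers


body = b"".join(app(environ, start_response))
print(captured["status"], body)
```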
ASGI Scope Creation
from gunicorn_h1c import parse_to_asgi_scope
data = b"POST /api HTTP/1.1\r\nHost: example.com\r\nContent-Length: 50\r\n\r\n"
scope = parse_to_asgi_scope(
    data,
    server=("example.com", 443),
    client=("10.0.0.1", 12345),
    scheme="https",
    root_path="/v1"
)
print(scope['type']) # 'http'
print(scope['asgi']) # {'version': '3.0', 'spec_version': '2.4'}
print(scope['http_version']) # '1.1'
print(scope['method']) # 'POST'
print(scope['scheme']) # 'https'
print(scope['path']) # '/api'
print(scope['raw_path']) # b'/api'
print(scope['query_string']) # b''
print(scope['root_path']) # '/v1'
print(scope['headers']) # [(b'host', b'example.com'), ...]
print(scope['server']) # ('example.com', 443)
print(scope['client']) # ('10.0.0.1', 12345)
print(scope['_consumed']) # bytes consumed
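A scope like this is the first argument an ASGI application receives. The sketch below drives a trivial application with a hand-built scope that mirrors the values printed above, plus stub `receive` and `send` callables. It is a stand-in to illustrate the hand-off, not how Gunicorn wires things internally. Popping the non-standard `_consumed` key before the call is a suggestion, not documented behaviour:

```python
import asyncio


async def app(scope, receive, send):
    """Tiny ASGI app that echoes the method and path."""
    body = f"{scope['method']} {scope['path']}".encode()
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-length", str(len(body)).encode())]})
    await send({"type": "http.response.body", "body": body})


# Stand-in for the dict returned by parse_to_asgi_scope().
scope = {
    "type": "http",
    "asgi": {"version": "3.0", "spec_version": "2.4"},
    "http_version": "1.1",
    "method": "POST",
    "scheme": "https",
    "path": "/api",
    "raw_path": b"/api",
    "query_string": b"",
    "root_path": "/v1",
    "headers": [(b"host", b"example.com")],
    "server": ("example.com", 443),
    "client": ("10.0.0.1", 12345),
}
# Suggestion: keep parser bookkeeping out of the scope handed to the app.
consumed = scope.pop("_consumed", None)

sent = []


async def receive():
    return {"type": "http.request", "body": b"", "more_body": False}


async def send(message):
    sent.append(message)


asyncio.run(app(scope, receive, send))
print(sent[0]["status"], sent[1]["body"])
```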
Callback-Based Protocol Parser (asyncio)
For asyncio servers, H1CProtocol provides a callback-based API that enables zero-copy,
synchronous parsing in data_received():
import asyncio
from gunicorn_h1c import H1CProtocol, ParseError

class MyProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport
        self.parser = H1CProtocol(
            on_headers_complete=self._on_headers,
            on_body=self._on_body,
            on_message_complete=self._on_complete,
        )

    def data_received(self, data):
        try:
            self.parser.feed(data)
        except ParseError:
            # Malformed request: drop the connection
            self.transport.close()

    def _on_headers(self):
        # Build ASGI scope or process headers
        method = self.parser.method # b'GET'
        path = self.parser.path # b'/path'
        headers = self.parser.headers # [(b'Host', b'localhost'), ...]
        # Return True to skip body parsing (e.g., for HEAD requests)
        return method == b"HEAD"

    def _on_body(self, chunk):
        # Process body chunk (zero-copy)
        pass

    def _on_complete(self):
        # Request complete, send response
        self.parser.reset() # Reuse for next request (keep-alive)
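The `_on_body` callback above is left empty. One way to fill it in is to accumulate chunks with a size cap, sketched below. `BodyCollector` is a hypothetical helper whose method signatures match the `on_body` and `on_message_complete` callbacks. The parser calls are simulated by hand so the snippet runs without gunicorn_h1c, and copying each chunk with `bytes()` is a cautious assumption in case a chunk does not outlive the callback:

```python
class BodyCollector:
    """Hypothetical helper: gathers body chunks delivered through callbacks."""

    def __init__(self, max_body=1 << 20):
        self.max_body = max_body
        self.chunks = []
        self.size = 0
        self.overflow = False
        self.done = False

    def on_body(self, chunk):
        self.size += len(chunk)
        if self.size > self.max_body:
            # Too large: stop storing and let the caller reject the request.
            self.overflow = True
            self.chunks.clear()
            return
        self.chunks.append(bytes(chunk))

    def on_message_complete(self):
        self.done = True

    def body(self):
        return b"".join(self.chunks)


collector = BodyCollector(max_body=16)
# Simulate the calls a parser would make for a body arriving in two pieces.
collector.on_body(b"hello ")
collector.on_body(b"world")
collector.on_message_complete()
print(collector.done, collector.body())
```

In the protocol class above, these methods could be passed as `on_body=collector.on_body` and `on_message_complete=collector.on_message_complete`.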
Incremental Parsing
from gunicorn_h1c import parse_request, IncompleteError
buffer = b"GET / HTTP/1.1\r\n"
last_len = 0
while True:
    try:
        result = parse_request(buffer, last_len=last_len)
        break # Complete request
    except IncompleteError:
        last_len = len(buffer) # Size of the buffer that was just tried
        buffer += read_more_data() # Placeholder: supply your own read, e.g. sock.recv()
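The loop above depends on an undefined `read_more_data()`. The self-contained sketch below replays the same buffer-and-retry pattern with a toy completeness check in place of the real parser, to show what `last_len` buys: the retry only needs to rescan the tail of the buffer. `find_header_end` and the local `IncompleteError` are stand-ins written for illustration; the 3-byte back-off is an assumption about how such a resume is typically done, not a statement about gunicorn_h1c internals:

```python
class IncompleteError(Exception):
    """Stand-in for gunicorn_h1c.IncompleteError so the sketch runs alone."""


def find_header_end(buffer: bytes, last_len: int = 0) -> int:
    """Toy check: return the offset just past the blank line ending the headers."""
    # Bytes before last_len were scanned on the previous attempt; back up
    # 3 bytes in case the terminator straddles the old/new boundary.
    start = max(last_len - 3, 0)
    pos = buffer.find(b"\r\n\r\n", start)
    if pos < 0:
        raise IncompleteError
    return pos + 4


# Simulated network reads, split at awkward points on purpose.
chunks = iter([b"GET / HT", b"TP/1.1\r\nHost: a\r", b"\n\r", b"\nrest"])
buffer = b""
last_len = 0
while True:
    buffer += next(chunks)
    try:
        consumed = find_header_end(buffer, last_len)
        break
    except IncompleteError:
        last_len = len(buffer)

print(consumed, buffer[consumed:])
```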
Raw Parsing (Maximum Speed)
For scenarios requiring maximum performance, parse_request_raw returns offsets into the original buffer:
from gunicorn_h1c import parse_request_raw
data = b"GET /path HTTP/1.1\r\nHost: localhost\r\n\r\n"
result = parse_request_raw(data)
# Returns: (method_offset, method_len, path_offset, path_len,
# minor_version, header_count, consumed, header_data)
method_offset, method_len, path_offset, path_len, version, header_count, consumed, header_data = result
method = data[method_offset:method_offset + method_len] # b'GET'
path = data[path_offset:path_offset + path_len] # b'/path'
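Slicing a `bytes` object, as above, copies the selected range. If avoiding that copy matters, a `memoryview` can be sliced with the same offsets instead. The sketch below uses hand-written offsets for the sample request (an assumption standing in for the tuple returned by `parse_request_raw`) so that it runs without the library:

```python
data = b"GET /path HTTP/1.1\r\nHost: localhost\r\n\r\n"

# Stand-in offsets for this request, written by hand in the documented order.
method_offset, method_len = 0, 3
path_offset, path_len = 4, 5

view = memoryview(data)
# Slicing a memoryview references the original buffer instead of copying it.
method = view[method_offset:method_offset + method_len]
path = view[path_offset:path_offset + path_len]

print(method == b"GET", bytes(path))
```

The views stay valid only while `data` is alive, so convert them with `bytes()` before keeping them beyond the lifetime of the receive buffer.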
Performance
Benchmarks on Apple M4 Pro (single thread):
| Parser | Requests/sec |
|---|---|
| gunicorn_h1c (fast) | ~2,500,000 |
| gunicorn_h1c (H1CProtocol, reused) | ~4,700,000 |
| httptools | ~2,200,000 |
| Pure Python | ~150,000 |
H1CProtocol Performance:
- Simple GET: ~4.7M req/s (209ns/op) when reusing parser
- Incremental parsing: ~3x faster than pull-based API with buffer + retry
- Body parsing: ~3.0M req/s for chunked, ~3.7M req/s for Content-Length
API Reference
Request Parsing
parse_request(data, last_len=0) -> dict
Parse HTTP request, returns dict with:
- method: bytes
- path: bytes
- minor_version: int (0 or 1)
- headers: list of (name, value) tuples
- consumed: int (bytes consumed)
parse_request_fast(data, last_len=0) -> HttpRequest
Parse HTTP request with zero-copy optimization, returns HttpRequest object with:
- method: bytes (lazy)
- path: bytes (lazy)
- minor_version: int
- headers: tuple of (name, value) tuples (lazy)
- consumed: int
- header_count: int
- content_length: int (-1 if not set)
- has_chunked: bool
- connection_close: int (-1=unset, 0=keep-alive, 1=close)
- get_header(name): bytes or None (case-insensitive lookup)
parse_request_raw(data, last_len=0) -> tuple
Ultra-fast parsing returning raw offsets:
- method_offset: int
- method_len: int
- path_offset: int
- path_len: int
- minor_version: int
- header_count: int
- consumed: int
- header_data: bytes (packed header offsets)
Callback-Based Protocol Parser
H1CProtocol
Callback-based HTTP/1.1 parser for asyncio integration.
Constructor:
H1CProtocol(
    on_message_begin=None,    # () -> None
    on_url=None,              # (url: bytes) -> None
    on_header=None,           # (name: bytes, value: bytes) -> None
    on_headers_complete=None, # () -> bool (return True to skip body)
    on_body=None,             # (chunk: bytes) -> None
    on_message_complete=None, # () -> None
)
Methods:
- feed(data: bytes) -> None: Feed data to parser. Callbacks fire synchronously.
- reset() -> None: Reset parser for next request (keepalive).
- get_header(name: bytes) -> bytes | None: Case-insensitive header lookup.
Properties (valid after on_headers_complete):
- method: bytes - HTTP method (GET, POST, etc.)
- path: bytes - Request path including query string
- http_version: tuple[int, int] - HTTP version as (major, minor)
- headers: list[tuple[bytes, bytes]] - List of (name, value) tuples
- content_length: int | None - Content-Length value or None
- is_chunked: bool - True if Transfer-Encoding: chunked
- should_keep_alive: bool - True if connection should be kept alive
- should_upgrade: bool - True if Upgrade header present
- is_complete: bool - True if message parsing is complete
Response Parsing
parse_response(data, last_len=0) -> dict
Parse HTTP response, returns dict with:
- status: int (status code)
- message: bytes (status message)
- minor_version: int (0 or 1)
- headers: list of (name, value) tuples
- consumed: int (bytes consumed)
Header Parsing
parse_headers(data, last_len=0) -> list
Parse HTTP headers only, returns list of (name, value) tuples.
WSGI/ASGI Support
parse_to_wsgi_environ(data, server=None, client=None, url_scheme="http") -> dict
Parse HTTP request and build WSGI environ dict. Parameters:
- data: Raw HTTP request bytes
- server: (host, port) tuple for SERVER_NAME/SERVER_PORT
- client: (addr, port) tuple for REMOTE_ADDR/REMOTE_PORT
- url_scheme: URL scheme (default "http")
Returns dict with REQUEST_METHOD, PATH_INFO, QUERY_STRING, SERVER_PROTOCOL, HTTP_* headers, and _consumed.
parse_to_asgi_scope(data, server=None, client=None, scheme="http", root_path="") -> dict
Parse HTTP request and build ASGI scope dict. Parameters:
- data: Raw HTTP request bytes
- server: (host, port) tuple
- client: (addr, port) tuple
- scheme: URL scheme (default "http")
- root_path: ASGI root_path (default "")
Returns dict with type, asgi, http_version, method, scheme, path, raw_path, query_string, root_path, headers, server, client, and _consumed.
Exceptions
- ParseError: Invalid HTTP request/response
- IncompleteError: Need more data (incremental parsing)
License
MIT License (picohttpparser) + Apache 2.0 (Python bindings)
Credits
- picohttpparser by Kazuho Oku et al.
- Python bindings by Benoit Chesneau