High-performance QUIC/HTTP3 library — picoquic-backed, qh3-compatible asyncio API

aiopquic - Async QUIC + WebTransport (picoquic)

aiopquic is a Python/Cython binding to picoquic, providing high-performance QUIC transport and WebTransport for asyncio applications.

Overview

aiopquic exposes picoquic's QUIC implementation through a lock-free SPSC ring buffer architecture that bridges the picoquic network thread with Python's asyncio event loop. It provides an asyncio QUIC/HTTP3 transport API in the spirit of aioquic (and its fork qh3), with similar shapes for QuicConfiguration, QuicConnection, connect / serve, and event types, plus a native WebTransport client/server layered on picoquic's H3 + h3zero. It is not a drop-in replacement: semantics differ around backpressure (send_stream_data raises BufferError when the per-stream ring is full) and flow-control sizing.
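
The backpressure behavior described above (BufferError when the per-stream ring is full) can be handled with a small retry helper. This is a minimal sketch, not part of aiopquic: the helper name, the fixed-sleep retry policy, and the FakeQuic stand-in are all assumptions for illustration, and it assumes a rejected send is all-or-nothing.

```python
import asyncio


async def send_with_backpressure(quic, stream_id, data, *, end_stream=False,
                                 retry_delay=0.001, max_retries=1000):
    """Retry send_stream_data while the per-stream TX ring is full.

    `quic` is any object exposing send_stream_data(stream_id, data, end_stream).
    The retry policy (fixed sleep, bounded attempts) is an assumption.
    """
    for _ in range(max_retries):
        try:
            quic.send_stream_data(stream_id, data, end_stream=end_stream)
            return
        except BufferError:
            # Ring full: yield to the event loop so the ring can drain.
            await asyncio.sleep(retry_delay)
    raise TimeoutError("TX ring stayed full for too long")


class FakeQuic:
    """Hypothetical stand-in for a connection: rejects the first `full_for` sends."""

    def __init__(self, full_for):
        self.full_for = full_for
        self.sent = []

    def send_stream_data(self, stream_id, data, end_stream=False):
        if self.full_for > 0:
            self.full_for -= 1
            raise BufferError("per-stream ring full")
        self.sent.append((stream_id, data, end_stream))


fake = FakeQuic(full_for=3)
asyncio.run(send_with_backpressure(fake, 0, b"hello", end_stream=True,
                                   retry_delay=0))
print(fake.sent)
```

With a real connection, the same helper would be awaited from inside the connect block in place of a bare send_stream_data call.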

Architecture

  • SPSC Ring Buffers -- Lock-free single producer/single consumer rings for event passing between threads, separate TX and RX rings per TransportContext.
  • TX path -- Asyncio pushes into per-stream byte ring; picoquic pulls at wire rate via prepare_to_send.
  • RX path -- picoquic pushes per-event StreamChunks; ownership transfers at pop for 1-copy delivery.
  • Cross-platform wake fd -- Linux eventfd for efficient asyncio add_reader() notification; pipe() self-pipe fallback on macOS / BSD.
  • Dedicated Network Thread -- picoquic runs in its own thread via picoquic_start_network_thread(). One worker thread per TransportContext; multiple contexts share the asyncio event loop within a single Python process.
  • Cython Bridge -- Thin Cython layer over C callbacks, minimal overhead.
  • WebTransport -- aiopquic.asyncio.webtransport.WebTransportSession (client + server) over picoquic's picowt_* API and h3zero.
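
To make the SPSC idea concrete, here is a minimal pure-Python sketch of a bounded single-producer/single-consumer ring. It is illustrative only: aiopquic's rings are implemented in C, and the class name, method names, and power-of-two capacity rule here are assumptions, not the library's API.

```python
class SpscRing:
    """Bounded single-producer/single-consumer ring (illustrative only).

    The producer writes only `_head`; the consumer writes only `_tail`.
    In C these would be atomics with acquire/release ordering; plain ints
    are used here just to show the index arithmetic.
    """

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to write (producer-owned)
        self._tail = 0  # next slot to read (consumer-owned)

    def push(self, item):
        """Producer side: return False instead of blocking when full."""
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = item
        self._head += 1  # publish only after the slot is written
        return True

    def pop(self):
        """Consumer side: return None when empty; the caller owns the item."""
        if self._tail == self._head:
            return None
        index = self._tail & self._mask
        item = self._slots[index]
        self._slots[index] = None  # drop the ring's reference
        self._tail += 1
        return item

    def __len__(self):
        return self._head - self._tail


ring = SpscRing(4)
print([ring.push(i) for i in range(5)])  # the fifth push is rejected
print(ring.pop(), len(ring))
```

Because each index has exactly one writer, no lock is needed; a full ring surfaces to the producer as a rejected push, which is the same shape as the BufferError backpressure mentioned in the Overview.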

Features

  • QUIC client and server: connect, serve, QuicConnectionProtocol
  • Stream data send/receive with FIN signaling, stream reset, stop_sending
  • WebTransport client + server: serve_webtransport, WebTransportSession
  • QUIC datagram TX + RX (note: WebTransport datagram TX not yet wired)
  • Connection migration / 0-RTT (inherited from picoquic)
  • Connection management: create, close, idle timeout, application close codes
  • Per-cnx multiplexing on the server side via QuicEngine
  • TLS keylog (NSS Key Log Format) for pcap decryption
  • Native picoquic_ct / picohttp_ct subprocess smoke tests (catch upstream regressions on every submodule update)

Test Results

Tests pass on Linux and macOS. The interop suite is opt-in (network-dependent).

| Suite | Coverage |
|---|---|
| test_spsc_ring | per-event malloc ring lifecycle |
| test_buffer | Cython Buffer |
| test_transport | Transport lifecycle, wake fd, wake-up, connection management |
| test_loopback | 17 tests: handshake, streams, FIN, reset, datagrams, ALPN mismatch, idle timeout, app-close codes, stop_sending, many-streams stress, TX-ring overflow |
| test_asyncio | client/server stream + datagram exchange via connect / serve |
| test_baton_pattern | Pure-QUIC baton-style stream multiplexing (UNI ↔ BIDI) |
| test_native_picoquic | picoquic_ct / picohttp_ct subprocess driver |
| test_interop | Real public endpoints (opt-in) |
| tests/bench/ | microbenches: ring push/pop, single-shot/sustained/parallel/bidirectional throughput, datagrams, RTT latency, handshake rate, byte-verifying object stress + stream churn + concurrent streams (opt-in via pytest tests/bench) |

Performance

Sustained single-stream throughput, 30s steady-state, byte-verifying, high-level asyncio API (QuicConnection.send_stream_data):

| Platform | 1 KiB | 4 KiB | 16 KiB |
|---|---|---|---|
| AMD Ryzen 7 PRO 7840U / WSL2 / Linux 6.6 | 1,570 Mbps | 2,118 Mbps | 2,031 Mbps |
| Apple M-series / macOS Sonoma | 953 Mbps | 1,130 Mbps | 1,104 Mbps |

These figures are measured over local UDP loopback at the QUIC default MTU (~1,400 B). The realistic ceiling at that MTU is the kernel's per-syscall sendmsg rate, not bandwidth. On Ryzen WSL2, raw iperf3 -u -l 1400 over loopback maxes out at 3.15 Gbps (≈ 280 K syscalls/s); raise the datagram size and it climbs cleanly: 4 KiB → 7.9 Gbps, 8 KiB → 12.8 Gbps, 32 KiB → 33.7 Gbps. So QUIC pinned at MTU is in a regime where the syscall rate is the wall.
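
The syscall-rate figure follows from simple arithmetic on the numbers quoted above. A short sketch, assuming one sendmsg per datagram (no GSO) and 1 Gbps = 10^9 bits per second; the function name is made up for illustration:

```python
def syscalls_per_second(throughput_gbps, datagram_bytes):
    """Datagrams per second (one sendmsg each) needed to reach a given rate."""
    return throughput_gbps * 1e9 / (datagram_bytes * 8)


# (datagram size in bytes, loopback throughput in Gbps), as quoted in the text
measurements = [(1400, 3.15), (4096, 7.9), (8192, 12.8), (32768, 33.7)]

for size, gbps in measurements:
    rate = syscalls_per_second(gbps, size)
    print(f"{size:>6} B  {gbps:>5} Gbps  ~{rate / 1e3:.0f} K syscalls/s")
```

Under these assumptions the per-second syscall count stays within the same order of magnitude across datagram sizes while throughput grows roughly tenfold, which is what a syscall-bound regime looks like.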

In that regime, here's where the layers land on Ryzen WSL2:

| Layer | ss_mbps | % of UDP@1400 ceiling |
|---|---|---|
| iperf3 -u -l 1400 (raw UDP loopback) | 3,150 | 100 % |
| picoquicdemo -a perf (picoquic over UDP) | 2,184 | 69 % |
| aiopquic lowlevel (SPSC ring + UDP) | 2,322 | 74 % |
| aiopquic highlevel (asyncio + SPSC + UDP) | 2,031 | 64 % |
| sim_link_bench (picoquic only, no kernel UDP) | 11,216 | (off-axis) |

The asyncio wrapper lands about 10 percentage points below the lowlevel SPSC path (64 % vs 74 % of the ceiling); picoquic's own QUIC framing/encryption/ACK overhead accounts for ~25 % vs raw UDP (the lowlevel path reaches 74 % of the ceiling). Both are normal for QUIC-over-loopback at MTU.
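
The percentages can be recomputed from the measured rates, which is also a handy template when calibrating on other hardware. A small sketch using only the Ryzen WSL2 numbers listed above (variable and function names are illustrative):

```python
CEILING_MBPS = 3150  # iperf3 -u -l 1400 over loopback

layers = {
    "picoquicdemo -a perf": 2184,
    "aiopquic lowlevel": 2322,
    "aiopquic highlevel": 2031,
}


def pct_of_ceiling(mbps, ceiling=CEILING_MBPS):
    """Share of the raw-UDP ceiling reached by a layer, in percent."""
    return 100.0 * mbps / ceiling


for name, mbps in layers.items():
    print(f"{name:<22} {mbps:>5} Mbps  {pct_of_ceiling(mbps):.0f} % of ceiling")

# Relative cost of the asyncio wrapper, measured against the lowlevel path
relative_cost = 100.0 * (1 - layers["aiopquic highlevel"] / layers["aiopquic lowlevel"])
print(f"asyncio wrapper vs lowlevel: {relative_cost:.1f} % relative")
```

Note the two ways of expressing the wrapper cost: about 10 points of the ceiling, or about 12.5 % relative to the lowlevel path.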

sim_link_bench (tests/bench/sim_link/) drives picoquic over its picoquictest_sim_link simulated link — packets are routed in-process between two picoquic_quic_t instances, no kernel UDP, no sockets, no syscall-rate ceiling. It isolates picoquic protocol CPU cost from the loopback wall and is platform-independent. The 11.2 Gbps number above is what picoquic can do without any kernel involvement on this hardware. Build with ./tests/bench/sim_link/build.sh after ./build_picoquic.sh.

Calibrate on your own hardware:

# UDP-over-loopback path (what aiopquic users actually see)
pytest tests/bench/bench_baselines_highlevel.py -s -v          # 30s default
pytest tests/bench/bench_baselines_highlevel.py -s -v --duration=60

# Protocol-only reference (no kernel UDP)
PICOQUIC_SOLUTION_DIR=third_party/picoquic/ \
    tests/bench/sim_link/sim_link_bench --duration-s 30 --rate-gbps 100

Microbenches (ring lifecycle, stream churn, concurrent-streams short bursts) live under tests/bench/ for development reference. Their reported numbers are not representative of sustained throughput — short windows inflate numbers from warmup transients (a 100-stream churn case at 256 B per stream measures ~1 ms of work, dominated by setup cost).
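
One way to keep warmup transients out of a reported number is to measure only the window after a warmup period. The sketch below is not the bench harness's code; the function name and the sample data are synthetic, made up purely to show the effect.

```python
def steady_state_mbps(samples, warmup_s):
    """Throughput in Mbps over the window that starts after `warmup_s`.

    `samples` is an ordered list of (elapsed_seconds, cumulative_bytes) pairs.
    """
    tail = [(t, b) for t, b in samples if t >= warmup_s]
    if len(tail) < 2:
        raise ValueError("need at least two samples after warmup")
    (t0, b0), (t1, b1) = tail[0], tail[-1]
    return (b1 - b0) * 8 / (t1 - t0) / 1e6


# Synthetic run: a burst in the first second, then a steady rate.
samples = [
    (0, 0),
    (1, 500_000_000),
    (2, 750_000_000),
    (3, 1_000_000_000),
    (4, 1_250_000_000),
]

print("whole run:   ", steady_state_mbps(samples, warmup_s=0), "Mbps")
print("after warmup:", steady_state_mbps(samples, warmup_s=1), "Mbps")
```

The shorter the run, the larger the share of the initial transient in the whole-run average, which is why the sustained benches use 30 s or longer windows.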

Installation

Wheels for cp312 / cp313 / cp314 on Linux (manylinux_2_34, glibc 2.34+) and macOS arm64 are published to PyPI:

uv pip install aiopquic     # or: pip install aiopquic

For older Linux (glibc 2.28–2.33) install via sdist; build toolchain required.
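
To see which case applies on a given host, the glibc version can be read with the standard library. A minimal sketch: the helper names and returned messages are illustrative, and the check covers only the glibc version, not the CPU architecture or Python version that a wheel tag also depends on.

```python
import platform


def glibc_at_least(version, minimum=(2, 34)):
    """Compare a 'major.minor' glibc version string against a minimum."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= minimum


def install_hint(libc_name, libc_version):
    """Map (libc name, version) to the install route described above."""
    if libc_name != "glibc" or not libc_version:
        return "unknown libc: check the wheel tags manually"
    if glibc_at_least(libc_version):
        return "wheel: manylinux_2_34 should apply"
    if glibc_at_least(libc_version, (2, 28)):
        return "sdist: build toolchain required"
    return "outside the glibc range mentioned here"


# platform.libc_ver() returns ('', '') on non-glibc platforms such as macOS
print(install_hint(*platform.libc_ver()))
```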

From source

git clone https://github.com/gmarzot/aiopquic.git
cd aiopquic
git submodule update --init --recursive
./bootstrap_python.sh    # creates .venv with uv-managed Python 3.14 (GIL build) and pins cython 3.2+
source .venv/bin/activate
./build_picoquic.sh      # builds picotls, picoquic, native test drivers
uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'

On macOS, set OPENSSL_ROOT_DIR if Homebrew OpenSSL is not auto-detected (the build script tries openssl@3 then openssl@1.1).

Reporting issues

Include the full version report in any issue — it captures aiopquic plus the picoquic + picotls submodule SHAs the binding was built from:

python -m aiopquic.versions   # or the console script: aiopquic-versions

Sample output:

aiopquic 0.3.5.dev4+g2ffe8947d.d20260522
         /path/to/aiopquic
picoquic 2b1e14d5a46532eadf691edef5bd747da6de6557
picotls  f350eab60742138ac62b42ee444adf04c7898b0d

If you're running aiomoqt on top, prefer python -m aiomoqt.versions — it chains through to this report and includes the aiomoqt version too.

Usage

Low-level Transport API

from aiopquic._binding._transport import TransportContext

server = TransportContext()
server.start(port=4433, cert_file="cert.pem", key_file="key.pem", alpn="moq-00", is_client=False)

client = TransportContext()
client.start(port=0, alpn="moq-00", is_client=True)
client.create_client_connection("127.0.0.1", 4433, sni="localhost", alpn="moq-00")

Asyncio API

import asyncio

from aiopquic.asyncio.client import connect
from aiopquic.quic.configuration import QuicConfiguration

configuration = QuicConfiguration(alpn_protocols=["myproto"], is_client=True)
payload = b"hello"  # placeholder: any application bytes


async def main():
    # "server" is a placeholder host name
    async with connect("server", 4433, configuration=configuration) as protocol:
        quic = protocol._quic
        stream_id = quic.get_next_available_stream_id()
        # end_stream=True sends FIN after the payload
        quic.send_stream_data(stream_id, payload, end_stream=True)
        protocol.transmit()


asyncio.run(main())

payload is opaque bytes; the library doesn't impose framing. Consumers that want HTTP/3 can layer it on top of aiopquic's picowt-backed h3zero plumbing; consumers that want WebTransport use serve_webtransport / connect_webtransport. Most direct users of the asyncio API ship their own protocol bytes (MoQT, custom binary frames, etc.).
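
Since framing is left to the application, a stream consumer typically needs its own message boundaries. A minimal sketch of one common choice, length-prefixed frames, follows; the 4-byte big-endian prefix and the names encode_frame / FrameDecoder are assumptions for illustration, not something aiopquic defines.

```python
import struct

_HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed format)


def encode_frame(body: bytes) -> bytes:
    """Prefix a message body with its length."""
    return _HEADER.pack(len(body)) + body


class FrameDecoder:
    """Reassemble frames from stream chunks that may split anywhere."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes; return every frame completed so far."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= _HEADER.size:
            (length,) = _HEADER.unpack_from(self._buf)
            end = _HEADER.size + length
            if len(self._buf) < end:
                break  # body not fully received yet
            frames.append(bytes(self._buf[_HEADER.size:end]))
            del self._buf[:end]
        return frames


wire = encode_frame(b"hello") + encode_frame(b"") + encode_frame(b"world")
decoder = FrameDecoder()
received = []
for i in range(0, len(wire), 3):  # deliver in arbitrary 3-byte chunks
    received += decoder.feed(wire[i:i + 3])
print(received)
```

On the sending side, the output of encode_frame would be what gets passed as payload to send_stream_data.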

WebTransport

from aiopquic.asyncio.webtransport import (
    serve_webtransport, WebTransportSession,
)
# See src/aiopquic/asyncio/webtransport.py and tests/ for full examples.

Development

uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'
python -m pytest tests/ -v -m "not interop and not native"

# Microbenches (opt-in)
python -m pytest tests/bench

Performance build (opt-in)

Default builds use CMAKE_BUILD_TYPE=Release (-O3 -DNDEBUG), portable across hosts. Two opt-in env vars layer on host-tuned optimizations for local benching — neither is enabled in PyPI wheels:

# Host-tuned: Fusion AES-GCM (x86_64), DISABLE_DEBUG_PRINTF,
# -O3 -march=native -flto. Binary becomes machine-specific.
AIOPQUIC_PERF=1 ./build_picoquic.sh

Per-platform behavior:

| Knob | Linux x86_64 | Linux ARM64 | macOS arm64 | macOS x86_64 |
|---|---|---|---|---|
| -O3 -DNDEBUG (always on) | | | | |
| DISABLE_DEBUG_PRINTF | | | | |
| Fusion AES-GCM (CPUID-dispatched) | | | | |
| -march=native / -mcpu=native + -flto | | | | |

Experimental: AIOPQUIC_IO_URING=1 (DORMANT)

io_uring scaffolding is in the tree (third_party/liburing submodule, picoquic patch, setup.py linkage). Enabling it builds picoquic_packet_loop_uring into libpicoquic-core.a and statically links liburing.a into the Cython extension:

AIOPQUIC_IO_URING=1 ./build_picoquic.sh   # auto-fetches + builds liburing-2.7
uv pip install -e '.[dev]'                 # re-cythonize with PICOQUIC_WITH_IO_URING define

This currently has no runtime effect. aiopquic's worker thread uses its own callback/SPSC-ring path and does not invoke picoquic_packet_loop_uring. The scaffolding is preserved so the worker can be migrated to io_uring later without re-discovering the build recipe (liburing submodule pin, picoquic header patch for kernel-uapi conflicts, ABI-critical define propagation through setup.py).

Linux-only. Compatible CPU architectures: x86_64, ARM64. Build will hard-error if AIOPQUIC_IO_URING=1 is set on macOS / BSD / Windows.

ABI note: picoquic_network_thread_ctx_t and picoquic_socket_ctx_t have conditional fields gated on PICOQUIC_WITH_IO_URING. The build-script + setup.py propagate the define to both picoquic-core and the Cython extension. A mismatch silently shifts thread_is_ready and other field offsets — the network thread appears to never become ready. Don't enable WITH_IO_URING in picoquic without also defining PICOQUIC_WITH_IO_URING in the Cython build.
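
The offset shift can be reproduced in miniature with ctypes. The two structs below are hypothetical stand-ins, not the real picoquic layouts: only the thread_is_ready name comes from the note above, while loop_callback and uring_state are invented fields used to show what happens when one side compiles a conditional field in and the other does not.

```python
import ctypes


class CtxWithoutUring(ctypes.Structure):
    """Layout as seen by code built WITHOUT the define (hypothetical fields)."""
    _fields_ = [
        ("loop_callback", ctypes.c_void_p),
        ("thread_is_ready", ctypes.c_int),
    ]


class CtxWithUring(ctypes.Structure):
    """Layout as seen by code built WITH the define (hypothetical fields)."""
    _fields_ = [
        ("loop_callback", ctypes.c_void_p),
        ("uring_state", ctypes.c_void_p),  # conditional field, invented here
        ("thread_is_ready", ctypes.c_int),
    ]


print("offset without define:", CtxWithoutUring.thread_is_ready.offset)
print("offset with define:   ", CtxWithUring.thread_is_ready.offset)

# One side writes the flag using its layout...
built = CtxWithUring(loop_callback=None, uring_state=None, thread_is_ready=1)
# ...and the other side reads the same bytes using the mismatched layout.
misread = CtxWithoutUring.from_buffer_copy(bytes(built))
print("flag as written:", built.thread_is_ready)
print("flag as misread:", misread.thread_is_ready)
```

The reader never sees the flag that was set, with no error raised anywhere, which matches the "network thread appears to never become ready" symptom.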

Runtime deployment guidance

These are runtime tunings, separate from build-time flags above. PyPI wheels ship with portable perf flags baked in (see Performance build); these knobs apply on top of any binary.

jemalloc for tail-latency reduction

The default glibc allocator's per-thread arenas + occasional coalescing show up as max-latency outliers under sustained high-throughput workloads. Preloading jemalloc measurably tightens the tail:

# Debian/Ubuntu:
sudo apt install libjemalloc2
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python -m your_app

# Fedora/RHEL:
sudo dnf install jemalloc
LD_PRELOAD=/usr/lib64/libjemalloc.so.2 python -m your_app

Validated improvement on a representative aiopquic sustained workload (Ryzen 7 PRO, Linux loopback): latency standard deviation 7.1 ms → 4.3 ms, max 437 ms → 310 ms, throughput unchanged. The effect is most visible at multi-Gbps over 60+ second runs; small workloads see no difference.
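
To confirm that the preload actually took effect in a running process, one option on Linux is to look for the library in /proc/self/maps. A small sketch with illustrative function names; it detects only a dynamically loaded libjemalloc, not a statically linked one.

```python
from pathlib import Path


def allocator_from_maps(maps_text):
    """Return 'jemalloc' if a libjemalloc mapping appears, else 'default'."""
    for line in maps_text.splitlines():
        if "libjemalloc" in line:
            return "jemalloc"
    return "default"


def current_allocator(maps_path="/proc/self/maps"):
    """Inspect this process's mappings; 'unknown' where procfs is missing."""
    path = Path(maps_path)
    if not path.exists():
        return "unknown"
    return allocator_from_maps(path.read_text())


print(current_allocator())
```

Calling current_allocator() early in your_app and logging the result makes it easy to tell preloaded and non-preloaded runs apart when comparing latency numbers.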

GSO and send-length-max

GSO (generic segmentation offload, here applied to UDP) is already enabled by default on Linux with send_length_max=65535 (max kernel-coalesced stride). No user action is needed. macOS / FreeBSD default to GSO off, and picoquic's per-datagram sendmsg path is used instead. Env overrides:

AIOPQUIC_GSO=0                  # force off (diagnostic only)
AIOPQUIC_SEND_LENGTH_MAX=8192   # cap kernel-coalesced buffer (Linux GSO on)
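
As a rough guide to what the send-length cap means, the number of MTU-sized datagrams that fit in one coalesced buffer can be estimated as below. This is back-of-the-envelope arithmetic under stated assumptions (equal-size segments of about 1,400 B, no other kernel limit applied), not a description of picoquic's internals; the function name is made up.

```python
def segments_per_send(send_length_max, datagram_bytes=1400):
    """Upper bound on equal-size datagrams carried by one coalesced send."""
    return max(1, send_length_max // datagram_bytes)


for cap in (65535, 8192):
    print(f"send_length_max={cap}: up to {segments_per_send(cap)} datagrams per send")
```

Fewer datagrams per send means more syscalls for the same throughput, so lowering the cap is mainly useful for diagnostics.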

TX wake threshold

The TX SPSC event ring's drain-wake threshold defaults to 50% — producer is signalled to resume only after ≥ half the queued events have drained. Overridable to tune for latency vs. context-switch overhead:

AIOPQUIC_TX_RING_WAKE_PCT=25    # wake earlier (lower per-send latency, more context switches)
AIOPQUIC_TX_RING_WAKE_PCT=75    # wake later (more batching, slightly higher latency)
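
The threshold rule can be read as a simple predicate on how much of the backlog has drained since the producer stalled. A sketch of that reading, with a hypothetical function name and integer arithmetic to avoid rounding surprises; the actual check lives in the C ring code and may differ in detail.

```python
def should_wake_producer(queued_at_stall, drained, wake_pct=50):
    """True once at least `wake_pct` percent of the stalled backlog has drained."""
    if not 0 < wake_pct <= 100:
        raise ValueError("wake_pct must be in 1..100")
    return drained * 100 >= wake_pct * queued_at_stall


backlog = 64  # events queued when the producer stalled (example value)
for pct in (25, 50, 75):
    first = next(d for d in range(backlog + 1)
                 if should_wake_producer(backlog, d, pct))
    print(f"wake_pct={pct}: producer resumes after {first} of {backlog} events drain")
```

A lower percentage resumes the producer sooner, at the price of more wake-ups per unit of data, which is the trade-off the two example settings above illustrate.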

Known Limitations

  • Free-threaded Python (3.14t) not yet supported -- the TX-ring producer side, TransportContext lifecycle, and the WebTransport engine state currently rely on the GIL for serialization. FT support deferred until a per-context locking audit lands.
  • STOP_SENDING error codes surface as 0 today -- picoquic's public stream-error getter only returns the RESET_STREAM code. STOP_SENDING's code lives in stream->remote_stop_error in picoquic_internal.h (no public getter). A small helper that reads the field is straightforward future work; see the TODO in src/aiopquic/_binding/c/callback.h.
  • Per-stream wrapper cleanup before connection close -- per-stream aiopquic_stream_ctx_t* wrappers are freed at connection close rather than at stream RESET/FIN. Bounded leak per cnx; flagged for follow-up.

TODO

  • Windows support (eventfd alternative — IOCP / WSAEventSelect on the wake-fd path)
  • Free-threaded Python (3.14t) support after producer-side locking audit
  • STOP_SENDING error-code surfacing helper (read remote_stop_error from picoquic_internal.h)
  • Per-stream wrapper cleanup on RESET/FIN before connection close
  • WebTransport datagram TX path through the C bridge
  • Datagram benches: latency percentiles, payload-size sweep, loss / jitter under load (today's bench_datagram is fire-and-count throughput only)
  • Pure stream open/close microbench (lifecycle rate without payload, separate from bench_stream_churn_highlevel which bundles writes + FIN)
  • Submit aiopquic to the QUIC interop runner for cross-implementation coverage


A Marz Research project.
Author: G. S. Marzot <gmarzot@marzresearch.net>

License

MIT License -- see LICENSE
