High-performance QUIC/HTTP3 library — picoquic-backed, qh3-compatible asyncio API

aiopquic - Async QUIC + WebTransport (picoquic)

aiopquic is a Python/Cython binding to picoquic, providing high-performance QUIC transport and WebTransport for asyncio applications.

Overview

aiopquic exposes picoquic's QUIC implementation through a lock-free SPSC ring buffer architecture that bridges the picoquic network thread with Python's asyncio event loop. It provides an asyncio QUIC/HTTP3 transport API in the spirit of aioquic (and its fork qh3), with similar shapes for QuicConfiguration, QuicConnection, connect / serve, and event types, plus a native WebTransport client/server layered on picoquic's H3 + h3zero. It is not a drop-in replacement: semantics differ around backpressure (send_stream_data raises BufferError when the per-stream ring is full) and flow-control sizing.
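One way to cope with the backpressure difference is to catch BufferError and retry after yielding to the event loop. The sketch below is illustrative only: FakeQuic is a hypothetical stand-in (not the aiopquic class) that mimics a full per-stream ring, and the retry delay and attempt limit are made-up values, not library defaults.

```python
import asyncio


class FakeQuic:
    """Hypothetical stand-in: rejects the first few sends, like a full ring."""

    def __init__(self, full_for: int) -> None:
        self._full_for = full_for
        self.sent: list[tuple[int, bytes, bool]] = []

    def send_stream_data(self, stream_id: int, data: bytes,
                         end_stream: bool = False) -> None:
        if self._full_for > 0:
            self._full_for -= 1
            raise BufferError("per-stream ring full")
        self.sent.append((stream_id, data, end_stream))


async def send_with_backpressure(quic, stream_id, data, end_stream=False,
                                 retry_delay=0.001, max_attempts=100):
    """Retry send_stream_data until it is accepted; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            quic.send_stream_data(stream_id, data, end_stream=end_stream)
            return attempt
        except BufferError:
            # Ring is full: yield so the consumer side can drain it.
            await asyncio.sleep(retry_delay)
    raise TimeoutError("stream ring stayed full")


quic = FakeQuic(full_for=3)
attempts = asyncio.run(send_with_backpressure(quic, 0, b"hello", end_stream=True))
print(attempts, quic.sent)
```

With a real connection the same wrapper would take the QuicConnection object in place of FakeQuic; whether a fixed sleep or a drain notification is the better retry trigger depends on the workload.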

Architecture

  • SPSC Ring Buffers -- Lock-free single producer/single consumer rings for event passing between threads, separate TX and RX rings per TransportContext.
  • TX path -- Asyncio pushes into per-stream byte ring; picoquic pulls at wire rate via prepare_to_send.
  • RX path -- picoquic pushes per-event StreamChunks; ownership transfers at pop for 1-copy delivery.
  • Cross-platform wake fd -- Linux eventfd for efficient asyncio add_reader() notification; pipe() self-pipe fallback on macOS / BSD.
  • Dedicated Network Thread -- picoquic runs in its own thread via picoquic_start_network_thread(). One worker thread per TransportContext; multiple contexts share the asyncio event loop within a single Python process.
  • Cython Bridge -- Thin Cython layer over C callbacks, minimal overhead.
  • WebTransport -- aiopquic.asyncio.webtransport.WebTransportSession (client + server) over picoquic's picowt_* API and h3zero.
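The wake-fd idea from the list above can be sketched in pure Python. This is not aiopquic's implementation: it uses the pipe() self-pipe variant on every platform, a deque as a stand-in for the RX ring, and a plain thread as a stand-in for the picoquic network thread. All class and function names here are hypothetical.

```python
import asyncio
import os
import threading
from collections import deque


class WakePipe:
    """Self-pipe used to wake an asyncio loop from another thread."""

    def __init__(self) -> None:
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)
        os.set_blocking(self.wfd, False)

    def notify(self) -> None:
        try:
            os.write(self.wfd, b"\x01")
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    def drain(self) -> None:
        try:
            while os.read(self.rfd, 4096):
                pass
        except BlockingIOError:
            pass  # nothing left to read

    def close(self) -> None:
        os.close(self.rfd)
        os.close(self.wfd)


async def receive_events(expected: int) -> list:
    loop = asyncio.get_running_loop()
    wake = WakePipe()
    ring = deque()  # stand-in for the RX ring
    got = []
    done = loop.create_future()

    def on_wake() -> None:
        wake.drain()            # clear the wake signal first
        while ring:             # then pop everything queued so far
            got.append(ring.popleft())
        if len(got) >= expected and not done.done():
            done.set_result(None)

    loop.add_reader(wake.rfd, on_wake)

    def network_thread() -> None:
        for i in range(expected):
            ring.append(i)      # publish the event, then signal
            wake.notify()

    worker = threading.Thread(target=network_thread)
    worker.start()
    try:
        await asyncio.wait_for(done, timeout=5)
    finally:
        loop.remove_reader(wake.rfd)
        worker.join()
        wake.close()
    return got


events = asyncio.run(receive_events(5))
print(events)
```

Draining the pipe before popping the ring matters: an event published after the drain always produces a fresh wake-up, so nothing is left stranded in the ring.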

Features

  • QUIC client and server: connect, serve, QuicConnectionProtocol
  • Stream data send/receive with FIN signaling, stream reset, stop_sending
  • WebTransport client + server: serve_webtransport, WebTransportSession
  • QUIC datagram TX + RX (note: WebTransport datagram TX not yet wired)
  • Connection migration / 0-RTT (inherited from picoquic)
  • Connection management: create, close, idle timeout, application close codes
  • Per-cnx multiplexing on the server side via QuicEngine
  • TLS keylog (NSS Key Log Format) for pcap decryption
  • Native picoquic_ct / picohttp_ct subprocess smoke test (catches upstream regressions on every submodule update)

Test Results

Tests pass on Linux and macOS. The interop suite is opt-in (network-dependent).

| Suite | Coverage |
|---|---|
| test_spsc_ring | per-event malloc ring lifecycle |
| test_buffer | Cython Buffer |
| test_transport | Transport lifecycle, wake fd, wake-up, connection management |
| test_loopback | 17 tests: handshake, streams, FIN, reset, datagrams, ALPN mismatch, idle timeout, app-close codes, stop_sending, many-streams stress, TX-ring overflow |
| test_asyncio | client/server stream + datagram exchange via connect / serve |
| test_baton_pattern | Pure-QUIC baton-style stream multiplexing (UNI ↔ BIDI) |
| test_native_picoquic | picoquic_ct / picohttp_ct subprocess driver |
| test_interop | Real public endpoints (opt-in) |
| tests/bench/ | microbenches: ring push/pop, single-shot/sustained/parallel/bidirectional throughput, datagrams, RTT latency, handshake rate, byte-verifying object stress + stream churn + concurrent streams (opt-in via pytest tests/bench) |

Performance

Sustained single-stream throughput, 30s steady-state, byte-verifying, high-level asyncio API (QuicConnection.send_stream_data):

| platform | 1 KiB | 4 KiB | 16 KiB |
|---|---|---|---|
| AMD Ryzen 7 PRO 7840U / WSL2 / Linux 6.6 | 1,570 Mbps | 2,118 Mbps | 2,031 Mbps |
| Apple M-series / macOS Sonoma | 953 Mbps | 1,130 Mbps | 1,104 Mbps |

These are over local UDP loopback at the QUIC default MTU (~1,400 B). The realistic ceiling at that MTU is the kernel's per-syscall sendmsg rate, not bandwidth. On Ryzen WSL2, raw iperf3 -u -l 1400 over loopback maxes at 3.15 Gbps (≈ 280 K syscalls/s); raise the datagram size and it climbs cleanly — 4 KiB → 7.9, 8 KiB → 12.8, 32 KiB → 33.7 Gbps. So QUIC pinned at MTU is in a regime where the syscall rate is the wall.
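The syscall-rate figure follows directly from throughput and datagram size. A small sketch of that arithmetic, assuming one datagram per sendmsg call and KiB = 1,024 bytes (both are assumptions of this example, not statements from the benchmark):

```python
def datagrams_per_second(throughput_bps: float, datagram_bytes: int) -> float:
    """Datagram (and, at one datagram per call, syscall) rate for a throughput."""
    return throughput_bps / (8 * datagram_bytes)


# Raw iperf3 loopback figures quoted above: (datagram size in bytes, Gbps).
measurements = [(1400, 3.15), (4096, 7.9), (8192, 12.8), (32768, 33.7)]

rate_1400 = datagrams_per_second(3.15e9, 1400)
print(f"{rate_1400:,.0f} datagrams/s at 1400 B")  # 281,250

for size, gbps in measurements:
    rate = datagrams_per_second(gbps * 1e9, size)
    print(f"{size:>6} B  {gbps:>5} Gbps  {rate:>10,.0f} datagrams/s")
```

Throughput grows roughly tenfold across these sizes while the per-second datagram count stays within the same order of magnitude, which is the pattern expected when the per-call cost is the limit.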

In that regime, here's where the layers land on Ryzen WSL2:

| layer | ss_mbps | of UDP@1400 ceiling |
|---|---|---|
| iperf3 -u -l 1400 (raw UDP loopback) | 3,150 | 100 % |
| picoquicdemo -a perf (picoquic over UDP) | 2,184 | 69 % |
| aiopquic lowlevel (SPSC ring + UDP) | 2,322 | 74 % |
| aiopquic highlevel (asyncio + SPSC + UDP) | 2,031 | 64 % |
| sim_link_bench (picoquic only, no kernel UDP) | 11,216 | (off-axis) |

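The asyncio wrapper costs about 10 percentage points of the ceiling relative to the lowlevel SPSC path (74 % vs 64 %, roughly 12.5 % in relative terms); QUIC framing/encryption/ACK overhead leaves the QUIC paths roughly 25 to 30 % below raw UDP. Both are normal for QUIC-over-loopback at MTU.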
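The percentages can be re-derived from the Mbps column. A short sketch using only the Ryzen WSL2 figures reported above:

```python
CEILING_MBPS = 3150  # raw iperf3 -u -l 1400 over loopback

layers = {
    "picoquicdemo -a perf": 2184,
    "aiopquic lowlevel": 2322,
    "aiopquic highlevel": 2031,
}

# Share of the raw-UDP ceiling each layer reaches, in whole percent.
share = {name: round(100 * mbps / CEILING_MBPS) for name, mbps in layers.items()}

# Relative cost of the asyncio wrapper versus the lowlevel SPSC path.
asyncio_cost = 1 - layers["aiopquic highlevel"] / layers["aiopquic lowlevel"]

for name, pct in share.items():
    print(f"{name:<22} {pct} % of ceiling")
print(f"asyncio wrapper cost: {asyncio_cost:.1%} relative to lowlevel")
```

Swapping in numbers from your own calibration run gives the same breakdown for your hardware.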

sim_link_bench (tests/bench/sim_link/) drives picoquic over its picoquictest_sim_link simulated link — packets are routed in-process between two picoquic_quic_t instances, no kernel UDP, no sockets, no syscall-rate ceiling. It isolates picoquic protocol CPU cost from the loopback wall and is platform-independent. The 11.2 Gbps number above is what picoquic can do without any kernel involvement on this hardware. Build with ./tests/bench/sim_link/build.sh after ./build_picoquic.sh.

Calibrate on your own hardware:

# UDP-over-loopback path (what aiopquic users actually see)
pytest tests/bench/bench_baselines_highlevel.py -s -v          # 30s default
pytest tests/bench/bench_baselines_highlevel.py -s -v --duration=60

# Protocol-only reference (no kernel UDP)
PICOQUIC_SOLUTION_DIR=third_party/picoquic/ \
    tests/bench/sim_link/sim_link_bench --duration-s 30 --rate-gbps 100

Microbenches (ring lifecycle, stream churn, concurrent-streams short bursts) live under tests/bench/ for development reference. Their reported numbers are not representative of sustained throughput — short windows inflate numbers from warmup transients (a 100-stream churn case at 256 B per stream measures ~1 ms of work, dominated by setup cost).

Installation

Wheels for cp312 / cp313 / cp314 on Linux (manylinux_2_34, glibc 2.34+) and macOS arm64 are published to PyPI:

uv pip install aiopquic     # or: pip install aiopquic

For older Linux (glibc 2.28–2.33) install via sdist; build toolchain required.

From source

git clone https://github.com/gmarzot/aiopquic.git
cd aiopquic
git submodule update --init --recursive
./bootstrap_python.sh    # creates .venv with uv-managed Python 3.14 (GIL build) and pins cython 3.2+
source .venv/bin/activate
./build_picoquic.sh      # builds picotls, picoquic, native test drivers
uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'

On macOS, set OPENSSL_ROOT_DIR if Homebrew OpenSSL is not auto-detected (the build script tries openssl@3 then openssl@1.1).

Reporting issues

Include the full version report in any issue — it captures aiopquic plus the picoquic + picotls submodule SHAs the binding was built from:

python -m aiopquic.versions   # or the console script: aiopquic-versions

Sample output:

aiopquic 0.3.5.dev4+g2ffe8947d.d20260522
         /path/to/aiopquic
picoquic 2b1e14d5a46532eadf691edef5bd747da6de6557
picotls  f350eab60742138ac62b42ee444adf04c7898b0d

If you're running aiomoqt on top, prefer python -m aiomoqt.versions — it chains through to this report and includes the aiomoqt version too.

Usage

Low-level Transport API

from aiopquic._binding._transport import TransportContext

server = TransportContext()
server.start(port=4433, cert_file="cert.pem", key_file="key.pem", alpn="moq-00", is_client=False)

client = TransportContext()
client.start(port=0, alpn="moq-00", is_client=True)
client.create_client_connection("127.0.0.1", 4433, sni="localhost", alpn="moq-00")

Asyncio API

import asyncio

from aiopquic.asyncio.client import connect
from aiopquic.quic.configuration import QuicConfiguration

configuration = QuicConfiguration(alpn_protocols=["myproto"], is_client=True)

async def main() -> None:
    payload = b"hello"  # opaque application bytes
    async with connect("server", 4433, configuration=configuration) as protocol:
        quic = protocol._quic
        # Open a new stream, queue the payload with FIN, then flush.
        stream_id = quic.get_next_available_stream_id()
        quic.send_stream_data(stream_id, payload, end_stream=True)
        protocol.transmit()

asyncio.run(main())

payload is opaque bytes; the library doesn't impose framing. Consumers that want HTTP/3 can layer it on top of aiopquic's picowt-backed h3zero plumbing; consumers that want WebTransport use serve_webtransport / connect_webtransport. Most direct users of the asyncio API ship their own protocol bytes (MoQT, custom binary frames, etc.).
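Because framing is left to the application, a stream consumer has to delimit its own messages. The sketch below shows one common choice, a 4-byte big-endian length prefix. It is not part of aiopquic; encode_frame and FrameDecoder are hypothetical names for this example.

```python
import struct

_HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def encode_frame(body: bytes) -> bytes:
    """Prefix a message with its length."""
    return _HEADER.pack(len(body)) + body


class FrameDecoder:
    """Reassemble length-prefixed frames from arbitrarily split stream chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buf += chunk
        frames = []
        while len(self._buf) >= _HEADER.size:
            (length,) = _HEADER.unpack_from(self._buf)
            end = _HEADER.size + length
            if len(self._buf) < end:
                break  # body not fully received yet
            frames.append(bytes(self._buf[_HEADER.size:end]))
            del self._buf[:end]
        return frames


wire = encode_frame(b"hello") + encode_frame(b"world")
decoder = FrameDecoder()
# Simulate the stream delivering the bytes in two uneven chunks.
out = decoder.feed(wire[:6]) + decoder.feed(wire[6:])
print(out)
```

The sender would pass encode_frame(...) output as the payload to send_stream_data, and the receiver would feed each received stream chunk into the decoder.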

WebTransport

from aiopquic.asyncio.webtransport import (
    serve_webtransport, WebTransportSession,
)
# See src/aiopquic/asyncio/webtransport.py and tests/ for full examples.

Development

uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'
python -m pytest tests/ -v -m "not interop and not native"

# Microbenches (opt-in)
python -m pytest tests/bench

Performance build (opt-in)

Default builds use CMAKE_BUILD_TYPE=Release (-O3 -DNDEBUG), portable across hosts. Two opt-in env vars layer on host-tuned optimizations for local benching — neither is enabled in PyPI wheels:

# Host-tuned: Fusion AES-GCM (x86_64), DISABLE_DEBUG_PRINTF,
# -O3 -march=native -flto. Binary becomes machine-specific.
AIOPQUIC_PERF=1 ./build_picoquic.sh

Per-platform behavior:

Knob Linux x86_64 Linux ARM64 macOS arm64 macOS x86_64
-O3 -DNDEBUG (always on)
DISABLE_DEBUG_PRINTF
Fusion AES-GCM (CPUID-dispatched)
-march=native / -mcpu=native + -flto

Experimental: AIOPQUIC_IO_URING=1 (DORMANT)

io_uring scaffolding is in the tree (third_party/liburing submodule, picoquic patch, setup.py linkage). Enabling it builds picoquic_packet_loop_uring into libpicoquic-core.a and statically links liburing.a into the Cython extension:

AIOPQUIC_IO_URING=1 ./build_picoquic.sh   # auto-fetches + builds liburing-2.7
uv pip install -e '.[dev]'                 # re-cythonize with PICOQUIC_WITH_IO_URING define

This currently has no runtime effect. aiopquic's worker thread uses its own callback/SPSC-ring path and does not invoke picoquic_packet_loop_uring. The scaffolding is preserved so the worker can be migrated to io_uring later without re-discovering the build recipe (liburing submodule pin, picoquic header patch for kernel-uapi conflicts, ABI-critical define propagation through setup.py).

Linux-only. Compatible CPU architectures: x86_64, ARM64. Build will hard-error if AIOPQUIC_IO_URING=1 is set on macOS / BSD / Windows.

ABI note: picoquic_network_thread_ctx_t and picoquic_socket_ctx_t have conditional fields gated on PICOQUIC_WITH_IO_URING. The build-script + setup.py propagate the define to both picoquic-core and the Cython extension. A mismatch silently shifts thread_is_ready and other field offsets — the network thread appears to never become ready. Don't enable WITH_IO_URING in picoquic without also defining PICOQUIC_WITH_IO_URING in the Cython build.
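The offset shift described in the ABI note can be reproduced with ctypes. The two structs below are simplified, hypothetical layouts, not the real picoquic_network_thread_ctx_t; only the field name thread_is_ready is taken from the note above.

```python
import ctypes


class CtxWithoutUring(ctypes.Structure):
    """Layout as seen by code compiled WITHOUT the io_uring define."""
    _fields_ = [
        ("loop_state", ctypes.c_void_p),       # hypothetical leading field
        ("thread_is_ready", ctypes.c_int),
    ]


class CtxWithUring(ctypes.Structure):
    """Layout as seen by code compiled WITH the io_uring define."""
    _fields_ = [
        ("loop_state", ctypes.c_void_p),       # hypothetical leading field
        ("uring_state", ctypes.c_void_p),      # hypothetical conditional field
        ("thread_is_ready", ctypes.c_int),
    ]


offset_without = CtxWithoutUring.thread_is_ready.offset
offset_with = CtxWithUring.thread_is_ready.offset
print(offset_without, offset_with)
```

If the library writes thread_is_ready at one offset and the extension polls the other, the reader never sees the flag change, which matches the "never becomes ready" symptom.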

Runtime deployment guidance

These are runtime tunings, separate from build-time flags above. PyPI wheels ship with portable perf flags baked in (see Performance build); these knobs apply on top of any binary.

jemalloc for tail-latency reduction

The default glibc allocator's per-thread arenas + occasional coalescing show up as max-latency outliers under sustained high-throughput workloads. Preloading jemalloc measurably tightens the tail:

# Debian/Ubuntu:
sudo apt install libjemalloc2
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python -m your_app

# Fedora/RHEL:
sudo dnf install jemalloc
LD_PRELOAD=/usr/lib64/libjemalloc.so.2 python -m your_app

Validated improvement on a representative aiopquic sustained workload (Ryzen 7 PRO, Linux loopback): sd 7.1 ms → 4.3 ms, max 437 ms → 310 ms, throughput unchanged. Effect is most visible at multi-Gbps over 60+ second runs; small workloads see no difference.
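To confirm that the preload actually took effect, a process can check its own memory map on Linux. This is a generic Linux check, not an aiopquic feature, and jemalloc_loaded is a hypothetical helper name:

```python
from pathlib import Path


def jemalloc_loaded(maps_path: str = "/proc/self/maps") -> bool:
    """Return True if a jemalloc shared object is mapped into this process."""
    try:
        text = Path(maps_path).read_text(errors="replace")
    except OSError:
        return False  # not Linux, or /proc is unavailable
    return "jemalloc" in text


print("jemalloc loaded:", jemalloc_loaded())
```

Calling it once at startup and logging the result makes it easy to tell tuned and untuned runs apart when comparing latency numbers.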

GSO and send-length-max

GSO (UDP segmentation offload) is already enabled by default on Linux with send_length_max=65535 (max kernel-coalesced stride). No user action needed. macOS / FreeBSD default to GSO off — picoquic's per-datagram sendmsg path is used instead. Env overrides:

AIOPQUIC_GSO=0                  # force off (diagnostic only)
AIOPQUIC_SEND_LENGTH_MAX=8192   # cap kernel-coalesced buffer (Linux GSO on)
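As a rough guide to what the cap means, the number of MTU-sized segments that fit in one coalesced send is a simple division. This sketch assumes 1,400-byte segments (the approximate MTU quoted in the Performance section) and ignores any separate per-call segment limit the kernel may apply:

```python
def segments_per_send(send_length_max: int, segment_bytes: int = 1400) -> int:
    """Whole segments that fit in one coalesced buffer (at least one)."""
    return max(1, send_length_max // segment_bytes)


default_segments = segments_per_send(65535)
capped_segments = segments_per_send(8192)
print(default_segments)  # 46
print(capped_segments)   # 5
```

Lowering the cap therefore trades fewer segments per syscall for smaller bursts handed to the kernel at once.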

TX wake threshold

The TX SPSC event ring's drain-wake threshold defaults to 50% — producer is signalled to resume only after ≥ half the queued events have drained. Overridable to tune for latency vs. context-switch overhead:

AIOPQUIC_TX_RING_WAKE_PCT=25    # wake earlier (lower per-send latency, more context switches)
AIOPQUIC_TX_RING_WAKE_PCT=75    # wake later (more batching, slightly higher latency)
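The effect of the threshold can be shown with a toy ring. This is one reading of the behaviour described above, not the library's code: the producer pauses when the ring is full and is woken once the number of events drained since the pause reaches wake_pct of the capacity. ThresholdRing and its fields are hypothetical.

```python
from collections import deque


class ThresholdRing:
    """Toy bounded ring with a drain-wake threshold (illustrative only)."""

    def __init__(self, capacity: int, wake_pct: int = 50) -> None:
        self.capacity = capacity
        self.items = deque()
        self.producer_paused = False
        # Events that must drain after a pause before the producer is woken.
        self.wake_after = max(1, capacity * wake_pct // 100)
        self._drained_since_pause = 0

    def push(self, item) -> bool:
        """Queue an item; return False and pause the producer if full."""
        if len(self.items) >= self.capacity:
            self.producer_paused = True
            self._drained_since_pause = 0
            return False
        self.items.append(item)
        return True

    def pop(self):
        """Return (item, wake_producer)."""
        item = self.items.popleft()
        wake = False
        if self.producer_paused:
            self._drained_since_pause += 1
            if self._drained_since_pause >= self.wake_after:
                self.producer_paused = False
                wake = True
        return item, wake


def pops_until_wake(wake_pct: int, capacity: int = 8) -> int:
    ring = ThresholdRing(capacity, wake_pct)
    for i in range(capacity):
        ring.push(i)
    ring.push("overflow")  # rejected: the producer pauses here
    pops = 0
    while True:
        pops += 1
        _, wake = ring.pop()
        if wake:
            return pops


wake_counts = {pct: pops_until_wake(pct) for pct in (25, 50, 75)}
print(wake_counts)
```

A lower percentage wakes the producer after fewer drained events, which means more wake-ups per unit of data but a shorter stall for each blocked send.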

Known Limitations

  • Free-threaded Python (3.14t) not yet supported -- the TX-ring producer side, TransportContext lifecycle, and the WebTransport engine state currently rely on the GIL for serialization. FT support deferred until a per-context locking audit lands.
  • STOP_SENDING error codes surface as 0 today: picoquic's public stream-error getter only returns the RESET_STREAM code. STOP_SENDING's code lives in stream->remote_stop_error in picoquic_internal.h (no public getter). A small helper that pulls the field is straightforward future work — see TODO in src/aiopquic/_binding/c/callback.h.
  • Per-stream wrapper cleanup before connection close -- per-stream aiopquic_stream_ctx_t* wrappers are freed at connection close rather than at stream RESET/FIN. Bounded leak per cnx; flagged for follow-up.

TODO

  • Windows support (eventfd alternative — IOCP / WSAEventSelect on the wake-fd path)
  • Free-threaded Python (3.14t) support after producer-side locking audit
  • STOP_SENDING error-code surfacing helper (read remote_stop_error from picoquic_internal.h)
  • Per-stream wrapper cleanup on RESET/FIN before connection close
  • WebTransport datagram TX path through the C bridge
  • Datagram benches: latency percentiles, payload-size sweep, loss / jitter under load (today's bench_datagram is fire-and-count throughput only)
  • Pure stream open/close microbench (lifecycle rate without payload, separate from bench_stream_churn_highlevel which bundles writes + FIN)
  • Submit aiopquic to the QUIC interop runner for cross-implementation coverage


A Marz Research project.
Author: G. S. Marzot <gmarzot@marzresearch.net>

License

MIT License -- see LICENSE
