High-performance QUIC/HTTP3 library — picoquic-backed, qh3-compatible asyncio API

aiopquic - Async QUIC + WebTransport (picoquic)

aiopquic is a Python/Cython binding to picoquic, providing high-performance QUIC transport and WebTransport for asyncio applications.

Overview

aiopquic exposes picoquic's QUIC implementation through a lock-free SPSC ring buffer architecture that bridges the picoquic network thread with Python's asyncio event loop. It provides an asyncio QUIC/HTTP3 transport API in the spirit of aioquic (and its fork qh3), with similar shapes for QuicConfiguration, QuicConnection, connect / serve, and event types, plus a native WebTransport client/server layered on picoquic's H3 + h3zero. It is not a drop-in replacement: semantics differ around backpressure (send_stream_data raises BufferError when the per-stream ring is full) and flow-control sizing.
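Because send_stream_data can raise BufferError, callers ported from aioquic / qh3 need some retry policy. The following is a minimal sketch of one such policy, not the library's own mechanism: send_with_retry, its delay / max_attempts parameters, and the FakeQuic stand-in are hypothetical names introduced here, and the sketch assumes a rejected send is all-or-nothing (no partial write before the error).

```python
import asyncio


async def send_with_retry(quic, stream_id, data, *, end_stream=False,
                          delay=0.001, max_attempts=1000):
    """Retry send_stream_data while the per-stream TX ring reports full.

    Hypothetical helper: assumes a BufferError means nothing was queued.
    """
    for _ in range(max_attempts):
        try:
            quic.send_stream_data(stream_id, data, end_stream=end_stream)
            return
        except BufferError:
            # Ring full: yield to the event loop so the ring can drain.
            await asyncio.sleep(delay)
    raise TimeoutError("per-stream TX ring stayed full")


class FakeQuic:
    """Stand-in for demonstration only; rejects the first `full_for` sends."""

    def __init__(self, full_for):
        self.full_for = full_for
        self.sent = []

    def send_stream_data(self, stream_id, data, end_stream=False):
        if self.full_for > 0:
            self.full_for -= 1
            raise BufferError("per-stream ring full")
        self.sent.append((stream_id, data, end_stream))


fake = FakeQuic(full_for=3)
asyncio.run(send_with_retry(fake, 0, b"abc", end_stream=True, delay=0))
print(fake.sent)
```

A sleep-and-retry loop is the simplest option; an application with many streams would probably prefer to park writers on an event and resume them when the ring drains.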

Architecture

  • SPSC Ring Buffers -- Lock-free single producer/single consumer rings for event passing between threads, separate TX and RX rings per TransportContext.
  • TX path -- Asyncio pushes into per-stream byte ring; picoquic pulls at wire rate via prepare_to_send.
  • RX path -- picoquic pushes per-event StreamChunks; ownership transfers at pop for 1-copy delivery.
  • Cross-platform wake fd -- Linux eventfd for efficient asyncio add_reader() notification; pipe() self-pipe fallback on macOS / BSD.
  • Dedicated Network Thread -- picoquic runs in its own thread via picoquic_start_network_thread(). One worker thread per TransportContext; multiple contexts share the asyncio event loop within a single Python process.
  • Cython Bridge -- Thin Cython layer over C callbacks, minimal overhead.
  • WebTransport -- aiopquic.asyncio.webtransport.WebTransportSession (client + server) over picoquic's picowt_* API and h3zero.
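The ring-plus-wake-fd hand-off described above can be modelled in pure Python. This is an illustrative sketch only, not aiopquic's C implementation: a collections.deque stands in for the RX ring, a plain thread stands in for the picoquic network thread, and WakePipe / collect_events are hypothetical names. It uses the pipe() self-pipe variant and loop.add_reader(), so it assumes a Unix selector event loop.

```python
import asyncio
import os
import threading
from collections import deque


class WakePipe:
    """Self-pipe used to wake an asyncio loop from another thread."""

    def __init__(self) -> None:
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)
        os.set_blocking(self.wfd, False)

    def signal(self) -> None:
        try:
            os.write(self.wfd, b"\x01")
        except BlockingIOError:
            pass  # pipe is full, so a wake-up is already pending

    def drain(self) -> None:
        try:
            while os.read(self.rfd, 4096):
                pass
        except BlockingIOError:
            pass  # nothing left to read

    def close(self) -> None:
        os.close(self.rfd)
        os.close(self.wfd)


async def collect_events(count: int) -> list[int]:
    loop = asyncio.get_running_loop()
    ring: deque[int] = deque()  # stand-in for the RX event ring
    wake = WakePipe()
    received: list[int] = []
    done = loop.create_future()

    def on_wake() -> None:
        # Consumer side: clear the wake fd first, then pop everything queued.
        wake.drain()
        while ring:
            received.append(ring.popleft())
        if len(received) >= count and not done.done():
            done.set_result(None)

    def network_thread() -> None:
        # Producer side: publish the event, then signal the loop.
        for i in range(count):
            ring.append(i)
            wake.signal()

    loop.add_reader(wake.rfd, on_wake)
    worker = threading.Thread(target=network_thread)
    worker.start()
    try:
        await asyncio.wait_for(done, timeout=5)
    finally:
        loop.remove_reader(wake.rfd)
        worker.join()
        wake.close()
    return received


events = asyncio.run(collect_events(100))
print(len(events), "events delivered in order:", events == list(range(100)))
```

The ordering matters: the producer signals after publishing and the consumer drains the fd before popping, so an event published mid-drain always leaves a pending wake-up behind it.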

Features

  • QUIC client and server: connect, serve, QuicConnectionProtocol
  • Stream data send/receive with FIN signaling, stream reset, stop_sending
  • WebTransport client + server: serve_webtransport, WebTransportSession
  • QUIC datagram TX + RX (note: WebTransport datagram TX not yet wired)
  • Connection migration / 0-RTT (inherited from picoquic)
  • Connection management: create, close, idle timeout, application close codes
  • Per-cnx multiplexing on the server side via QuicEngine
  • TLS keylog (NSS Key Log Format) for pcap decryption
  • Native picoquic_ct / picohttp_ct subprocess smoke tests (catch upstream regressions on every submodule update)

Test Results

Tests pass on Linux and macOS. The interop suite is opt-in (network-dependent).

| Suite | Coverage |
| --- | --- |
| test_spsc_ring | per-event malloc ring lifecycle |
| test_buffer | Cython Buffer |
| test_transport | Transport lifecycle, wake fd, wake-up, connection management |
| test_loopback | 17 tests: handshake, streams, FIN, reset, datagrams, ALPN mismatch, idle timeout, app-close codes, stop_sending, many-streams stress, TX-ring overflow |
| test_asyncio | client/server stream + datagram exchange via connect / serve |
| test_baton_pattern | Pure-QUIC baton-style stream multiplexing (UNI ↔ BIDI) |
| test_native_picoquic | picoquic_ct / picohttp_ct subprocess driver |
| test_interop | Real public endpoints (opt-in) |
| tests/bench/ | microbenches: ring push/pop, single-shot/sustained/parallel/bidirectional throughput, datagrams, RTT latency, handshake rate, byte-verifying object stress + stream churn + concurrent streams (opt-in via pytest tests/bench) |

Performance

Sustained single-stream throughput, 30s steady-state, byte-verifying, high-level asyncio API (QuicConnection.send_stream_data):

| platform | 1 KiB | 4 KiB | 16 KiB |
| --- | --- | --- | --- |
| AMD Ryzen 7 PRO 7840U / WSL2 / Linux 6.6 | 1,570 Mbps | 2,118 Mbps | 2,031 Mbps |
| Apple M-series / macOS Sonoma | 953 Mbps | 1,130 Mbps | 1,104 Mbps |

These numbers are over local UDP loopback at the QUIC default MTU (~1,400 B). The realistic ceiling at that MTU is the kernel's per-syscall sendmsg rate, not bandwidth. On Ryzen WSL2, raw iperf3 -u -l 1400 over loopback maxes out at 3.15 Gbps (≈ 280 K syscalls/s); raise the datagram size and throughput climbs cleanly: 4 KiB → 7.9 Gbps, 8 KiB → 12.8 Gbps, 32 KiB → 33.7 Gbps. So QUIC pinned at MTU is in a regime where the syscall rate is the wall.
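The syscall-rate figure follows from simple arithmetic. A small sketch, assuming one datagram per sendmsg call (no GSO), 1 Gbps = 10^9 bit/s, and KiB sizes taken as powers of two; datagrams_per_second is a helper name introduced here:

```python
def datagrams_per_second(throughput_gbps: float, datagram_bytes: int) -> float:
    """Datagram rate implied by a throughput figure at a fixed datagram size."""
    return throughput_gbps * 1e9 / (datagram_bytes * 8)


rate_at_mtu = datagrams_per_second(3.15, 1400)
print(f"{rate_at_mtu:,.0f} datagrams/s")  # 281,250 datagrams/s

# Same calculation for the larger iperf3 datagram sizes quoted above.
sweep = {1400: 3.15, 4096: 7.9, 8192: 12.8, 32768: 33.7}
for size, gbps in sweep.items():
    rate = datagrams_per_second(gbps, size)
    print(f"{size:>6} B  {gbps:>5} Gbps  {rate:>10,.0f} datagrams/s")
```

Across the sweep the throughput grows roughly tenfold while the implied datagram rate stays within the same order of magnitude, which is consistent with a per-call cost dominating at MTU-sized datagrams.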

In that regime, here's where the layers land on Ryzen WSL2:

| layer | ss_mbps | of UDP@1400 ceiling |
| --- | --- | --- |
| iperf3 -u -l 1400 (raw UDP loopback) | 3,150 | 100 % |
| picoquicdemo -a perf (picoquic over UDP) | 2,184 | 69 % |
| aiopquic lowlevel (SPSC ring + UDP) | 2,322 | 74 % |
| aiopquic highlevel (asyncio + SPSC + UDP) | 2,031 | 64 % |
| sim_link_bench (picoquic only, no kernel UDP) | 11,216 | (off-axis) |

The asyncio wrapper lands ~10 percentage points below the lowlevel SPSC path (64 % vs 74 % of the ceiling); picoquic's own QUIC framing/encryption/ACK overhead accounts for ~25 % vs raw UDP. Both are normal for QUIC-over-loopback at MTU.
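For readers who want to redo the comparison with their own measurements, the shares in the table are plain ratios against the raw UDP figure. A short sketch using the Ryzen WSL2 numbers from the table; share_of_ceiling and wrapper_cost_pct are names introduced here:

```python
CEILING_MBPS = 3150  # raw UDP loopback at 1400 B, from the table

layers = {
    "picoquicdemo -a perf": 2184,
    "aiopquic lowlevel": 2322,
    "aiopquic highlevel": 2031,
}


def share_of_ceiling(mbps: float, ceiling: float = CEILING_MBPS) -> int:
    """Throughput as a rounded percentage of the raw UDP ceiling."""
    return round(100 * mbps / ceiling)


shares = {name: share_of_ceiling(mbps) for name, mbps in layers.items()}
print(shares)

# Cost of the asyncio wrapper relative to the lowlevel path itself,
# as opposed to the difference in percentage points of the ceiling.
wrapper_cost_pct = 100 * (1 - layers["aiopquic highlevel"] / layers["aiopquic lowlevel"])
print(f"{wrapper_cost_pct:.1f} % below lowlevel")
```

The two ways of expressing the wrapper cost differ slightly: about 10 points of the ceiling, or about 12.5 % of the lowlevel throughput.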

sim_link_bench (tests/bench/sim_link/) drives picoquic over its picoquictest_sim_link simulated link — packets are routed in-process between two picoquic_quic_t instances, no kernel UDP, no sockets, no syscall-rate ceiling. It isolates picoquic protocol CPU cost from the loopback wall and is platform-independent. The 11.2 Gbps number above is what picoquic can do without any kernel involvement on this hardware. Build with ./tests/bench/sim_link/build.sh after ./build_picoquic.sh.

Calibrate on your own hardware:

# UDP-over-loopback path (what aiopquic users actually see)
pytest tests/bench/bench_baselines_highlevel.py -s -v          # 30s default
pytest tests/bench/bench_baselines_highlevel.py -s -v --duration=60

# Protocol-only reference (no kernel UDP)
PICOQUIC_SOLUTION_DIR=third_party/picoquic/ \
    tests/bench/sim_link/sim_link_bench --duration-s 30 --rate-gbps 100

Microbenches (ring lifecycle, stream churn, concurrent-streams short bursts) live under tests/bench/ for development reference. Their reported numbers are not representative of sustained throughput — short windows inflate numbers from warmup transients (a 100-stream churn case at 256 B per stream measures ~1 ms of work, dominated by setup cost).

Installation

Wheels for cp312 / cp313 / cp314 on Linux (manylinux_2_34, glibc 2.34+) and macOS arm64 are published to PyPI:

uv pip install aiopquic     # or: pip install aiopquic

For older Linux (glibc 2.28–2.33), install from the sdist; a build toolchain is required.

From source

git clone https://github.com/gmarzot/aiopquic.git
cd aiopquic
git submodule update --init --recursive
./bootstrap_python.sh    # creates .venv with uv-managed Python 3.14 (GIL build) and pins cython 3.2+
source .venv/bin/activate
./build_picoquic.sh      # builds picotls, picoquic, native test drivers
uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'

On macOS, set OPENSSL_ROOT_DIR if Homebrew OpenSSL is not auto-detected (the build script tries openssl@3 then openssl@1.1).

Reporting issues

Include the full version report in any issue — it captures aiopquic plus the picoquic + picotls submodule SHAs the binding was built from:

python -m aiopquic.versions   # or the console script: aiopquic-versions

Sample output:

aiopquic 0.3.5.dev4+g2ffe8947d.d20260522
         /path/to/aiopquic
picoquic 2b1e14d5a46532eadf691edef5bd747da6de6557
picotls  f350eab60742138ac62b42ee444adf04c7898b0d

If you're running aiomoqt on top, prefer python -m aiomoqt.versions — it chains through to this report and includes the aiomoqt version too.

Usage

Low-level Transport API

from aiopquic._binding._transport import TransportContext

server = TransportContext()
server.start(port=4433, cert_file="cert.pem", key_file="key.pem", alpn="moq-00", is_client=False)

client = TransportContext()
client.start(port=0, alpn="moq-00", is_client=True)
client.create_client_connection("127.0.0.1", 4433, sni="localhost", alpn="moq-00")

Asyncio API

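import asyncio

from aiopquic.asyncio.client import connect
from aiopquic.quic.configuration import QuicConfiguration

configuration = QuicConfiguration(alpn_protocols=["myproto"], is_client=True)

async def main(payload: bytes) -> None:
    # "server" is a placeholder hostname; replace with a reachable endpoint.
    async with connect("server", 4433, configuration=configuration) as protocol:
        quic = protocol._quic
        stream_id = quic.get_next_available_stream_id()
        # Send the whole payload on a new stream and close it with FIN.
        quic.send_stream_data(stream_id, payload, end_stream=True)
        protocol.transmit()

asyncio.run(main(b"hello"))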

payload is opaque bytes; the library doesn't impose framing. Consumers that want HTTP/3 can layer it on top of aiopquic's picowt-backed h3zero plumbing; consumers that want WebTransport use serve_webtransport / connect_webtransport. Most direct users of the asyncio API ship their own protocol bytes (MoQT, custom binary frames, etc.).
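Since framing is left to the application, a custom binary protocol needs its own message boundaries on top of the byte stream. The following is a minimal sketch of one common choice, a 4-byte big-endian length prefix; it is not part of aiopquic, and encode_frame / FrameDecoder are hypothetical names. Stream data can arrive split at arbitrary points, so the decoder buffers until a whole frame is available.

```python
import struct

_HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix


def encode_frame(body: bytes) -> bytes:
    """Prefix a message body with its length."""
    return _HEADER.pack(len(body)) + body


class FrameDecoder:
    """Reassemble length-prefixed frames from arbitrarily split chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buf += chunk
        frames = []
        while len(self._buf) >= _HEADER.size:
            (length,) = _HEADER.unpack_from(self._buf)
            end = _HEADER.size + length
            if len(self._buf) < end:
                break  # body not fully received yet
            frames.append(bytes(self._buf[_HEADER.size:end]))
            del self._buf[:end]
        return frames


wire = encode_frame(b"hello") + encode_frame(b"")
decoder = FrameDecoder()
first = decoder.feed(wire[:3])   # only part of the first header so far
second = decoder.feed(wire[3:])  # remainder completes both frames
print(first, second)
```

On the sending side, the output of encode_frame would be passed as the payload to send_stream_data; on the receiving side, each received stream chunk would be passed to feed.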

WebTransport

from aiopquic.asyncio.webtransport import (
    serve_webtransport, WebTransportSession,
)
# See src/aiopquic/asyncio/webtransport.py and tests/ for full examples.

Development

uv pip install -e '.[dev]'    # or: pip install -e '.[dev]'
python -m pytest tests/ -v -m "not interop and not native"

# Microbenches (opt-in)
python -m pytest tests/bench

Performance build (opt-in)

Default builds use CMAKE_BUILD_TYPE=Release (-O3 -DNDEBUG), portable across hosts. Two opt-in env vars layer on host-tuned optimizations for local benching — neither is enabled in PyPI wheels:

# Host-tuned: Fusion AES-GCM (x86_64), DISABLE_DEBUG_PRINTF,
# -O3 -march=native -flto. Binary becomes machine-specific.
AIOPQUIC_PERF=1 ./build_picoquic.sh

Per-platform behavior:

| Knob | Linux x86_64 | Linux ARM64 | macOS arm64 | macOS x86_64 |
| --- | --- | --- | --- | --- |
| -O3 -DNDEBUG (always on) | | | | |
| DISABLE_DEBUG_PRINTF | | | | |
| Fusion AES-GCM (CPUID-dispatched) | | | | |
| -march=native / -mcpu=native + -flto | | | | |

Experimental: AIOPQUIC_IO_URING=1 (DORMANT)

io_uring scaffolding is in the tree (third_party/liburing submodule, picoquic patch, setup.py linkage). Enabling it builds picoquic_packet_loop_uring into libpicoquic-core.a and statically links liburing.a into the Cython extension:

AIOPQUIC_IO_URING=1 ./build_picoquic.sh   # auto-fetches + builds liburing-2.7
uv pip install -e '.[dev]'                 # re-cythonize with PICOQUIC_WITH_IO_URING define

This currently has no runtime effect. aiopquic's worker thread uses its own callback/SPSC-ring path and does not invoke picoquic_packet_loop_uring. The scaffolding is preserved so the worker can be migrated to io_uring later without re-discovering the build recipe (liburing submodule pin, picoquic header patch for kernel-uapi conflicts, ABI-critical define propagation through setup.py).

Linux-only. Compatible CPU architectures: x86_64, ARM64. Build will hard-error if AIOPQUIC_IO_URING=1 is set on macOS / BSD / Windows.

ABI note: picoquic_network_thread_ctx_t and picoquic_socket_ctx_t have conditional fields gated on PICOQUIC_WITH_IO_URING. The build-script + setup.py propagate the define to both picoquic-core and the Cython extension. A mismatch silently shifts thread_is_ready and other field offsets — the network thread appears to never become ready. Don't enable WITH_IO_URING in picoquic without also defining PICOQUIC_WITH_IO_URING in the Cython build.
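The offset shift can be illustrated with ctypes. This is a toy model, not the real picoquic struct layout: only thread_is_ready is taken from the note above, while thread_id and uring_state are hypothetical fields standing in for whatever the PICOQUIC_WITH_IO_URING define adds.

```python
import ctypes


class CtxWithoutUring(ctypes.Structure):
    """Layout as seen by a translation unit built without the define."""
    _fields_ = [
        ("thread_id", ctypes.c_void_p),       # hypothetical leading field
        ("thread_is_ready", ctypes.c_int),
    ]


class CtxWithUring(ctypes.Structure):
    """Layout as seen by a translation unit built with the define."""
    _fields_ = [
        ("thread_id", ctypes.c_void_p),       # hypothetical leading field
        ("uring_state", ctypes.c_void_p),     # hypothetical conditional field
        ("thread_is_ready", ctypes.c_int),
    ]


offset_without = CtxWithoutUring.thread_is_ready.offset
offset_with = CtxWithUring.thread_is_ready.offset
print("thread_is_ready offset without define:", offset_without)
print("thread_is_ready offset with define:   ", offset_with)
```

If the library writes thread_is_ready at one offset and the extension reads it at the other, the reader sees unrelated memory, which matches the "never becomes ready" symptom described above.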

Runtime deployment guidance

These are runtime tunings, separate from build-time flags above. PyPI wheels ship with portable perf flags baked in (see Performance build); these knobs apply on top of any binary.

jemalloc for tail-latency reduction

The default glibc allocator's per-thread arenas + occasional coalescing show up as max-latency outliers under sustained high-throughput workloads. Preloading jemalloc measurably tightens the tail:

# Debian/Ubuntu:
sudo apt install libjemalloc2
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python -m your_app

# Fedora/RHEL:
sudo dnf install jemalloc
LD_PRELOAD=/usr/lib64/libjemalloc.so.2 python -m your_app

Validated improvement on a representative aiopquic sustained workload (Ryzen 7 PRO, Linux loopback): latency standard deviation (sd) 7.1 ms → 4.3 ms, max latency 437 ms → 310 ms, throughput unchanged. The effect is most visible at multi-Gbps over 60+ second runs; small workloads see no difference.

GSO and send-length-max

GSO (UDP segmentation offload) is already enabled by default on Linux with send_length_max=65535 (max kernel-coalesced stride). No user action needed. macOS / FreeBSD default to GSO off — picoquic's per-datagram sendmsg path is used instead. Env overrides:

AIOPQUIC_GSO=0                  # force off (diagnostic only)
AIOPQUIC_SEND_LENGTH_MAX=8192   # cap kernel-coalesced buffer (Linux GSO on)

TX wake threshold

The TX SPSC event ring's drain-wake threshold defaults to 50% — producer is signalled to resume only after ≥ half the queued events have drained. Overridable to tune for latency vs. context-switch overhead:

AIOPQUIC_TX_RING_WAKE_PCT=25    # wake earlier (lower per-send latency, more context switches)
AIOPQUIC_TX_RING_WAKE_PCT=75    # wake later (more batching, slightly higher latency)
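The threshold behaves like hysteresis on the producer side. Below is a toy model of that behaviour under stated assumptions, not aiopquic's actual ring code: DrainWakeRing is a hypothetical class, the producer is paused when a push finds the ring full, and it is woken once the configured percentage of the ring's capacity has drained since the pause.

```python
from collections import deque


class DrainWakeRing:
    """Toy bounded queue with a drain-wake threshold for the producer."""

    def __init__(self, capacity: int, wake_pct: int = 50) -> None:
        self.capacity = capacity
        # Number of pops required after a pause before waking (rounded up).
        self.wake_after = max(1, (capacity * wake_pct + 99) // 100)
        self.items = deque()
        self.producer_paused = False
        self._drained_since_pause = 0

    def push(self, item) -> bool:
        if len(self.items) >= self.capacity:
            if not self.producer_paused:
                self.producer_paused = True
                self._drained_since_pause = 0
            return False  # caller should stop producing until woken
        self.items.append(item)
        return True

    def pop(self):
        item = self.items.popleft()
        if self.producer_paused:
            self._drained_since_pause += 1
            if self._drained_since_pause >= self.wake_after:
                self.producer_paused = False  # the wake signal would fire here
        return item


ring = DrainWakeRing(capacity=8, wake_pct=50)
for i in range(8):
    ring.push(i)
ring.push(99)            # ring is full: producer pauses
for _ in range(3):
    ring.pop()
print(ring.producer_paused)  # still paused after 3 of 8 drained
ring.pop()
print(ring.producer_paused)  # woken once half have drained
```

A lower percentage wakes the producer after fewer pops, trading more wake-ups for lower per-send latency; a higher one batches more work per wake-up.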

Known Limitations

  • Free-threaded Python (3.14t) not yet supported -- the TX-ring producer side, TransportContext lifecycle, and the WebTransport engine state currently rely on the GIL for serialization. FT support deferred until a per-context locking audit lands.
  • STOP_SENDING error codes surface as 0 today: picoquic's public stream-error getter only returns the RESET_STREAM code. STOP_SENDING's code lives in stream->remote_stop_error in picoquic_internal.h (no public getter). A small helper that pulls the field is straightforward future work — see TODO in src/aiopquic/_binding/c/callback.h.
  • Per-stream wrapper cleanup before connection close -- per-stream aiopquic_stream_ctx_t* wrappers are freed at connection close rather than at stream RESET/FIN. Bounded leak per cnx; flagged for follow-up.

TODO

  • Windows support (eventfd alternative — IOCP / WSAEventSelect on the wake-fd path)
  • Free-threaded Python (3.14t) support after producer-side locking audit
  • STOP_SENDING error-code surfacing helper (read remote_stop_error from picoquic_internal.h)
  • Per-stream wrapper cleanup on RESET/FIN before connection close
  • WebTransport datagram TX path through the C bridge
  • Datagram benches: latency percentiles, payload-size sweep, loss / jitter under load (today's bench_datagram is fire-and-count throughput only)
  • Pure stream open/close microbench (lifecycle rate without payload, separate from bench_stream_churn_highlevel which bundles writes + FIN)
  • Submit aiopquic to the QUIC interop runner for cross-implementation coverage


A Marz Research project.
Author: G. S. Marzot <gmarzot@marzresearch.net>

License

MIT License -- see LICENSE
