# Quorin
Low-latency ML feature serving for one machine. ~5 µs p99 reads from shared memory.
v0.1.0 — feature-complete; 758 tests passing; 5 µs p99 substantiated on GitHub Actions ubuntu-latest at N=20 fresh subprocesses (median_p99 = 4.48 µs for the 4-field warm assemble path; see Benchmarks).
## What is this
Machine-learning serving has a structural latency floor that the model itself doesn't cause. A typical online-prediction request looks like:
```
fetch features from Redis → decode bytes → build Python dict → call model → return
```
Steps 1–3 cost 5–50 ms on a healthy box. The model's `predict()` call is
often ~200 µs. The infrastructure around the model is the bottleneck —
not the math. At 50,000 RPS that's a 250-core overhead just to shuttle bytes.
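The 250-core figure is simple throughput accounting:

```python
# Back-of-the-envelope check for the overhead claim above.
rps = 50_000        # requests per second
overhead_s = 0.005  # 5 ms of fetch/decode/dict-build per request (low end of 5-50 ms)
core_seconds_per_second = rps * overhead_s
print(core_seconds_per_second)  # 250.0 -> ~250 cores busy just shuttling bytes
```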
Quorin replaces the slow path with a shared-memory + precomputed-offset-table
read: features live as typed bytes in a POSIX shm segment that every worker
process already has mapped. A read becomes "compute the offset, copy the bytes,
return a numpy.float32 array" — ~4 µs p99 on commodity hardware, zero Python
object allocations, zero Redis calls on the hot path.
It is deliberately single-node. No distribution, no replication, no
cross-node coordination. Beyond ~1M entities the answer is horizontal sharding
by hash(entity_id) mod N across multiple Quorin instances. See
FAQ for the explicit scope discipline.
## Schema preview
This is what defining a feature schema looks like — pure Python, no infrastructure:
```python
from quorin.schema import FeatureSchema, FeatureField, dtype

class UserFeatures(FeatureSchema):
    version = 1
    fields = [
        FeatureField("age_normalized", dtype.float32),
        FeatureField("session_count_7d", dtype.int32),
        FeatureField("ltv_score", dtype.float32),
        FeatureField("behavior_embedding", dtype.float32, shape=(128,)),
    ]
```
A `FeatureSchema` subclass compiles, once at process start, into a
NumPy-backed offset table. Lookups are `searchsorted` on a sorted hash array;
reads are a Numba-compiled memory copy.
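The sorted-hash-array lookup can be sketched in plain NumPy. This is illustrative only; the array names and layout here are assumptions, not Quorin's actual internals:

```python
import numpy as np

# Illustrative sketch of a sorted-hash-array lookup.
# entity_hashes is sorted once at build time; row_offsets maps each
# position to the row's byte offset in the segment.
entity_hashes = np.array([11, 42, 97, 250], dtype=np.uint64)
row_offsets   = np.array([0, 64, 128, 192], dtype=np.int64)

def lookup_offset(h: int) -> int:
    """Binary-search the sorted hash array; O(log n), no Python dict."""
    i = int(np.searchsorted(entity_hashes, h))
    if i < len(entity_hashes) and entity_hashes[i] == h:
        return int(row_offsets[i])
    raise KeyError(h)

print(lookup_offset(97))  # 128
```

The win over a Python dict is that the whole table lives as two flat arrays in shared memory, so every worker process reads the same bytes with no per-process object graph.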
## Benchmarks
Numbers measured on GitHub Actions ubuntu-latest (ubuntu-24.04)
via the N=20 fresh-subprocess orchestrator
(benchmarks/runs/repeat.py),
workflow run 25394553451, commit 4818ea4.
| Scenario | median p50 | median p99 | Spec band | Source JSON |
|---|---|---|---|---|
| 4-field warm assemble (the headline) | 4.14 µs | 4.48 µs ✅ | ≤ 5 µs | headline_4_field_warm_n20.json |
| 200-field warm assemble | 7.59 µs | 11.66 µs ✅ | 10–20 µs | headline_200_field_warm_n20.json |
| 200-field cold assemble | 31.28 µs | 66.14 µs † | 20–50 µs | headline_200_field_cold_n20.json |
| 4-field assemble under GC pressure p999 | — | 22.44 µs (p999) | (informational) | gc_p999_pressure_n20.json |
| write_sync end-to-end RTT | 1.93 ms | 2.18 ms | ≤ 75 ms gate | write_sync_rtt_n20.json |
† Cold-cache 66 µs p99 is over the 20–50 µs spec band by ~30% on ubuntu-latest's older Xeon CPUs (~30 MB L3 per socket). Per ADR-015 §11 bare-metal extrapolation (modern desktop CPUs are 1.5–3× faster than ubuntu-latest on this bandwidth-bound bench), the projected bare-metal range is ~22–44 µs, inside the spec band. Re-measure on your own hardware if the cold-cache number matters — single-process cold-cache p99 is heavy-tailed (3–4× run-to-run variance per ADR-015 §4); N=20 fresh-subprocess aggregation is the meaningful measurement.
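The projected ~22–44 µs range is just the CI number divided by the assumed 1.5–3× speedup band:

```python
# Check the bare-metal extrapolation: divide the CI p99 by the
# 1.5-3x speedup range assumed from ADR-015 §11.
ci_p99_us = 66.14
projected = (round(ci_p99_us / 3.0, 1), round(ci_p99_us / 1.5, 1))
print(projected)  # (22.0, 44.1)
```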
Methodology (full detail in ADR-015):
each scenario runs N=20 fresh Python subprocesses; per-process pytest-benchmark
captures raw round timings; the orchestrator aggregates median(p99) across
runs (not max-of-max). 30+ regression gates are enforced in CI on every PR via
tier1.yml.
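The aggregation rule (median of per-process p99s, not max-of-max) can be sketched as follows; the numbers here are made up for illustration:

```python
import statistics

# Sketch of the median(p99) aggregation described above. In reality the
# p99s come from N=20 fresh-subprocess pytest-benchmark runs; these
# values are invented, with one heavy-tailed outlier.
per_process_p99_us = [4.3, 4.5, 4.4, 9.1, 4.6]

headline = statistics.median(per_process_p99_us)  # robust to the outlier
worst    = max(per_process_p99_us)                # max-of-max would report 9.1

print(headline)  # 4.5
```

Taking the median across fresh subprocesses is what makes the headline number stable despite the 3–4× run-to-run variance of any single cold process.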
## Architecture
```
               ┌─────────────────────────────────────────┐
client         │ multiple worker processes               │
request ─────► │ (web server / batch job / notebook)     │
               └──────────────────┬──────────────────────┘
                                  │ assemble(seg, entity_id)  ~4 µs p99
                                  ▼
┌──────────────────────────────────────────────────────────────┐
│ POSIX shared memory segment in /dev/shm                      │
│                                                              │
│ [16 B header][48 B metadata][slot table][string pool][rows]  │
│                                                              │
│ Read path is allocation-free Numba-compiled copy.            │
│ Read path NEVER touches Redis (per ADR-002).                 │
└──────────────────────────────────────────────────────────────┘
                   ▲
                   │ insert(seg, entity_id, row_bytes)
                   │ (single writer; WAL consumer)
                   │
┌──────────────────┴──────────────────────┐
│ WAL consumer (single writer per seg)    │
│ reads from Redis Stream "quorin:wal"    │
└─────────────────────────────────────────┘
                   ▲
                   │ XADD (async by default;
                   │       write_sync available)
                   │
┌──────────────────┴──────────────────────┐
│ WALProducer (in user processes)         │
└─────────────────────────────────────────┘
```
Cross-cutting:
- Redis (control plane only): segment names, refcounts, WAL stream. Reads do NOT touch Redis.
- `quorin.watchdog`: detects dead PIDs via heartbeat, drains cleanup queue.
- `quorin.evolution`: atomic pointer flip on schema upgrade.
- `quorin.offline` (Parquet): training-data store + point-in-time reads + Redis hydration on cold start.
## Quickstart

Prereq: Redis 7.2+ on 127.0.0.1:6379. Quorin ships a docker-compose file for
local dev:

```shell
docker compose -f docker/docker-compose.dev.yml up -d
```
Then:
```python
import redis

from quorin.schema import FeatureSchema, FeatureField, dtype
from quorin.shm import SegmentRegistry
from quorin.layout import insert, pack_row
from quorin.assembly import assemble

class UserFeatures(FeatureSchema):
    version = 1
    fields = [
        FeatureField("age_normalized", dtype.float32),
        FeatureField("session_count_7d", dtype.int32),
        FeatureField("ltv_score", dtype.float32),
    ]

r = redis.Redis(host="127.0.0.1", port=6379)
registry = SegmentRegistry(r)
seg = registry.create(UserFeatures, capacity=1000)

row = pack_row(UserFeatures, age_normalized=0.5, session_count_7d=42, ltv_score=12.3)
insert(seg, "user_001", row)

features = assemble(seg, "user_001")
print(features)        # [ 0.5 42.  12.3]
print(features.dtype)  # float32
```
What just happened:
- Defined a schema and allocated a shared-memory segment named `quorin_UserFeatures_<uuid>`.
- Packed one row's bytes via `pack_row` (kwargs API; coerces to declared dtypes).
- Wrote the row via the synchronous `insert` path; read it back as a `numpy.float32` array via `assemble`.
- The `assemble` call is the headline ~4 µs p99 path on warm cache.
Production writes go through `quorin.wal.WALProducer` — an async write to a
Redis Stream; a separate WAL consumer applies it to the segment with
crash-safety semantics. The synchronous `insert` shown here is the
testing / hydration / demo path. See docs/API.md for the full surface.
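At the Redis level, an async write is just an XADD onto the `quorin:wal` stream the consumer reads. The payload below is illustrative only — the field names are assumptions, not Quorin's actual wire format:

```python
# Illustrative only: the shape of an async WAL write at the Redis-stream
# level. WALProducer wraps this; the exact message fields are assumed here.
payload = {
    "entity_id": "user_001",
    "schema_version": "1",
    "row": b"\x00\x00\x00\x3f" * 3,  # stand-in for pack_row() bytes
}
# With a live Redis server this would be:
#   redis.Redis().xadd("quorin:wal", payload)
print(sorted(payload))  # ['entity_id', 'row', 'schema_version']
```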
## Install

```shell
pip install quorin
```

Requires Python 3.12+ and Linux or WSL2 (POSIX shared memory). Redis 7.2+ for the control plane.
### Dev setup

```shell
git clone https://github.com/MahinAshraful/Quorin.git
cd Quorin
uv sync --all-extras
docker compose -f docker/docker-compose.dev.yml up -d
uv run pytest  # 758 tests, ~4 min on WSL2
```
## FAQ

**Why single-node?**
Single-node is the design thesis, not a limitation. The 5 µs p99 claim depends
on every reader having the segment mmapped in their own address space; that
breaks the moment you cross a machine boundary. Beyond ~1M entities, shard
horizontally by hash(entity_id) mod N across multiple Quorin instances.
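The sharding rule can be sketched as follows. This is illustrative: routing lives outside Quorin, the segment handles are hypothetical, and `zlib.crc32` stands in for whatever stable hash you choose:

```python
import zlib

# Illustrative shard routing across N Quorin instances. The hash just
# needs to be stable and consistent across writers and readers.
N_SHARDS = 4
segments = [f"shard_{i}" for i in range(N_SHARDS)]  # hypothetical handles

def shard_for(entity_id: str) -> str:
    """hash(entity_id) mod N, with a stable (non-randomized) hash."""
    return segments[zlib.crc32(entity_id.encode()) % N_SHARDS]
```

Note that Python's built-in `hash()` is randomized per process for strings, so a stable digest like CRC32 or a keyed hash is the safer choice here.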
**Why Linux-only?**
The segment layer is built on POSIX `shm_open`. macOS has posix_ipc
support but Quorin's CI doesn't test it; native Windows is out of scope
(different syscall surface — `CreateFileMapping` would be a separate project).
**Why Redis on the control plane?** Per-process refcounts, segment-name resolution, the WAL stream, watchdog heartbeats. Redis is on the control path only; the read path never touches it (per ADR-002). Hot-path RPCs to Redis would blow the latency budget in a single round trip (~30–80 µs over loopback).
**How does this compare to Feast?** Different scope. Feast is a feature store (training + serving + lineage); Quorin is a feature server (read path only) optimized for one machine. Quorin could plug into a Feast deployment as the online-serving layer; the comparison is "Feast's online layer vs Quorin," not "Feast vs Quorin."
**Does the buffer pool always help?** No. Per the ADR-005 Step 16c amendment: on native CI, the pool adds 2–4 µs of latency to the single-entity assemble path. Pool wins are real but indirect (eliminating one ndarray allocation per call reduces GC pressure and bounds the memory ceiling) — the direct latency cost is honest and disclosed. The pool is the default for the batch path (where amortization wins) and opt-in for single-entity calls.
**How much faster is batch?** 1.5–1.7× at N=1000 on ubuntu-latest (per the ADR-007 Step 16c amendment). The original spec target was 5×; the older Xeons in GitHub Actions are cache-bound on this workload (~30 MB L3 spills to DRAM). Bare-metal modern CPUs (more L3, higher clocks) should lift the ratio meaningfully — re-measure on your own hardware.
**What about late data / out-of-order writes?**
Append-only Parquet with `event_time` and `processing_time` columns; query
by `event_time` for point-in-time-correct training reads. Stream-system
concerns (watermarks, exactly-once across nodes) are out of scope — those
belong in Kafka / Flink upstream.
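The point-in-time rule can be sketched in plain Python over append-only rows. This is illustrative, not quorin.offline's API:

```python
# Illustrative point-in-time read over append-only rows; plain Python,
# not quorin.offline's actual API.
rows = [
    {"entity_id": "u1", "event_time": 100, "processing_time": 105, "ltv_score": 1.0},
    {"entity_id": "u1", "event_time": 200, "processing_time": 290, "ltv_score": 2.0},  # late arrival
    {"entity_id": "u1", "event_time": 300, "processing_time": 310, "ltv_score": 3.0},
]

def point_in_time(rows, as_of):
    """Latest row with event_time <= as_of: training reads see no future data."""
    visible = [r for r in rows if r["event_time"] <= as_of]
    return max(visible, key=lambda r: r["event_time"])

print(point_in_time(rows, as_of=250)["ltv_score"])  # 2.0
```

Filtering on `event_time` rather than `processing_time` is what makes training reads reproducible even when a row arrived late.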
**Why no auth?** Single-process trust model. Quorin is imported by a trusted process; if exposed over a network, that's a different project with a different security design.
**Is this production-ready?** v0.1.0 means "feature-complete library; 758 tests pass; 5 µs p99 substantiated on native CI; no real-world deployments yet." The API may evolve based on user feedback before v1.0.0. Performance regression gates run on every PR.
**Why is the codebase named quorin but the docs say Pyforge in places?**
Pyforge was the internal-development codename. The published package is
quorin. The codename survives in the ADR archive (timestamped
historical decision records — they reference the codename current at decision
time, same shape as a git commit message), in CLAUDE.md (the
internal Claude Code tooling document), and in git history. Functionally
identical.
## What's in the box

Public modules — full API surface in docs/API.md.
| Module | Purpose |
|---|---|
| `quorin.schema` | `FeatureSchema`, `FeatureField`, `dtype`, `compile_schema` |
| `quorin.shm` | `SegmentRegistry` — POSIX shm lifecycle + Redis bookkeeping |
| `quorin.layout` | `insert`, `lookup`, `pack_row`, slot-table + string-pool primitives |
| `quorin.serving` | `assemble` — pure-Python read oracle (parity reference) |
| `quorin.assembly` | `assemble`, `assemble_batch` — Numba JIT read path |
| `quorin.pool` | `BufferPool`, `BatchBufferPool` — pre-allocated output buffers |
| `quorin.wal` | `WALProducer` — async writes to Redis Stream |
| `quorin.wal_consumer` | `WALConsumer` — applies WAL messages to the segment |
| `quorin.offline` | `ParquetDatasetStore` — training-data writes + point-in-time reads |
| `quorin.hydration` | `hydrate` — rebuild segment from Parquet on cold start |
| `quorin.evolution` | `upgrade_schema` — atomic schema-version flip |
| `quorin.watchdog` | Background process: detects dead PIDs, cleans up segments |
| `quorin.metrics` | Prometheus histograms + `start_metrics_server` |
| `quorin.logging` | structlog JSON config |
## License
MIT — see LICENSE.
## Acknowledgments
Built on numpy, numba, pyarrow, redis-py, pydantic, posix-ipc, structlog, prometheus-client. Thanks to all upstream maintainers.