
Quorin

Low-latency ML feature serving for one machine. ~5 µs p99 reads from shared memory.


v0.1.0 — feature-complete; 758 tests passing; 5 µs p99 substantiated on GitHub Actions ubuntu-latest at N=20 fresh subprocesses (median_p99 = 4.48 µs for the 4-field warm assemble path; see Benchmarks).


What is this

Machine-learning serving has a structural latency floor that the model itself doesn't cause. A typical online-prediction request looks like:

fetch features from Redis  →  decode bytes  →  build Python dict  →  call model  →  return

Steps 1–3 cost 5–50 ms on a healthy box. The model's predict() call is often ~200 µs. The infrastructure around the model is the bottleneck — not the math. At 50,000 RPS that's a 250-core overhead just to shuttle bytes.

Quorin replaces the slow path with a shared-memory + precomputed-offset-table read: features live as typed bytes in a POSIX shm segment that every worker process already has mapped. A read becomes "compute the offset, copy the bytes, return a numpy.float32 array" — ~4 µs p99 on commodity hardware, zero Python object allocations, zero Redis calls on the hot path.
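The read pattern can be sketched with the standard library alone. This is an illustrative toy (struct plus multiprocessing.shared_memory, with made-up field offsets), not Quorin's actual segment layout or API:

```python
import struct
from multiprocessing import shared_memory

# Illustrative layout (not Quorin's real format): two float32 fields per
# row, with field offsets precomputed once at startup.
FIELD_OFFSETS = {"age_normalized": 0, "ltv_score": 4}
ROW_SIZE = 8

shm = shared_memory.SharedMemory(create=True, size=ROW_SIZE * 16)
try:
    # Writer side: pack row 3's bytes directly at its computed offset.
    base = 3 * ROW_SIZE
    struct.pack_into("<f", shm.buf, base + FIELD_OFFSETS["age_normalized"], 0.5)
    struct.pack_into("<f", shm.buf, base + FIELD_OFFSETS["ltv_score"], 12.3)

    # Reader side: compute the offset, copy the bytes out.
    # No network call, no decode step, no dict building.
    def read_field(row: int, field: str) -> float:
        off = row * ROW_SIZE + FIELD_OFFSETS[field]
        return struct.unpack_from("<f", shm.buf, off)[0]

    age = read_field(3, "age_normalized")
    ltv = read_field(3, "ltv_score")
finally:
    shm.close()
    shm.unlink()
```

Every worker that maps the same segment sees the same bytes; the per-read cost is the offset arithmetic plus one small copy.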

It is deliberately single-node. No distribution, no replication, no cross-node coordination. Beyond ~1M entities the answer is horizontal sharding by hash(entity_id) mod N across multiple Quorin instances. See FAQ for the explicit scope discipline.


Schema preview

This is what defining a feature schema looks like — pure Python, no infrastructure:

from quorin.schema import FeatureSchema, FeatureField, dtype

class UserFeatures(FeatureSchema):
    version = 1
    fields = [
        FeatureField("age_normalized", dtype.float32),
        FeatureField("session_count_7d", dtype.int32),
        FeatureField("ltv_score", dtype.float32),
        FeatureField("behavior_embedding", dtype.float32, shape=(128,)),
    ]

A FeatureSchema subclass compiles, once at process start, into a NumPy-backed offset table. Lookups are searchsorted on a sorted hash array; reads are a Numba-compiled memory copy.
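The lookup idea can be sketched in plain Python, with stdlib bisect standing in for numpy.searchsorted and zlib.crc32 standing in for the real hash function (both stand-ins are assumptions, not Quorin's internals):

```python
import bisect
import zlib

def stable_hash(entity_id: str) -> int:
    # Must be stable across processes; Python's built-in hash() is
    # salted per process and would break a shared sorted array.
    return zlib.crc32(entity_id.encode())

# Built once at startup: hashes sorted ascending, with a parallel
# array mapping each hash back to its row index.
ids = ["user_001", "user_002", "user_003"]
pairs = sorted((stable_hash(e), row) for row, e in enumerate(ids))
sorted_hashes = [h for h, _ in pairs]
rows = [r for _, r in pairs]

def lookup(entity_id: str) -> int:
    h = stable_hash(entity_id)
    i = bisect.bisect_left(sorted_hashes, h)  # binary search, O(log n)
    if i == len(sorted_hashes) or sorted_hashes[i] != h:
        raise KeyError(entity_id)
    return rows[i]

row = lookup("user_002")
```

The real implementation does the same search over a NumPy array inside Numba-compiled code, so no Python objects are created on the hot path.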


Benchmarks

Numbers measured on GitHub Actions ubuntu-latest (ubuntu-24.04) via the N=20 fresh-subprocess orchestrator (benchmarks/runs/repeat.py), workflow run 25394553451, commit 4818ea4.

Scenario                              median p50   median p99        Spec band         Source JSON
4-field warm assemble (the headline)  4.14 µs      4.48 µs           ✅ ≤ 5 µs         headline_4_field_warm_n20.json
200-field warm assemble               7.59 µs      11.66 µs          ✅ 10–20 µs       headline_200_field_warm_n20.json
200-field cold assemble               31.28 µs     66.14 µs          † 20–50 µs        headline_200_field_cold_n20.json
4-field assemble under GC pressure    —            22.44 µs (p999)   (informational)   gc_p999_pressure_n20.json
write_sync end-to-end RTT             1.93 ms      2.18 ms           ≤ 75 ms gate      write_sync_rtt_n20.json

† The cold-cache 66 µs p99 is ~30% over the 20–50 µs spec band on ubuntu-latest's older Xeons (~30 MB L3 per socket). Per the ADR-015 §11 bare-metal extrapolation (modern desktop CPUs are 1.5–3× faster than ubuntu-latest on this bandwidth-bound bench), the projected bare-metal range is ~22–44 µs, inside the spec band. Re-measure on your own hardware if the cold-cache number matters: single-process cold-cache p99 is heavy-tailed (3–4× run-to-run variance per ADR-015 §4), so N=20 fresh-subprocess aggregation is the meaningful measurement.

Methodology (full detail in ADR-015): each scenario runs N=20 fresh Python subprocesses; per-process pytest-benchmark captures raw round timings; the orchestrator aggregates median(p99) across runs (not max-of-max). 30+ regression gates are enforced in CI on every PR via tier1.yml.
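The median-of-p99 aggregation reduces to a few lines; this sketch uses a nearest-rank percentile and made-up timing numbers:

```python
import statistics

def p99(samples):
    # Nearest-rank p99 over one subprocess's raw round timings (µs).
    s = sorted(samples)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

# One list of raw timings per fresh subprocess; values illustrative.
runs = [
    [4.1] * 99 + [9.0],   # run with one pathological tail sample
    [4.2] * 99 + [4.9],
    [4.0] * 99 + [5.1],
]

# median(p99) across runs: one outlier run cannot move the aggregate,
# which is why this beats max-of-max for a heavy-tailed metric.
median_p99 = statistics.median(p99(r) for r in runs)
```

Here the 9.0 µs tail in the first run is absorbed by the median, illustrating why the orchestrator aggregates across fresh subprocesses rather than reporting a single run's worst case.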


Architecture

                   ┌─────────────────────────────────────────┐
   client          │  multiple worker processes              │
   request   ─────►│  (web server / batch job / notebook)    │
                   └──────────────────┬──────────────────────┘
                                      │ assemble(seg, entity_id)  ~4 µs p99
                                      ▼
   ┌──────────────────────────────────────────────────────────────┐
   │  POSIX shared memory segment in /dev/shm                     │
   │                                                              │
   │  [16 B header][48 B metadata][slot table][string pool][rows] │
   │                                                              │
   │  Read path is allocation-free Numba-compiled copy.           │
   │  Read path NEVER touches Redis (per ADR-002).                │
   └──────────────────────────────────────────────────────────────┘
                                      ▲
                                      │ insert(seg, entity_id, row_bytes)
                                      │  (single writer; WAL consumer)
                                      │
                   ┌──────────────────┴──────────────────────┐
                   │  WAL consumer (single writer per seg)   │
                   │  reads from Redis Stream "quorin:wal"   │
                   └─────────────────────────────────────────┘
                                      ▲
                                      │ XADD (async by default;
                                      │  write_sync available)
                                      │
                   ┌──────────────────┴──────────────────────┐
                   │  WALProducer  (in user processes)       │
                   └─────────────────────────────────────────┘

   Cross-cutting:
   - Redis (control plane only): segment names, refcounts, WAL stream.
     Reads do NOT touch Redis.
   - quorin.watchdog: detects dead PIDs via heartbeat, drains cleanup queue.
   - quorin.evolution: atomic pointer flip on schema upgrade.
   - quorin.offline (Parquet): training-data store + point-in-time reads
     + Redis hydration on cold start.

Quickstart

Prereq: Redis 7.2+ on 127.0.0.1:6379. Quorin ships a docker-compose for local dev:

docker compose -f docker/docker-compose.dev.yml up -d

Then:

import redis
from quorin.schema import FeatureSchema, FeatureField, dtype
from quorin.shm import SegmentRegistry
from quorin.layout import insert, pack_row
from quorin.assembly import assemble

class UserFeatures(FeatureSchema):
    version = 1
    fields = [
        FeatureField("age_normalized", dtype.float32),
        FeatureField("session_count_7d", dtype.int32),
        FeatureField("ltv_score", dtype.float32),
    ]

r = redis.Redis(host="127.0.0.1", port=6379)
registry = SegmentRegistry(r)
seg = registry.create(UserFeatures, capacity=1000)

row = pack_row(UserFeatures, age_normalized=0.5, session_count_7d=42, ltv_score=12.3)
insert(seg, "user_001", row)

features = assemble(seg, "user_001")
print(features)            # [ 0.5 42.  12.3]
print(features.dtype)      # float32

What just happened:

  • Defined a schema; allocated a shared-memory segment named quorin_UserFeatures_<uuid>.
  • Packed one row's bytes via pack_row (kwargs API; coerces to declared dtypes).
  • Wrote the row via the synchronous insert path; read it back as a numpy.float32 array via assemble.
  • The assemble call is the headline ~4 µs p99 path on warm cache.

Production writes go through quorin.wal.WALProducer: an async write to a Redis Stream that a separate WAL consumer applies to the segment with crash-safety semantics. The synchronous insert shown here is the testing / hydration / demo path. See docs/API.md for the full surface.
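Conceptually, the producer/consumer split looks like this. The sketch uses an in-memory deque as a stand-in for the Redis Stream and a dict as a stand-in for the segment; ToyWALProducer and consume are illustrative names, not Quorin's API:

```python
from collections import deque

# Stand-in for the Redis Stream "quorin:wal" (illustrative only).
wal_stream: deque = deque()

class ToyWALProducer:
    """Async-style write: append a message and return immediately,
    without waiting for it to land in the segment."""
    def write(self, entity_id: str, row_bytes: bytes) -> None:
        wal_stream.append({"entity_id": entity_id, "row": row_bytes})

def consume(segment: dict) -> int:
    """Single writer per segment: drain pending messages in order
    and apply each row (stands in for the real insert path)."""
    applied = 0
    while wal_stream:
        msg = wal_stream.popleft()
        segment[msg["entity_id"]] = msg["row"]
        applied += 1
    return applied

segment: dict = {}
ToyWALProducer().write("user_001", b"\x00\x00\x00?")  # float32 bytes for 0.5
applied = consume(segment)
```

The single-writer discipline is the point: producers never touch the segment, so readers never contend with more than one mutator.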


Install

pip install quorin

Requires Python 3.12+, Linux or WSL2 (POSIX shared memory). Redis 7.2+ for the control plane.

Dev setup

git clone https://github.com/MahinAshraful/Quorin.git
cd Quorin
uv sync --all-extras
docker compose -f docker/docker-compose.dev.yml up -d
uv run pytest          # 758 tests, ~4 min on WSL2

FAQ

Why single-node? Single-node is the design thesis, not a limitation. The 5 µs p99 claim depends on every reader having the segment mmapped in their own address space; that breaks the moment you cross a machine boundary. Beyond ~1M entities, shard horizontally by hash(entity_id) mod N across multiple Quorin instances.
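A minimal sharding router might look like this (shard_for is a hypothetical helper, not part of Quorin; note the need for a process-stable hash such as zlib.crc32, since Python's built-in hash() is salted per process):

```python
import zlib

N_SHARDS = 4  # number of Quorin instances (illustrative)

def shard_for(entity_id: str) -> int:
    # crc32 is stable across processes and machines, so every
    # client routes a given entity to the same instance.
    return zlib.crc32(entity_id.encode()) % N_SHARDS

shard = shard_for("user_001")
```

Each shard is an independent Quorin instance with its own segment and WAL consumer; the router is the only cross-instance code needed.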

Why Linux-only? POSIX shm_open. macOS has posix_ipc support but Quorin's CI doesn't test it; native Windows is out of scope (different syscall surface — CreateFileMapping would be a separate project).

Why Redis on the control plane? Per-process refcounts, segment-name resolution, the WAL stream, watchdog heartbeats. Redis is on the control path; the read path never touches it (per ADR-002). Hot-path RPCs to Redis would blow the latency budget in a single round trip (~30-80 µs over loopback).

How does this compare to Feast? Different scope. Feast is a feature store (training + serving + lineage); Quorin is a feature server (read path only) optimized for one machine. Quorin could plug into a Feast deployment as the online-serving layer; the comparison is "Feast's online layer vs Quorin," not "Feast vs Quorin."

Does the buffer pool always help? No. Per the ADR-005 Step 16c amendment: on native CI, the pool adds +2-4 µs of latency to the single-entity assemble path. Pool wins are real but indirect (eliminates one ndarray allocation per call → less GC pressure; bounds memory ceiling) — the direct latency cost is honest and disclosed. Pool is default for the batch path (where amortization wins) and opt-in for single-entity calls.

How much faster is batch? 1.5-1.7× at N=1000 on ubuntu-latest (per ADR-007 Step 16c amendment). The original spec target was 5×; the older Xeons in GitHub Actions are cache-bound on this workload (~30 MB L3 spills to DRAM). Bare-metal modern CPUs (more L3, higher clocks) should lift the ratio meaningfully — re-measure on your own hardware.

What about late data / out-of-order writes? Append-only Parquet with event_time and processing_time columns; query by event_time for point-in-time-correct training reads. Stream-system concerns (watermarks, exactly-once across nodes) are out of scope — those belong in Kafka / Flink upstream.
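Point-in-time correctness under late arrivals can be sketched in plain Python (illustrative records and helper, not the Parquet-backed API):

```python
from datetime import datetime, timezone

def ts(s: str) -> datetime:
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

# Append-only rows: (entity_id, event_time, processing_time, ltv_score).
# The middle row arrived two days late (processing_time >> event_time).
rows = [
    ("user_001", ts("2024-01-01T00:00"), ts("2024-01-01T00:05"), 10.0),
    ("user_001", ts("2024-01-03T00:00"), ts("2024-01-05T00:00"), 12.3),
    ("user_001", ts("2024-01-04T00:00"), ts("2024-01-04T00:01"), 11.0),
]

def as_of(entity_id: str, cutoff: datetime):
    """Latest value whose event_time <= cutoff. Because the filter uses
    event_time rather than processing_time, late-arriving rows still
    land in the right place for training reads."""
    candidates = [r for r in rows if r[0] == entity_id and r[1] <= cutoff]
    return max(candidates, key=lambda r: r[1])[3] if candidates else None

val = as_of("user_001", ts("2024-01-03T12:00"))
```

Querying as of Jan 3 picks up the late row's 12.3 even though it wasn't processed until Jan 5, which is exactly the property a training-data read needs.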

Why no auth? Single-process trust model. Quorin is imported by a trusted process; if exposed over a network, that's a different project with a different security design.

Is this production-ready? v0.1.0 means "feature-complete library; 758 tests pass; 5 µs p99 substantiated on native CI; no real-world deployments yet." API may evolve based on user feedback before v1.0.0. Performance regression gates run on every PR.

Why is the codebase named quorin but the docs say Pyforge in places? Pyforge was the internal-development codename; the published package is quorin. The codename survives in the ADR archive (timestamped historical decision records, which reference the codename current at decision time, same shape as a git commit message), in CLAUDE.md (the internal Claude Code tooling document), and in git history. The two names refer to the same code.


What's in the box

Public modules — full API surface in docs/API.md.

Module               Purpose
quorin.schema        FeatureSchema, FeatureField, dtype, compile_schema
quorin.shm           SegmentRegistry — POSIX shm lifecycle + Redis bookkeeping
quorin.layout        insert, lookup, pack_row, slot-table + string-pool primitives
quorin.serving       assemble — pure-Python read oracle (parity reference)
quorin.assembly      assemble, assemble_batch — Numba JIT read path
quorin.pool          BufferPool, BatchBufferPool — pre-allocated output buffers
quorin.wal           WALProducer — async writes to Redis Stream
quorin.wal_consumer  WALConsumer — applies WAL messages to the segment
quorin.offline       ParquetDatasetStore — training-data writes + point-in-time reads
quorin.hydration     hydrate — rebuild segment from Parquet on cold start
quorin.evolution     upgrade_schema — atomic schema-version flip
quorin.watchdog      Background process: detects dead PIDs, cleans up segments
quorin.metrics       Prometheus histograms + start_metrics_server
quorin.logging       structlog JSON config

License

MIT — see LICENSE.

Acknowledgments

Built on numpy, numba, pyarrow, redis-py, pydantic, posix-ipc, structlog, prometheus-client. Thanks to all upstream maintainers.
