MEMOPT — Universal Memory Fabric for AI Infrastructure. Open-sourced by Sophisticates (https://sophisticatesai.com).
Project description
MEMOPT — Universal Memory Fabric for AI Infrastructure
MEMOPT is the open-source universal memory fabric for AI infrastructure, built from first principles by Sophisticates — a deep tech venture company working across AI, Quantum Computing, Robotics, and Physics. MEMOPT is Sophisticates' flagship product, open-sourced under Apache-2.0 so the broader AI infrastructure community can build on, audit, and extend the hardest part of GPU serving: memory.
⚠️ ALPHA — TEST ON YOUR OWN GPU BEFORE PRODUCTION
GPU validation has NOT been performed on this release. The 1016-passing test baseline is Mac / no-CUDA only. The 2 @gpu tests and 18 cuda-named tests are SKIPPED / DESELECTED on the release host.
If you are deploying to real GPUs you MUST:
- Run the full regression on your target GPU (A100 / H100 / L40S / ROCm) with the steps in PRODUCTION_READINESS.md.
- Soak-test under representative traffic for ≥ 24 hours before declaring the deployment "production-ready."
- Set MEMOPT_SIGNING_KEY to a high-entropy secret (NOT the default).
Do not assume "tests pass" means "works on my hardware." See the four blockers in Status — production readiness below.
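A quick way to generate such a secret is with Python's standard library (a minimal sketch; any high-entropy source works):

```python
# Generate a 256-bit hex secret suitable for MEMOPT_SIGNING_KEY.
import secrets

print(secrets.token_hex(32))  # copy the output into MEMOPT_SIGNING_KEY
```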
A GPU memory profiling and serving platform that turns GPU clusters into a unified memory fabric. Python control plane, C++17 data plane, optional CUDA kernels.
1016 tests pass | 7 C++ test suites | 0 failures (Mac / no CUDA)
Status — production readiness
This is an alpha release (v1.3.0a1). The library is OSS-licensed and the test baseline is green on Mac, but the following must complete before any "production-ready" claim:
- GPU rig validation — the 2 @gpu tests and 18 cuda-named tests are SKIPPED / DESELECTED on Mac. They must run green on an A100 / H100 host before tagging a non-alpha release.
- CI must pass on its first push — .github/workflows/ci.yml defines a Linux + Mac × Python 3.10/3.11/3.12 matrix; nothing has run there yet.
- Live-workload soak — recommend ≥ 24 h under representative traffic before declaring "production-ready."
- Phase B MEMOPT_USE_ORCHESTRATOR=1 — Layer 2 (orchestrator) ships in observation-only mode per docs/orchestrator_v1_design.md DECISION 7. Eviction-driving Layer 2 ships in v1.4.0, not here.
Track these in the v1.3.0 entry of CHANGELOG.md.
What It Does
memopt v1.3.0 ships two infrastructure layers plus eight product pillars.
Infrastructure layers
| Layer | Purpose | Reference |
|---|---|---|
| Layer 1 — Substrate | Tenant-isolated, stream-aware allocator with pluggable backends (CUDA VMM, ROCm/HIP, CXL/NUMA, CPU). Public API: memopt.alloc / free / context / stats / observe / peek_handle / MemoryHandle. | docs/substrate_v1_design.md |
| Layer 2 — Orchestrator | Tenant-aware decision pump on top of the substrate. Public API: memopt.orchestrator.start / stop / stats / register_policy. v1.0 ships in observation-only mode (DECISION 7). | docs/orchestrator_v1_design.md |
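A minimal sketch of the Layer 1 API in use (argument names are illustrative assumptions; docs/substrate_v1_design.md defines the real signatures):

```python
# Illustrative Layer 1 usage; exact signatures are assumptions, not the
# documented contract. See docs/substrate_v1_design.md.
import memopt

with memopt.context(tenant="tenant-a"):        # tenant-scoped scope (assumed kwarg)
    handle = memopt.alloc(64 * 1024 * 1024)    # 64 MiB; returns a MemoryHandle
    print(memopt.stats())                      # allocator counters
    memopt.free(handle)
```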
Pillars
| # | Pillar | What It Solves | How |
|---|---|---|---|
| 1 | Infinite Context VMM | KV cache OOM for long contexts | Multi-tier paging (HBM → DRAM → NVMe) with predictive prefetch |
| 2 | Agentic KV Memory | Redundant KV recomputation across requests | Content-addressed cache skips inference on exact prompt hit (see the keying sketch after this table) |
| 3 | Self-Synthesizing Kernels | HBM memory stalls | Detects stalls, calls Claude API, synthesizes fused Triton kernels |
| 4 | AI Compliance Ledger | Energy / cost accountability + EU AI Act conformity | Per-batch energy measurement, SQLite ledger, HMAC-signed entries, carbon calculator, savings/compliance reports |
| 5 | Global Unified Memory | Wasted NVMe across nodes | Cross-node block sharing over TCP/RDMA with lease protocol |
| 6 | Silicon Certification | Hardware drift | Correctness + throughput battery, drift detector, auto re-cert daemon |
| 7 | GPU FinOps Intelligence | $/hour waste invisibility | Per-tenant utilization → dollar tracking with auditable signed reports |
| 8 | Hardware Abstraction | Multi-backend portability | Unified HAL over CUDA / ROCm / Gaudi / TPU / CPU stubs |
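To make Pillar 2's "exact prompt hit" concrete, here is a hypothetical content-addressed keying sketch. It uses SHA-256 purely for illustration; the shipped C++ hooks use FNV-1a keys:

```python
# Hypothetical content-addressed KV lookup: identical prompts hash to the
# same key, so a cache hit skips recomputation. SHA-256 is illustrative;
# the real hooks layer uses FNV-1a keys.
import hashlib

kv_cache: dict[str, object] = {}

def kv_key(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

def get_or_compute_kv(prompt: str, compute_kv):
    key = kv_key(prompt)
    if key not in kv_cache:            # miss: run prefill once and cache it
        kv_cache[key] = compute_kv(prompt)
    return kv_cache[key]               # exact hit: reuse the cached KV
```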
Pillars 4, 6, 7 wire to Layer 1/2 through memopt/integrations/ (attach_ledger_to_substrate, FinOpsPoller, assemble_production_receipt).
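A sketch of wiring two of those integrations up (call shapes are guesses from the names above, not verified signatures; check memopt/integrations/ for the actual API):

```python
# Assumed usage of the integration helpers named above; argument lists
# are illustrative guesses, not the verified API.
from memopt.integrations import FinOpsPoller, attach_ledger_to_substrate

attach_ledger_to_substrate()   # Pillar 4: ledger records substrate activity
poller = FinOpsPoller()        # Pillar 7: per-tenant utilization-to-dollar sampling
poller.start()
```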
Quick Start
```bash
# Install from PyPI:
pip install memopt-engine

# After installation, the Python import path is `memopt`
# (distribution-name vs import-name — same convention as
# `pip install scikit-learn` then `import sklearn`):
python -c "import memopt; print(memopt.__version__)"
```
For development from source:
```bash
git clone https://github.com/basnetlachu/memopt.git
cd memopt
pip install -e ".[dev,daemon,api]"
```
Profile a Model
```bash
memopt profile --model gpt2 --batch-size 8
```
Serve with All Pillars Active
```bash
memopt-serve --model meta-llama/Llama-2-7b --port 8001
```
The serving engine automatically:
- Deduplicates KV cache across requests (Pillar 2)
- Synthesizes fused kernels on HBM stalls (Pillar 3)
- Measures energy per token via NVML (Pillar 4)
- Monitors hardware drift (Pillar 6)
OpenAI-Compatible API
```bash
curl http://localhost:8001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello", "max_tokens": 50}'
```
Check Savings
```bash
curl http://localhost:8001/report/found-capacity
```
Architecture
```
Python (control plane)           C++ (data plane)
─────────────────────            ──────────────────
vmm/                             csrc/core/
  page_table.py (shim) ───────── _memopt_core.so
  oracle.py (shim)               (64-shard page table,
  tier_manager.py                 striped-lock oracle,
  prefetch_engine.py              16-shard block directory)
  federation.py
serving/                         csrc/hooks/
  kernel_hooks.py (shim) ─────── _memopt_hooks.so
  auto_optimizer.py              (FNV-1a keys, atomic
  server.py                       counters, lock-free dispatch)
  paged_attention.py (shim) ──── _memopt_paged.so
                                 (block pool, CUDA gather)
cluster/                         csrc/cuda_backend/
  gkd_store.py ───────────────── _memopt_cuda.so
  transport.py (shim) ────────── memopt-transport (sidecar)
  prefix_index.py (shim) ─────── _memopt_simd.so
  block_directory.py (shim)      (AVX-512 prefix match)
```
Every C++ module has a Python fallback. The system runs correctly without any C++ extensions built.
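The fallback is the usual try-the-extension, fall-back-to-the-shim import idiom; a minimal sketch (import paths here are assumptions mirroring the diagram above, not verified module locations):

```python
# Prefer the compiled C++ extension, fall back to the pure-Python shim.
# Import paths are illustrative assumptions based on the diagram above.
try:
    import _memopt_core as page_table_impl                 # C++ fast path (if built)
except ImportError:
    from memopt.vmm import page_table as page_table_impl  # pure-Python shim
```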
Repository Structure
```
memopt/
├── memopt/              Python package (control plane)
│   ├── vmm/             Infinite Context VMM (Pillar 1)
│   ├── cluster/         GKD + GUM + transport (Pillars 2, 5)
│   ├── kernels/         Self-synthesizing kernels (Pillar 3)
│   ├── observability/   Energy ledger + certificates (Pillar 4)
│   ├── serving/         OpenAI-compatible HTTP server
│   ├── profiler/        Hardware counter profiling
│   ├── control_plane/   Cluster management (FastAPI)
│   ├── daemon/          Background GPU monitor
│   └── api/             REST API
├── csrc/                C++17 data plane (38 source files)
│   ├── core/            PageTable + Oracle + BlockDirectory
│   ├── hooks/           Kernel dispatch table
│   ├── paged/           Block pool + CUDA gather kernel
│   ├── cuda_backend/    Stream pool + NVMe I/O + GDS
│   ├── transport/       RDMA sidecar daemon
│   └── simd/            AVX-512 prefix matching
├── tests/cpp/           GoogleTest suites (7 files)
├── scripts/             Audit and tooling
│   └── audit_wiring.py  Runtime wiring verification
└── docs/
    ├── architecture.md      Complete technical reference
    └── rdma_deployment.md   InfiniBand deployment guide
```
Configuration
All behavior is configurable via environment variables. Key ones:
| Variable | Default | Purpose |
|---|---|---|
| REDIS_URL | — | Redis for cluster-wide GKD + peer discovery |
| MEMOPT_NODE_ID | hostname | Unique node identifier |
| MEMOPT_EVICT_HIGH | 0.90 | HBM eviction trigger threshold |
| MEMOPT_EVICT_LOW | 0.75 | HBM eviction target threshold |
| MEMOPT_GOSSIP_FANOUT | 5 | Peers per gossip round |
| MEMOPT_FETCH_RETRIES | 0 | Remote block fetch retry count |
| MEMOPT_NVME_MAX_GB | 500 | NVMe usage cap before eviction |
| MEMOPT_QP_DEBUG | 0 | Log RDMA QP state transitions |
See docs/architecture.md Section 25 for the complete list.
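For example, configuring a cluster node before startup (a sketch; variable names come from the table above, values are illustrative):

```python
# Set configuration in the environment before memopt starts; whether the
# values are read at import time or at start() is an assumption here.
import os

os.environ["REDIS_URL"] = "redis://10.0.0.5:6379/0"  # cluster GKD + peer discovery
os.environ["MEMOPT_NODE_ID"] = "gpu-node-01"
os.environ["MEMOPT_EVICT_HIGH"] = "0.90"             # trigger HBM eviction at 90%
os.environ["MEMOPT_EVICT_LOW"] = "0.75"              # evict down to 75%

import memopt
```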
Building C++ Extensions
```bash
pip install pybind11 scikit-build-core cmake ninja

# Build all extensions
cd csrc && mkdir build && cd build
cmake .. -DMEMOPT_ENABLE_TESTS=ON
make -j$(nproc)

# Run C++ tests
ctest --output-on-failure

# Optional: CUDA, RDMA, AVX-512
cmake .. -DMEMOPT_ENABLE_RDMA=ON -DMEMOPT_ENABLE_AVX512=ON
```
Running Tests
```bash
# Python tests (no C++ required)
pytest --tb=short -q

# Wiring audit (verifies C++ integration)
python scripts/audit_wiring.py
```
Requirements
- Python 3.10+
- PyTorch 2.0+
- Optional: CUDA 12.4+ (GPU kernels), pynvml (power measurement), redis-py (cluster mode)
Documentation
- Architecture Reference — complete technical specification
- RDMA Deployment Guide — InfiniBand setup and troubleshooting
Memory substrate
memopt v1 ships Layer 1 of the memory substrate: a tenant-isolated,
stream-aware allocator with pluggable backends (CUDA VMM, ROCm/HIP
stub, Level Zero stub, CXL/NUMA, CPU). The public API is memopt.alloc / free / context / stats / observe plus MemoryHandle. See
docs/substrate_v1_user_guide.md for
usage and docs/substrate_v1_design.md
for the spec.
Orchestrator (Layer 2)
memopt v1.2 adds Layer 2: a tenant-aware observation/decision layer
that sits on top of the substrate. In v1.0 (Phase A) it observes the
substrate's event stream and exposes a public Policy protocol; it does
NOT drive eviction yet (that ships behind MEMOPT_USE_ORCHESTRATOR=1
in Phase B). The public API is memopt.orchestrator.start / stop / stats / register_policy plus memopt.peek_handle. See
docs/orchestrator_v1_user_guide.md
for usage and
docs/orchestrator_v1_design.md for
the spec.
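A minimal sketch of registering an observation-only policy (the Policy protocol's method names are assumed here, not taken from the design doc; see docs/orchestrator_v1_user_guide.md for the real interface):

```python
# Hypothetical observation-only policy. The on_event hook name is an
# assumption; in Phase A the orchestrator observes but never evicts.
import memopt.orchestrator as orchestrator

class LoggingPolicy:
    def on_event(self, event):               # assumed hook signature
        print("substrate event:", event)

orchestrator.start()
orchestrator.register_policy(LoggingPolicy())
print(orchestrator.stats())
orchestrator.stop()
```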
About MEMOPT
MEMOPT (pronounced memm-opt) is a universal memory fabric for AI infrastructure. It solves one of the hardest problems in modern GPU serving: memory — KV-cache OOM under long contexts, redundant KV recomputation across requests, HBM stalls from un-fused kernels, multi-tier paging across HBM/DRAM/NVMe, cross-node block sharing, and auditable energy/cost accountability. memopt unifies all of these behind one API, with two pinned infrastructure layers (substrate + orchestrator) and eight product pillars layered on top.
It's released under Apache-2.0 so any AI infrastructure team can read the source, audit the security guarantees, fork it, contribute back, or run it in production without licensing friction.
About Sophisticates
memopt is built and open-sourced by Sophisticates (pronounced so-phis-ti-cates), a deep tech venture company founded by Lachu Man Basnet. Sophisticates builds companies from first principles across AI, Quantum Computing, Robotics, and Physics. MEMOPT is Sophisticates' flagship product in the AI infrastructure vertical.
- Website: sophisticatesai.com
- Maintainer: Lachu Man Basnet (lachu.basnet@sophisticatesai.com)
- Issues / discussions: https://github.com/basnetlachu/memopt/issues
- Security disclosures: see CONTRIBUTING.md § Security
If your team uses memopt in production, we'd love to hear about it — open a discussion on GitHub or reach out via sophisticatesai.com.
License
Apache License 2.0. See LICENSE for the full text and NOTICE for third-party attributions. A pinned dependency license audit lives at docs/license_audit.md.
Download files
File details
Details for the file memopt_engine-1.3.0a2.tar.gz.
File metadata
- Download URL: memopt_engine-1.3.0a2.tar.gz
- Size: 558.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 71ffcb2cb55e4741080dd5a3e7990b552d243409ad273eb8f805724bba0053de |
| MD5 | 30baa4f3fcd65af825b4887671edf522 |
| BLAKE2b-256 | 589ea047b3a956648e7ae83aca64ba4db63238a523e3b521f815682345ec8c3d |
Provenance
The following attestation bundles were made for memopt_engine-1.3.0a2.tar.gz:
Publisher: publish.yml on basnetlachu/memopt
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memopt_engine-1.3.0a2.tar.gz
- Subject digest: 71ffcb2cb55e4741080dd5a3e7990b552d243409ad273eb8f805724bba0053de
- Sigstore transparency entry: 1439267054
- Permalink: basnetlachu/memopt@066c33b8eaf0caf3eccd26607ba53f1e7bc5968c
- Branch / Tag: refs/tags/v1.3.0a2
- Owner: https://github.com/basnetlachu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@066c33b8eaf0caf3eccd26607ba53f1e7bc5968c
- Trigger Event: push
File details
Details for the file memopt_engine-1.3.0a2-py3-none-any.whl.
File metadata
- Download URL: memopt_engine-1.3.0a2-py3-none-any.whl
- Size: 639.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f81be594379211946935be1212aaaf8fde9d2e51ef56e027787332444006e82 |
| MD5 | ae39708869ccf6de57bc3cd4b5fbc0ab |
| BLAKE2b-256 | 434eaf8cc931ff46cfaa9010f3698271e68211d0068d409f1a193d15fb8f82c4 |
Provenance
The following attestation bundles were made for memopt_engine-1.3.0a2-py3-none-any.whl:
Publisher: publish.yml on basnetlachu/memopt
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: memopt_engine-1.3.0a2-py3-none-any.whl
- Subject digest: 8f81be594379211946935be1212aaaf8fde9d2e51ef56e027787332444006e82
- Sigstore transparency entry: 1439267065
- Permalink: basnetlachu/memopt@066c33b8eaf0caf3eccd26607ba53f1e7bc5968c
- Branch / Tag: refs/tags/v1.3.0a2
- Owner: https://github.com/basnetlachu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@066c33b8eaf0caf3eccd26607ba53f1e7bc5968c
- Trigger Event: push