
MEMOPT — Universal Memory Fabric for AI Infrastructure

MEMOPT is the open-source universal memory fabric for AI infrastructure, built from first principles by Sophisticates — a deep tech venture company working across AI, Quantum Computing, Robotics, and Physics. MEMOPT is Sophisticates' flagship product, open-sourced under Apache-2.0 so the broader AI infrastructure community can build on, audit, and extend the hardest part of GPU serving: memory.


⚠️ ALPHA — TEST ON YOUR OWN GPU BEFORE PRODUCTION

GPU validation has NOT been performed on this release. The 1016-passing test baseline is Mac / no-CUDA only. The 2 @gpu tests and 18 cuda-named tests are SKIPPED / DESELECTED on the release host.

If you are deploying to real GPUs you MUST:

  1. Run the full regression on your target GPU (A100 / H100 / L40S / ROCm) with the steps in PRODUCTION_READINESS.md.
  2. Soak-test under representative traffic for ≥ 24 hours before declaring the deployment "production-ready."
  3. Set MEMOPT_SIGNING_KEY to a high-entropy secret (NOT the default).

Do not assume "tests pass" means "works on my hardware." See the four blockers in Status — production readiness below.
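
Generating a suitable high-entropy value for MEMOPT_SIGNING_KEY takes one line of standard-library Python (a sketch; the variable name comes from this README, the rest is generic):

```python
import secrets

# 256 bits of randomness, hex-encoded (64 characters).
key = secrets.token_hex(32)
print(f"export MEMOPT_SIGNING_KEY={key}")
```

Put the exported value in your deployment's secret store rather than a shell history or dotfile.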

A GPU memory profiling and serving platform that turns GPU clusters into a unified memory fabric. Python control plane, C++17 data plane, optional CUDA kernels.

1016 tests pass | 7 C++ test suites | 0 failures (Mac / no CUDA)

Status — production readiness

This is an alpha release (v1.3.0a1). The library is OSS-licensed and the test baseline is green on Mac, but the following must be completed before any "production-ready" claim:

  1. GPU rig validation — the 2 @gpu tests and 18 cuda-named tests are SKIPPED / DESELECTED on Mac. They must run green on an A100 / H100 host before tagging a non-alpha release.
  2. CI must pass on its first push: .github/workflows/ci.yml defines a Linux + Mac × Python 3.10/3.11/3.12 matrix; nothing has run there yet.
  3. Live-workload soak — run under representative traffic for ≥ 24 hours before declaring "production-ready."
  4. Phase B (MEMOPT_USE_ORCHESTRATOR=1) — Layer 2 (orchestrator) ships in observation-only mode per DECISION 7 in docs/orchestrator_v1_design.md. Eviction-driving Layer 2 ships in v1.4.0, not here.

Track these in the v1.3.0 entry of CHANGELOG.md.

What It Does

memopt v1.3.0 ships two infrastructure layers plus eight product pillars.

Infrastructure layers

| Layer | Purpose | Reference |
| --- | --- | --- |
| Layer 1 — Substrate | Tenant-isolated, stream-aware allocator with pluggable backends (CUDA VMM, ROCm/HIP, CXL/NUMA, CPU). Public API: memopt.alloc / free / context / stats / observe / peek_handle / MemoryHandle. | docs/substrate_v1_design.md |
| Layer 2 — Orchestrator | Tenant-aware decision pump on top of the substrate. Public API: memopt.orchestrator.start / stop / stats / register_policy. v1.0 ships in observation-only mode (DECISION 7). | docs/orchestrator_v1_design.md |

Pillars

| # | Pillar | What It Solves | How |
| --- | --- | --- | --- |
| 1 | Infinite Context VMM | KV cache OOM for long contexts | Multi-tier paging (HBM → DRAM → NVMe) with predictive prefetch |
| 2 | Agentic KV Memory | Redundant KV recomputation across requests | Content-addressed cache skips inference on exact prompt hit |
| 3 | Self-Synthesizing Kernels | HBM memory stalls | Detects stalls, calls Claude API, synthesizes fused Triton kernels |
| 4 | AI Compliance Ledger | Energy / cost accountability + EU AI Act conformity | Per-batch energy measurement, SQLite ledger, HMAC-signed entries, carbon calculator, savings/compliance reports |
| 5 | Global Unified Memory | Wasted NVMe across nodes | Cross-node block sharing over TCP/RDMA with lease protocol |
| 6 | Silicon Certification | Hardware drift | Correctness + throughput battery, drift detector, auto re-cert daemon |
| 7 | GPU FinOps Intelligence | $/hour waste invisibility | Per-tenant utilization → dollar tracking with auditable signed reports |
| 8 | Hardware Abstraction | Multi-backend portability | Unified HAL over CUDA / ROCm / Gaudi / TPU / CPU stubs |
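
The content-addressed cache behind Pillar 2 can be pictured with a minimal sketch: the cache key is a digest of the exact (model, prompt) pair, so an identical request hits the cache and skips inference. All names below are hypothetical illustrations, not memopt's internal implementation:

```python
import hashlib

def cache_key(model: str, prompt: str) -> str:
    # Content addressing: identical (model, prompt) pairs always map
    # to the same key, so an exact repeat can skip inference entirely.
    payload = f"{model}\x00{prompt}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

kv_cache: dict[str, bytes] = {}

def run_inference(model: str, prompt: str) -> bytes:
    # Stand-in for the expensive path (real inference elided).
    return f"kv-for:{prompt}".encode()

def serve(model: str, prompt: str) -> bytes:
    key = cache_key(model, prompt)
    if key in kv_cache:              # exact prompt hit: no recomputation
        return kv_cache[key]
    result = run_inference(model, prompt)
    kv_cache[key] = result
    return result
```

The trade-off of exact-hash addressing is that any byte difference in the prompt misses the cache; that is also what makes hits safe to reuse.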

Pillars 4, 6, 7 wire to Layer 1/2 through memopt/integrations/ (attach_ledger_to_substrate, FinOpsPoller, assemble_production_receipt).
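
Pillar 4's HMAC-signed ledger entries follow a standard pattern: serialize the entry canonically, sign it with a shared secret, verify in constant time. A generic sketch of that pattern (the entry fields and function names are illustrative, not memopt's actual schema):

```python
import hashlib
import hmac
import json
import os

def sign_entry(entry: dict, key: bytes) -> str:
    # Canonical JSON (sorted keys) so the signature is stable
    # regardless of dict insertion order.
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()

def verify_entry(entry: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison to resist timing attacks.
    return hmac.compare_digest(sign_entry(entry, key), signature)

key = os.environ.get("MEMOPT_SIGNING_KEY", "dev-only-key").encode()
entry = {"batch": 17, "joules": 42.5, "tenant": "acme"}
sig = sign_entry(entry, key)
assert verify_entry(entry, sig, key)
```

This is why MEMOPT_SIGNING_KEY must be a real secret: anyone who knows the key can forge ledger entries.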

Quick Start

# Install from PyPI:
pip install memopt-engine

# After installation, the Python import path is `memopt`
# (distribution-name vs import-name — same convention as
# `pip install scikit-learn` then `import sklearn`):
python -c "import memopt; print(memopt.__version__)"

For development from source:

git clone https://github.com/basnetlachu/memopt.git
cd memopt
pip install -e ".[dev,daemon,api]"

Profile a Model

memopt profile --model gpt2 --batch-size 8

Serve with All Pillars Active

memopt-serve --model meta-llama/Llama-2-7b --port 8001

The serving engine automatically:

  • Deduplicates KV cache across requests (Pillar 2)
  • Synthesizes fused kernels on HBM stalls (Pillar 3)
  • Measures energy per token via NVML (Pillar 4)
  • Monitors hardware drift (Pillar 6)
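
Energy-per-token from sampled power is a straightforward integration. The sketch below shows the arithmetic only (NVML's nvmlDeviceGetPowerUsage reports milliwatts); it is not memopt's measurement code:

```python
def joules_from_samples(power_mw: list[float], interval_s: float) -> float:
    # Riemann sum: each sample is assumed to hold for `interval_s` seconds.
    # NVML reports instantaneous power in milliwatts.
    return sum(p / 1000.0 for p in power_mw) * interval_s

def joules_per_token(power_mw: list[float], interval_s: float, tokens: int) -> float:
    return joules_from_samples(power_mw, interval_s) / tokens

# 300 W held across 4 samples at 0.5 s spacing = 600 J;
# over 100 generated tokens that is 6 J/token.
print(joules_per_token([300_000.0] * 4, 0.5, 100))
```

Sampling error shrinks with the polling interval; the principle is unchanged.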

OpenAI-Compatible API

curl http://localhost:8001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama2", "prompt": "Hello", "max_tokens": 50}'

Check Savings

curl http://localhost:8001/report/found-capacity

Architecture

Python (control plane)          C++ (data plane)
─────────────────────          ──────────────────
vmm/                           csrc/core/
  page_table.py (shim) ──────── _memopt_core.so
  oracle.py (shim)                (64-shard page table,
  tier_manager.py                  striped-lock oracle,
  prefetch_engine.py               16-shard block directory)
  federation.py

serving/                       csrc/hooks/
  kernel_hooks.py (shim) ────── _memopt_hooks.so
  auto_optimizer.py               (FNV-1a keys, atomic
  server.py                        counters, lock-free dispatch)
  paged_attention.py (shim) ── _memopt_paged.so
                                  (block pool, CUDA gather)

cluster/                       csrc/cuda_backend/
  gkd_store.py ────────────────  _memopt_cuda.so
  transport.py (shim) ────────── memopt-transport (sidecar)
  prefix_index.py (shim) ─────  _memopt_simd.so
  block_directory.py (shim)      (AVX-512 prefix match)

Every C++ module has a Python fallback. The system runs correctly without any C++ extensions built.

Repository Structure

memopt/
├── memopt/              Python package (control plane)
│   ├── vmm/             Infinite Context VMM (Pillar 1)
│   ├── cluster/         GKD + GUM + transport (Pillars 2, 5)
│   ├── kernels/         Self-synthesizing kernels (Pillar 3)
│   ├── observability/   Energy ledger + certificates (Pillar 4)
│   ├── serving/         OpenAI-compatible HTTP server
│   ├── profiler/        Hardware counter profiling
│   ├── control_plane/   Cluster management (FastAPI)
│   ├── daemon/          Background GPU monitor
│   └── api/             REST API
├── csrc/                C++17 data plane (38 source files)
│   ├── core/            PageTable + Oracle + BlockDirectory
│   ├── hooks/           Kernel dispatch table
│   ├── paged/           Block pool + CUDA gather kernel
│   ├── cuda_backend/    Stream pool + NVMe I/O + GDS
│   ├── transport/       RDMA sidecar daemon
│   └── simd/            AVX-512 prefix matching
├── tests/cpp/           GoogleTest suites (7 files)
├── scripts/             Audit and tooling
│   └── audit_wiring.py  Runtime wiring verification
└── docs/
    ├── architecture.md  Complete technical reference
    └── rdma_deployment.md  InfiniBand deployment guide

Configuration

All behavior is configurable via environment variables. Key ones:

| Variable | Default | Purpose |
| --- | --- | --- |
| REDIS_URL | | Redis for cluster-wide GKD + peer discovery |
| MEMOPT_NODE_ID | hostname | Unique node identifier |
| MEMOPT_EVICT_HIGH | 0.90 | HBM eviction trigger threshold |
| MEMOPT_EVICT_LOW | 0.75 | HBM eviction target threshold |
| MEMOPT_GOSSIP_FANOUT | 5 | Peers per gossip round |
| MEMOPT_FETCH_RETRIES | 0 | Remote block fetch retry count |
| MEMOPT_NVME_MAX_GB | 500 | NVMe usage cap before eviction |
| MEMOPT_QP_DEBUG | 0 | Log RDMA QP state transitions |

See docs/architecture.md Section 25 for the complete list.
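
The paired eviction thresholds implement standard high/low watermark hysteresis: nothing happens until HBM utilization crosses MEMOPT_EVICT_HIGH, then eviction runs until utilization falls to MEMOPT_EVICT_LOW. A generic sketch of the idea (not memopt's eviction loop):

```python
import os

EVICT_HIGH = float(os.environ.get("MEMOPT_EVICT_HIGH", "0.90"))
EVICT_LOW = float(os.environ.get("MEMOPT_EVICT_LOW", "0.75"))

def blocks_to_evict(used: int, capacity: int, block: int) -> int:
    """How many fixed-size blocks to evict to reach the low watermark."""
    if used / capacity < EVICT_HIGH:
        return 0                        # below the trigger: hysteresis gap
    target = int(capacity * EVICT_LOW)  # evict down to the low watermark
    excess = used - target
    return -(-excess // block)          # ceiling division

# 95% full on an 80 GiB device with 2 MiB blocks:
gib = 1024**3
print(blocks_to_evict(76 * gib, 80 * gib, 2 * 1024**2))  # 8192 blocks
```

The gap between the two thresholds is what prevents thrashing: after an eviction pass, utilization must climb back from 75% to 90% before the next one fires.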

Building C++ Extensions

pip install pybind11 scikit-build-core cmake ninja

# Build all extensions
cd csrc && mkdir build && cd build
cmake .. -DMEMOPT_ENABLE_TESTS=ON
make -j$(nproc)

# Run C++ tests
ctest --output-on-failure

# Optional: CUDA, RDMA, AVX-512
cmake .. -DMEMOPT_ENABLE_RDMA=ON -DMEMOPT_ENABLE_AVX512=ON

Running Tests

# Python tests (no C++ required)
pytest --tb=short -q

# Wiring audit (verifies C++ integration)
python scripts/audit_wiring.py

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • Optional: CUDA 12.4+ (GPU kernels), pynvml (power measurement), redis-py (cluster mode)

Documentation

Memory substrate

memopt v1 ships Layer 1 of the memory substrate: a tenant-isolated, stream-aware allocator with pluggable backends (CUDA VMM, ROCm/HIP stub, Level Zero stub, CXL/NUMA, CPU). The public API is memopt.alloc / free / context / stats / observe plus MemoryHandle. See docs/substrate_v1_user_guide.md for usage and docs/substrate_v1_design.md for the spec.
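
To make that surface concrete, here is a self-contained toy that mimics the described alloc / free / context / stats shape. Everything below is a stand-in written for illustration; real signatures, handle fields, and semantics are defined in docs/substrate_v1_user_guide.md and may differ:

```python
from contextlib import contextmanager
from dataclasses import dataclass
from itertools import count

@dataclass
class MemoryHandle:                      # stand-in for memopt.MemoryHandle
    id: int
    nbytes: int
    tenant: str

_ids = count()
_live: dict[int, MemoryHandle] = {}
_tenant = "default"

@contextmanager
def context(tenant: str):
    """Tenant-isolated allocation scope, per the substrate description."""
    global _tenant
    prev, _tenant = _tenant, tenant
    try:
        yield
    finally:
        _tenant = prev

def alloc(nbytes: int) -> MemoryHandle:
    handle = MemoryHandle(next(_ids), nbytes, _tenant)
    _live[handle.id] = handle
    return handle

def free(handle: MemoryHandle) -> None:
    del _live[handle.id]

def stats() -> dict:
    per_tenant: dict[str, int] = {}
    for h in _live.values():
        per_tenant[h.tenant] = per_tenant.get(h.tenant, 0) + h.nbytes
    return {"live_handles": len(_live), "bytes_by_tenant": per_tenant}

with context("tenant-a"):
    h = alloc(1 << 20)                   # 1 MiB attributed to tenant-a
print(stats())
free(h)
```

The point of the sketch is the shape: allocations are attributed to the tenant whose context is active, and stats() reports per-tenant accounting.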

Orchestrator (Layer 2)

memopt v1.2 adds Layer 2: a tenant-aware observation/decision layer that sits on top of the substrate. In v1.0 (Phase A) it observes the substrate's event stream and exposes a public Policy protocol; it does NOT drive eviction yet (that ships behind MEMOPT_USE_ORCHESTRATOR=1 in Phase B). The public API is memopt.orchestrator.start / stop / stats / register_policy plus memopt.peek_handle. See docs/orchestrator_v1_user_guide.md for usage and docs/orchestrator_v1_design.md for the spec.
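
The observation-only design can be pictured with a toy stand-in: policies registered via register_policy see the substrate's event stream, but in Phase A nothing acts on their decisions. The Policy method names and event shape below are hypothetical, not memopt's actual protocol:

```python
from typing import Protocol

class Policy(Protocol):                  # toy stand-in for the Policy protocol
    def on_event(self, event: dict) -> None: ...

_policies: list[Policy] = []

def register_policy(policy: Policy) -> None:
    _policies.append(policy)

def pump(events: list[dict]) -> None:
    # Phase A: observation-only. Policies see every substrate event,
    # but nothing here acts on what they decide.
    for ev in events:
        for p in _policies:
            p.on_event(ev)

class LoggingPolicy:
    def __init__(self) -> None:
        self.seen: list[dict] = []
    def on_event(self, event: dict) -> None:
        self.seen.append(event)          # would drive eviction in Phase B

pol = LoggingPolicy()
register_policy(pol)
pump([{"kind": "alloc", "bytes": 4096}, {"kind": "free", "bytes": 4096}])
print(len(pol.seen))
```

Shipping the protocol before the actuation path lets policies be written and validated against real traffic before they are allowed to evict anything.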

About MEMOPT

MEMOPT (pronounced memm-opt) is a universal memory fabric for AI infrastructure. It solves one of the hardest problems in modern GPU serving: memory — KV-cache OOM under long contexts, redundant KV recomputation across requests, HBM stalls from un-fused kernels, multi-tier paging across HBM/DRAM/NVMe, cross-node block sharing, and auditable energy/cost accountability. memopt unifies all of these behind one API, with two pinned infrastructure layers (substrate + orchestrator) and eight product pillars layered on top.

It's released under Apache-2.0 so any AI infrastructure team can read the source, audit the security guarantees, fork it, contribute back, or run it in production without licensing friction.

About Sophisticates

memopt is built and open-sourced by Sophisticates (pronounced so-phis-ti-cates), a deep tech venture company founded by Lachu Man Basnet. Sophisticates builds companies from first principles across AI, Quantum Computing, Robotics, and Physics. MEMOPT is Sophisticates' flagship product in the AI infrastructure vertical.

If your team uses memopt in production, we'd love to hear about it — open a discussion on GitHub or reach out via sophisticatesai.com.

License

Apache License 2.0. See LICENSE for the full text and NOTICE for third-party attributions. A pinned dependency license audit lives at docs/license_audit.md.



Download files

Download the file for your platform.

Source Distribution

memopt_engine-1.3.0a2.tar.gz (558.4 kB)


Built Distribution


memopt_engine-1.3.0a2-py3-none-any.whl (639.7 kB)


File details

Details for the file memopt_engine-1.3.0a2.tar.gz.

File metadata

  • Download URL: memopt_engine-1.3.0a2.tar.gz
  • Upload date:
  • Size: 558.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memopt_engine-1.3.0a2.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 71ffcb2cb55e4741080dd5a3e7990b552d243409ad273eb8f805724bba0053de |
| MD5 | 30baa4f3fcd65af825b4887671edf522 |
| BLAKE2b-256 | 589ea047b3a956648e7ae83aca64ba4db63238a523e3b521f815682345ec8c3d |


Provenance

The following attestation bundles were made for memopt_engine-1.3.0a2.tar.gz:

Publisher: publish.yml on basnetlachu/memopt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file memopt_engine-1.3.0a2-py3-none-any.whl.

File metadata

  • Download URL: memopt_engine-1.3.0a2-py3-none-any.whl
  • Upload date:
  • Size: 639.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for memopt_engine-1.3.0a2-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8f81be594379211946935be1212aaaf8fde9d2e51ef56e027787332444006e82 |
| MD5 | ae39708869ccf6de57bc3cd4b5fbc0ab |
| BLAKE2b-256 | 434eaf8cc931ff46cfaa9010f3698271e68211d0068d409f1a193d15fb8f82c4 |


Provenance

The following attestation bundles were made for memopt_engine-1.3.0a2-py3-none-any.whl:

Publisher: publish.yml on basnetlachu/memopt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
