Unified runtime for token + tensor program execution across LLM and ML backends
Continuum
Continuum is a unified execution runtime for LLM and ML programs. It is not just an API wrapper and not just orchestration glue. Continuum executes a shared intermediate representation (IR) that spans token generation and tensor computation inside one runtime.
Why Continuum
- One IR, two worlds: token ops and tensor ops in a single executable graph.
- Backend-agnostic caching: reusable backend state handles enable cross-call prefix reuse without backend-specific app code.
- Capability-driven dispatch: runtime routes ops by declared backend capability, not brittle string checks.
- Explicit interoperability: cross-backend tensors are tagged and converted explicitly, never silently mixed.
- Native-ready architecture: C++ core with ABI boundary prep for future dynamic backend loading.
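The capability-driven dispatch point above can be illustrated with a minimal sketch. This is not Continuum's actual API; the `Backend` and `Dispatcher` names and the capability strings are hypothetical, chosen only to show routing by declared capability instead of string checks on backend names.

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    """Hypothetical backend record: a name plus its declared capability set."""
    name: str
    capabilities: frozenset  # e.g. frozenset({"token", "tensor", "cache"})

@dataclass
class Dispatcher:
    """Routes an op to the first registered backend declaring the needed capability."""
    backends: list = field(default_factory=list)

    def register(self, backend):
        self.backends.append(backend)

    def route(self, op_kind):
        # Match on declared capability, never on the backend's name.
        for b in self.backends:
            if op_kind in b.capabilities:
                return b
        raise LookupError(f"no backend declares capability {op_kind!r}")

# Usage: token ops go to an LLM backend, tensor ops to a tensor backend.
d = Dispatcher()
d.register(Backend("azure", frozenset({"token", "cache"})))
d.register(Backend("libtorch", frozenset({"tensor"})))
```

Because routing depends only on the declared capability set, adding a new backend never requires touching call sites.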
Core Idea
Continuum uses one IR to represent both token and tensor operations, then executes that graph through a single interpreter. KV caching is treated as a program-level concern rather than a backend-specific add-on. Backends receive reusable state handles through a common contract, so cache-aware execution can remain backend-agnostic. This allows the same execution model to drive cloud LLM calls, local LLM backends, and tensor workloads.
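The prefix-reuse idea can be sketched in a few lines. This is a toy illustration, not Continuum's KV cache index: the normalization rule and the `PrefixCacheIndex` class are hypothetical, standing in for canonical prefix normalization plus longest-prefix lookup over opaque backend state handles.

```python
def normalize_prefix(messages):
    """Canonicalize a message list so equivalent prefixes compare equal
    (hypothetical rule: strip surrounding whitespace, fix field order)."""
    return tuple((m["role"].strip(), m["content"].strip()) for m in messages)

class PrefixCacheIndex:
    """Toy index mapping a normalized message prefix to an opaque state handle."""
    def __init__(self):
        self._index = {}

    def put(self, messages, handle):
        self._index[normalize_prefix(messages)] = handle

    def longest_prefix(self, messages):
        """Return (handle, matched_length) for the longest cached prefix, or (None, 0)."""
        norm = normalize_prefix(messages)
        for n in range(len(norm), 0, -1):
            handle = self._index.get(norm[:n])
            if handle is not None:
                return handle, n
        return None, 0

# Usage: a follow-up turn reuses the cached state for the shared two-message prefix.
idx = PrefixCacheIndex()
msgs = [{"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Summarize the paper. "}]
idx.put(msgs, handle="state-42")
handle, n = idx.longest_prefix(msgs + [{"role": "user", "content": "Now shorten it."}])
```

Keeping the handle opaque is what lets this stay backend-agnostic: the index never inspects what the backend stored, only whether a normalized prefix matches.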
What Works Today
- C++ execution engine with IR + interpreter
- KV cache index with canonical prefix normalization
- Azure backend (real network execution)
- libtorch backend (tensor/training execution)
- MLX backend (native tensor op path for Apple workflows)
Example
See examples/01_research_agent.py for a paired benchmark workflow that exercises cache-aware token generation across backends.
Benchmarking Approach
Benchmarks are run as paired trials (uncached vs cached on identical input), with warmup discarded and robust statistics reported (median/p50/p95).
Primary signal is token reduction (tokens_saved / (tokens_sent + tokens_saved)), with latency ratio tracked as secondary due to provider/network noise.
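The two signals above can be computed as follows. This is a sketch of the stated methodology, not the benchmark harness itself; the function names and the warmup-discard convention are assumptions.

```python
import statistics

def token_reduction(tokens_sent, tokens_saved):
    """Primary signal: fraction of tokens avoided via cache reuse,
    i.e. tokens_saved / (tokens_sent + tokens_saved)."""
    return tokens_saved / (tokens_sent + tokens_saved)

def summarize(latencies_ms, warmup=1):
    """Secondary signal helper: drop warmup trials, then report robust
    statistics (median/p50 and a nearest-rank p95)."""
    kept = sorted(latencies_ms[warmup:])
    p95_idx = min(len(kept) - 1, round(0.95 * (len(kept) - 1)))
    return {"p50": statistics.median(kept), "p95": kept[p95_idx]}

# Usage: a cached run that avoided 400 of 1000 prompt tokens.
reduction = token_reduction(tokens_sent=600, tokens_saved=400)  # 0.4
stats = summarize([100, 50, 52, 51, 200])  # first trial discarded as warmup
```

Reporting p50/p95 instead of the mean keeps a single slow provider round-trip from dominating the latency comparison.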
Status
- v1 release hardening in progress
- Capability-driven backend dispatch implemented (tensor/token/cache)
- MLX + libtorch tensor interoperability implemented with explicit conversion rules
- CIR schema lock added (schema/cir.fbs) with serialization conformance tests
- Python + C++ API docs pipelines wired (Sphinx + Doxygen + GitHub Pages workflow)
- Packaging migrated to continuum-ai (import path continuum) with PyPI publish workflow
- CI matrix active on Linux + macOS with coverage gates and fuzz workflow
Install
python -m pip install continuum-ai
The import path is unchanged:
import continuum
Reproducible Example Validation
PYTHONPATH=python python scripts/benchmarks/run_examples.py | python scripts/benchmarks/validate_outputs.py
Pre-commit Hooks
Set up local quality gates (ruff, formatting, YAML/whitespace checks):
pip install pre-commit
pre-commit install
pre-commit run --all-files
Note: generated docs/build outputs are excluded by default in .pre-commit-config.yaml.
API Docs
Build Python docs locally:
python -m venv .venv-docs
. .venv-docs/bin/activate
pip install sphinx furo breathe
PYTHONPATH=python sphinx-build -b html docs/api/python docs/api/python/_build
Then open:
- docs/api/python/_build/index.html
- GitHub Pages: https://rithulkamesh.github.io/continuum/python/
Build C++ docs locally:
doxygen Doxyfile
Then open:
- docs/api/cpp/html/index.html
- GitHub Pages: https://rithulkamesh.github.io/continuum/cpp/
Citation
If Continuum helps your work, cite it as:
@software{continuum2026,
title = {Continuum: Unified Runtime for Token and Tensor Programs},
author = {Kamesh, Rithul and Contributors},
year = {2026},
url = {https://github.com/rithulkamesh/continuum},
version = {1.0.0}
}
File details
Details for the file continuum_ai-1.0.0.tar.gz.
File metadata
- Download URL: continuum_ai-1.0.0.tar.gz
- Upload date:
- Size: 189.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f913052765ca055363d47596d68883ff7cb158abba5f06fe897e51c7a41be2b |
| MD5 | d192145342a0ba8579b6fb3e80726333 |
| BLAKE2b-256 | 1bef0c7ba994becc3b2f0e327c2cf21024af6ba2b75fd9854c231df0f056f8fb |
Provenance
The following attestation bundles were made for continuum_ai-1.0.0.tar.gz:
- Publisher: pypi-publish.yml on rithulkamesh/continuum
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: continuum_ai-1.0.0.tar.gz
  - Subject digest: 2f913052765ca055363d47596d68883ff7cb158abba5f06fe897e51c7a41be2b
- Sigstore transparency entry: 1392525255
- Sigstore integration time:
- Permalink: rithulkamesh/continuum@53b320c5ec615f4d4069ba57edbf117b3313e1d0
- Branch / Tag: refs/heads/master
- Owner: https://github.com/rithulkamesh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@53b320c5ec615f4d4069ba57edbf117b3313e1d0
- Trigger Event: workflow_dispatch