
Unified runtime for token + tensor program execution across LLM and ML backends


Continuum

Continuum is a unified execution runtime for LLM and ML programs. It is not just an API wrapper and not just orchestration glue. Continuum executes a shared intermediate representation (IR) that spans token generation and tensor computation inside one runtime.

Why Continuum

  • One IR, two worlds: token ops and tensor ops in a single executable graph.
  • Backend-agnostic caching: reusable backend state handles enable cross-call prefix reuse without backend-specific app code.
  • Capability-driven dispatch: the runtime routes ops by declared backend capability, not brittle string checks (see the sketch after this list).
  • Explicit interoperability: cross-backend tensors are tagged and converted explicitly, never silently mixed.
  • Native-ready architecture: C++ core with ABI boundary prep for future dynamic backend loading.
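
The dispatch and interoperability bullets are the most concrete of these, so here is a minimal sketch of what capability-driven routing can look like. It is illustrative only: the names (Backend, dispatch, the capability strings) are assumptions made for this example, not Continuum's actual API.

# Illustrative sketch of capability-driven dispatch; hypothetical names, not Continuum's API.
from dataclasses import dataclass, field

@dataclass
class Backend:
    name: str
    capabilities: set = field(default_factory=set)

def dispatch(required: str, backends: list) -> Backend:
    """Route an op to a backend by declared capability, never by matching name strings."""
    for backend in backends:
        if required in backend.capabilities:
            return backend
    raise LookupError(f"no backend declares capability {required!r}")

backends = [
    Backend("azure", {"token", "cache"}),
    Backend("libtorch", {"tensor"}),
    Backend("mlx", {"tensor"}),
]
print(dispatch("tensor", backends).name)  # libtorch: the first backend declaring "tensor"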

Core Idea

Continuum uses one IR to represent both token and tensor operations, then executes that graph through a single interpreter. KV caching is treated as a program-level concern rather than a backend-specific add-on. Backends receive reusable state handles through a common contract, so cache-aware execution can remain backend-agnostic. This allows the same execution model to drive cloud LLM calls, local LLM backends, and tensor workloads.
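
A minimal conceptual sketch of this execution model is shown below. All names (Op, CacheHandle, Interpreter, backend.execute) are assumptions for illustration, not Continuum's real IR or classes; the point is only the shape: one interpreter walks a graph of token and tensor ops, and backends hand back opaque, reusable state handles through a common contract.

# Conceptual sketch only; names and signatures are illustrative, not Continuum's API.
from dataclasses import dataclass

@dataclass
class Op:
    kind: str      # "token" (generation) or "tensor" (computation)
    payload: dict  # op-specific arguments

class CacheHandle:
    """Opaque, reusable backend state (e.g. a KV-cache prefix) returned to the runtime."""
    def __init__(self, key: str):
        self.key = key

class Interpreter:
    """A single interpreter walks the shared IR; backends only see the common contract."""
    def __init__(self, backends: dict):
        self.backends = backends   # e.g. {"token": azure_backend, "tensor": libtorch_backend}
        self.handles: dict = {}    # cache reuse is tracked at the program level

    def run(self, program: list):
        for op in program:
            backend = self.backends[op.kind]
            prior = self.handles.get(op.kind)
            result, handle = backend.execute(op, prior)  # backend may reuse the prior handle
            self.handles[op.kind] = handle
            yield result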

What Works Today

  • C++ execution engine with IR + interpreter
  • KV cache index with canonical prefix normalization
  • Azure backend (real network execution)
  • libtorch backend (tensor/training execution)
  • MLX backend (native tensor op path for Apple workflows)

Example

See examples/01_research_agent.py for a paired benchmark workflow that exercises cache-aware token generation across backends.

Benchmarking Approach

Benchmarks are run as paired trials (uncached vs. cached on identical input), with warmup runs discarded and robust statistics reported (median/p50 and p95). The primary signal is token reduction, defined as tokens_saved / (tokens_sent + tokens_saved); the latency ratio is tracked only as a secondary signal because it is subject to provider and network noise.
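
As a concrete illustration of these metrics, the sketch below computes the token-reduction signal and the robust summary statistics from a list of paired trials. The numbers and field names are made up for this example; they are not Continuum's benchmark schema.

# Illustrative computation of the reported metrics; values below are examples only.
from statistics import median, quantiles

def token_reduction(tokens_sent: int, tokens_saved: int) -> float:
    """Primary signal: fraction of tokens avoided thanks to prefix cache reuse."""
    return tokens_saved / (tokens_sent + tokens_saved)

def summarize(samples: list) -> dict:
    """Robust summary over paired trials, after warmup runs have been discarded."""
    return {"p50": median(samples), "p95": quantiles(samples, n=100, method="inclusive")[94]}

print(token_reduction(tokens_sent=120, tokens_saved=380))  # 0.76
print(summarize([1.4, 1.1, 1.3, 1.2, 1.6]))                # uncached/cached latency ratios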

Status

  • v1 release hardening in progress
  • Capability-driven backend dispatch implemented (tensor/token/cache)
  • MLX + libtorch tensor interoperability implemented with explicit conversion rules
  • CIR schema lock added (schema/cir.fbs) with serialization conformance tests
  • Python + C++ API docs pipelines wired (Sphinx + Doxygen + GitHub Pages workflow)
  • Packaging migrated to continuum-ai (import path continuum) with PyPI publish workflow
  • CI matrix active on Linux + macOS with coverage gates and fuzz workflow


Install

python -m pip install continuum-ai

The import path is unchanged:

import continuum

Reproducible Example Validation

PYTHONPATH=python python scripts/benchmarks/run_examples.py | python scripts/benchmarks/validate_outputs.py

Pre-commit Hooks

Set up local quality gates (ruff, formatting, YAML/whitespace checks):

pip install pre-commit
pre-commit install
pre-commit run --all-files

Note: generated docs/build outputs are excluded by default in .pre-commit-config.yaml.

API Docs

Build Python docs locally:

python -m venv .venv-docs
. .venv-docs/bin/activate
pip install sphinx furo breathe
PYTHONPATH=python sphinx-build -b html docs/api/python docs/api/python/_build

Then open:

  • docs/api/python/_build/index.html
  • GitHub Pages: https://rithulkamesh.github.io/continuum/python/

Build C++ docs locally:

doxygen Doxyfile

Then open:

  • docs/api/cpp/html/index.html
  • GitHub Pages: https://rithulkamesh.github.io/continuum/cpp/

Citation

If Continuum helps your work, cite it as:

@software{continuum2026,
  title        = {Continuum: Unified Runtime for Token and Tensor Programs},
  author       = {Kamesh, Rithul and Contributors},
  year         = {2026},
  url          = {https://github.com/rithulkamesh/continuum},
  version      = {1.0.0}
}
