ProcessFork plugin for vLLM ≥0.10 — paged-KV-cache snapshot/restore via batch-invariant kernels.

Project description

processfork-vllm

ProcessFork plugin for vLLM ≥0.10. Adds OpenAI-compatible extended endpoints for snapshot / fork / checkout that walk vLLM's paged KV cache via the batch-invariant kernel mode.

Install

pip install "processfork-vllm[vllm]"

Use

vllm serve meta-llama/Llama-3-8B \
  --enforce-deterministic \
  --plugin processfork

Then:

POST /v1/processfork/snapshot       { "name": "..." }
  → { "cid": "sha256:..." }

POST /v1/processfork/fork           { "cid": "...", "n": 12 }
  → { "cids": ["sha256:..."] }

POST /v1/processfork/checkout       { "cid": "..." }
  → { "ok": true }
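
The three calls above can be driven from any HTTP client. The following is a minimal sketch using only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_request`, `call`, and `snapshot_fork_checkout` are assumptions for illustration and are not part of the plugin. The 501 handling reflects the behaviour described under Status.

```python
import json
import urllib.error
import urllib.request

# Assumption: the server listens on localhost:8000; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for /v1/processfork/<endpoint>."""
    return urllib.request.Request(
        url=f"{base_url}/v1/processfork/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint: str, payload: dict, base_url: str = BASE_URL) -> dict:
    """POST the payload and return the decoded JSON response body.

    HTTP 501 is translated into NotImplementedError so callers can tell
    "endpoint not wired up yet" apart from other failures.
    """
    try:
        with urllib.request.urlopen(build_request(endpoint, payload, base_url)) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 501:
            raise NotImplementedError(f"{endpoint}: server returned 501 Not Implemented") from err
        raise


def snapshot_fork_checkout(name: str, n: int, base_url: str = BASE_URL) -> list:
    """Snapshot the current state, fork it n times, and check out the first fork."""
    snap = call("snapshot", {"name": name}, base_url)
    forks = call("fork", {"cid": snap["cid"], "n": n}, base_url)
    call("checkout", {"cid": forks["cids"][0]}, base_url)
    return forks["cids"]
```

Calling `snapshot_fork_checkout("before-tool-call", 12)` against a running server would issue the three requests in order. Nothing is sent until one of the functions is called.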

Bit-exact restore requires --enforce-deterministic (stable since vLLM 0.10). Without it, restored logits match the originals only to within 1e-4.
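
To check which of the two guarantees a given restore met, the logits from before the snapshot can be compared with those produced after checkout. The sketch below assumes the 1e-4 bound is an absolute per-element difference and that logits are available as plain Python floats. The function names are hypothetical.

```python
import struct

# Assumption: the 1e-4 bound is an absolute, per-element difference.
TOLERANCE = 1e-4


def bit_exact(a, b) -> bool:
    """True if both sequences hold floats with identical bit patterns."""
    if len(a) != len(b):
        return False
    return struct.pack(f"<{len(a)}d", *a) == struct.pack(f"<{len(b)}d", *b)


def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("logit vectors differ in length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def classify_restore(original, restored) -> str:
    """Classify a restore as 'bit-exact', 'within-tolerance', or 'mismatch'."""
    if bit_exact(original, restored):
        return "bit-exact"
    if max_abs_diff(original, restored) <= TOLERANCE:
        return "within-tolerance"
    return "mismatch"
```

With --enforce-deterministic the expected result is "bit-exact"; without it, "within-tolerance" is the most that can be expected.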

The wire format is paged-batchinvariant-v1, as specified in agent_docs/cache-layer.md. K and V pages are content-addressed independently, so a fork that mutates only V (a one-token decode) shares its K page with its siblings.
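
The sharing behaviour follows directly from addressing each page by its content. The sketch below is an illustration only: it assumes a cid is "sha256:" followed by the SHA-256 hex digest of the raw page bytes, and `PageStore` and `BlockRef` are hypothetical names that do not come from the plugin or from the wire format specification.

```python
import hashlib
from dataclasses import dataclass


def cid(page: bytes) -> str:
    """Content identifier for a page (assumed here: SHA-256 over the raw bytes)."""
    return "sha256:" + hashlib.sha256(page).hexdigest()


@dataclass(frozen=True)
class BlockRef:
    """A KV block referenced by two independent content ids."""
    k_cid: str
    v_cid: str


class PageStore:
    """In-memory content-addressed store; identical pages are stored once."""

    def __init__(self) -> None:
        self._pages = {}

    def put(self, page: bytes) -> str:
        key = cid(page)
        self._pages.setdefault(key, page)  # no-op if the page is already stored
        return key

    def put_block(self, k_page: bytes, v_page: bytes) -> BlockRef:
        return BlockRef(self.put(k_page), self.put(v_page))

    def __len__(self) -> int:
        return len(self._pages)


store = PageStore()
k_page = bytes(16)                                    # K page, unchanged by the fork
parent = store.put_block(k_page, b"\x01" * 16)
child = store.put_block(k_page, b"\x02" * 16)         # fork that mutated only V

print(parent.k_cid == child.k_cid)  # the K page is shared
print(len(store))                   # one K page plus two V pages
```

Because K and V are hashed separately, the unchanged K page is deduplicated even though the block as a whole differs between parent and child.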

Status

The trait surface and the paged-batchinvariant-v1 wire format are stable. The live FFI shim into vllm.worker.cache_engine lands in v1.0.1. Until then, the plugin's HTTP surface returns 501 Not Implemented with a clear pointer to this README.

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

processfork_vllm-1.0.0-py3-none-any.whl (3.7 kB)

Uploaded Python 3

File details

Details for the file processfork_vllm-1.0.0-py3-none-any.whl.

File hashes

Hashes for processfork_vllm-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       69228d34b1dc317a061b0679e52506c0dfa1d5996294aee776564c0b5ff6f5cb
MD5          87adcfef6cc0eb942693e9a1a34b83ea
BLAKE2b-256  df9e7f4e4bbf87e8c44bb6331160b89ab9f703b4bfe3e029c85bc5852cb9af3b
