PegaFlow Python Package

High-performance key-value storage engine with Python bindings, built with Rust and PyO3.

Features

  • PegaEngine: Fast Rust-based key-value storage with Python bindings
  • PegaKVConnector: vLLM KV connector for distributed inference with KV cache transfer

Installation

From Source

# Install maturin if you haven't already
pip install maturin

# Build and install in development mode
cd python
maturin develop

# Or build a wheel
maturin build --release

From PyPI

# The wheels for this release are published as pegaflow_llm_cu13
# (Linux x86-64, CPython 3.10 to 3.14)
pip install pegaflow-llm-cu13

Usage

Basic KV Storage

from pegaflow import PegaEngine

# Create a new engine
engine = PegaEngine()

# Store key-value pairs
engine.put("name", "PegaFlow")
engine.put("version", "0.1.0")

# Retrieve values
name = engine.get("name")  # Returns "PegaFlow"
missing = engine.get("nonexistent")  # Returns None

# Remove keys
removed = engine.remove("name")  # Returns "PegaFlow"

vLLM KV Connector

from vllm import LLM
from vllm.config import KVTransferConfig

# Configure vLLM to use PegaKVConnector
kv_transfer_config = KVTransferConfig(
    kv_connector="PegaKVConnector",
    kv_role="kv_both",
    kv_connector_module_path="pegaflow.connector",
)

# Create LLM with KV transfer enabled
llm = LLM(
    model="gpt2",
    kv_transfer_config=kv_transfer_config,
)

Connector Modes

PegaKVConnector defaults to read_write: it queries PegaFlow for reusable KV blocks, loads matched blocks into vLLM, and saves newly computed full blocks back to PegaFlow.
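The query, load, and save steps can be illustrated with a toy in-memory model. This is a minimal sketch, not PegaFlow's implementation: ToyConnector, block_keys, BLOCK_SIZE, and keying each block by its token prefix are all assumptions made for illustration. It also models the save_only mode described in the next paragraph, where the lookup step is skipped.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one key per full block; a trailing partial block gets no key.

    Assumption: each block is identified by the whole token prefix up to
    and including that block.
    """
    n_full = len(tokens) // block_size
    return [tuple(tokens[: (i + 1) * block_size]) for i in range(n_full)]


@dataclass
class ToyConnector:
    """Toy stand-in for a KV connector; `store` plays the role of PegaFlow."""

    mode: str = "read_write"
    store: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.mode not in ("read_write", "save_only"):
            raise ValueError(f"unknown mode: {self.mode!r}")

    def process(self, tokens):
        """Return (blocks_loaded, blocks_saved) for one request."""
        keys = block_keys(tokens)
        loaded = 0
        if self.mode == "read_write":
            # Query: count the leading blocks that are already stored.
            for key in keys:
                if key not in self.store:
                    break
                loaded += 1
        saved = 0
        # Save: persist every full block that is not stored yet.
        for key in keys[loaded:]:
            if key not in self.store:
                self.store[key] = "kv-placeholder"
                saved += 1
        return loaded, saved


rw = ToyConnector(mode="read_write")
first = rw.process(list(range(10)))   # (0, 2): nothing cached, two full blocks saved
second = rw.process(list(range(12)))  # (2, 1): two blocks reused, one new block saved
```

In this model a save_only connector always reports zero loaded blocks, because it never consults the store before saving.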

Set pegaflow.mode to save_only when another vLLM connector is responsible for reads and PegaFlow should only persist KV blocks for later reuse. This is intended for MultiConnector decode-side setups where an upstream connector owns the external hit/load path, while PegaFlow records the resulting KV cache. In save_only mode, PegaFlow does not query or load KV blocks.

vllm serve Qwen/Qwen3-0.6B \
  --kv-transfer-config '{
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "connectors": [
        {
          "kv_connector": "<external-read-connector>",
          "kv_role": "kv_both"
        },
        {
          "kv_connector": "PegaKVConnector",
          "kv_role": "kv_both",
          "kv_connector_module_path": "pegaflow.connector",
          "kv_connector_extra_config": {
            "pegaflow.mode": "save_only"
          }
        }
      ]
    }
  }'

Valid values are read_write and save_only.
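When the configuration is generated by a script, the same JSON can be built and checked in Python before it is passed to --kv-transfer-config. The sketch below mirrors the structure of the command above; the helper name multi_connector_config and the VALID_MODES constant are illustrative, and the mode check runs only in this helper (how PegaFlow itself reacts to an invalid value is not covered here).

```python
import json

VALID_MODES = ("read_write", "save_only")


def multi_connector_config(read_connector: str, mode: str = "save_only") -> str:
    """Build the --kv-transfer-config JSON for a MultiConnector setup.

    `read_connector` is the name of the connector that owns the read path
    (the "<external-read-connector>" placeholder in the command above).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"pegaflow.mode must be one of {VALID_MODES}, got {mode!r}")
    config = {
        "kv_connector": "MultiConnector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {
            "connectors": [
                {"kv_connector": read_connector, "kv_role": "kv_both"},
                {
                    "kv_connector": "PegaKVConnector",
                    "kv_role": "kv_both",
                    "kv_connector_module_path": "pegaflow.connector",
                    "kv_connector_extra_config": {"pegaflow.mode": mode},
                },
            ]
        },
    }
    return json.dumps(config)
```

Rejecting unknown modes while the string is being built surfaces a typo such as "saveonly" before the server is launched.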

Development

See the examples directory for more usage examples.

Testing

Running the Test Suite

The test suite includes integration tests that verify the EngineRpcClient can correctly communicate with a running pegaflow-server instance.

Prerequisites

  1. Build the Rust extension:

    cd python
    maturin develop --release
    
  2. Build the server binary from the repository root:

    cd ..
    cargo build --release --bin pegaflow-server
    
  3. Ensure CUDA is available (tests require GPU):

    python -c "import torch; assert torch.cuda.is_available()"
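The three prerequisites can also be checked in one pass before running pytest. This is a sketch under stated assumptions: the function name missing_prerequisites is hypothetical, and the binary path assumes Cargo's default target directory (target/release/) under the repository root.

```python
import importlib.util
from pathlib import Path


def missing_prerequisites(repo_root: Path) -> list[str]:
    """Return a human-readable list of unmet test prerequisites."""
    problems = []

    # Step 1: the Rust extension must be importable as `pegaflow`.
    if importlib.util.find_spec("pegaflow") is None:
        problems.append("pegaflow is not importable; run `maturin develop --release` in python/")

    # Step 2: the server binary (assumes Cargo's default target directory).
    server = repo_root / "target" / "release" / "pegaflow-server"
    if not server.is_file():
        problems.append(f"pegaflow-server binary not found at {server}")

    # Step 3: CUDA must be visible to PyTorch.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
    else:
        import torch

        if not torch.cuda.is_available():
            problems.append("CUDA is not available")

    return problems
```

An empty list means all three checks passed; otherwise each entry names the step that still needs attention.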
    

Running Tests

cd python

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_engine_client.py -v

# Run with coverage
pytest tests/ --cov=pegaflow --cov-report=html

Test Structure

  • tests/conftest.py: Contains pytest fixtures for:

    • pega_server: Automatically starts/stops pegaflow-server for integration tests
    • engine_client: Creates an EngineRpcClient connected to the test server
    • client_context: Provides a ClientContext representing a vLLM instance with GPU KV cache tensors
    • registered_instance: Provides a registered instance ID for query tests
  • tests/test_engine_client.py: Integration tests for:

    • Server connectivity
    • Query operations with various inputs
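The start/stop behavior attributed to pega_server follows a common pattern for managing a child process around a test run. The sketch below shows that pattern with the standard library only; it is not the code in tests/conftest.py, and running_server and shutdown_timeout are hypothetical names. It does not wait for the server to become ready, which a real fixture would likely need to do.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def running_server(command, shutdown_timeout=10.0):
    """Start `command` as a child process and stop it on exit."""
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Ask the process to exit, then force-kill it if it does not comply.
        proc.terminate()
        try:
            proc.wait(timeout=shutdown_timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
```

A pytest fixture could wrap this by entering the context manager and yielding inside the with block, so teardown runs even when a test fails.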

Test Fixtures

The ClientContext class abstracts a vLLM instance and provides:

  • register_kv_caches(): Register GPU KV cache tensors with the server
  • query(block_hashes): Query available blocks
  • unregister_context(): Unregister context from server

Example test usage:

def test_query(client_context):
    """Test query operation."""
    result = client_context.query([])
    assert result is not None

License

MIT

