PegaFlow Python Package

High-performance key-value storage engine with Python bindings, built with Rust and PyO3.

Features

  • PegaEngine: Fast Rust-based key-value storage with Python bindings
  • PegaKVConnector: vLLM KV connector for distributed inference with KV cache transfer

Installation

From Source

# Install maturin if you haven't already
pip install maturin

# Build and install in development mode
cd python
maturin develop

# Or build a wheel
maturin build --release

From PyPI

# Prebuilt wheels for this release are published under the name pegaflow-llm-cu13
pip install pegaflow-llm-cu13

Usage

Basic KV Storage

from pegaflow import PegaEngine

# Create a new engine
engine = PegaEngine()

# Store key-value pairs
engine.put("name", "PegaFlow")
engine.put("version", "0.1.0")

# Retrieve values
name = engine.get("name")  # Returns "PegaFlow"
missing = engine.get("nonexistent")  # Returns None

# Remove keys
removed = engine.remove("name")  # Returns "PegaFlow"
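Because get returns None for a missing key, callers usually want a fallback. The helper below is a minimal sketch and not part of PegaFlow: get_or_default is a hypothetical name, and DictEngine is an in-memory stand-in that mimics only the put/get behavior shown above, so the snippet runs without the native extension.

```python
from typing import Optional, Protocol


class KVEngine(Protocol):
    """Minimal interface assumed from the usage example above."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class DictEngine:
    """Hypothetical in-memory stand-in; NOT the real PegaEngine."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # Mirror the documented behavior: missing keys yield None.
        return self._data.get(key)


def get_or_default(engine: KVEngine, key: str, default: str) -> str:
    """Return the stored value, or `default` when the key is absent."""
    value = engine.get(key)
    return default if value is None else value


engine = DictEngine()
engine.put("name", "PegaFlow")
print(get_or_default(engine, "name", "unknown"))     # PegaFlow
print(get_or_default(engine, "missing", "unknown"))  # unknown
```

Since the helper only calls get, it should work unchanged with a real PegaEngine instance, provided the engine behaves as in the example above.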

vLLM KV Connector

from vllm import LLM
from vllm.config import KVTransferConfig

# Configure vLLM to use PegaKVConnector
kv_transfer_config = KVTransferConfig(
    kv_connector="PegaKVConnector",
    kv_role="kv_both",
    kv_connector_module_path="pegaflow.connector",
)

# Create LLM with KV transfer enabled
llm = LLM(
    model="gpt2",
    kv_transfer_config=kv_transfer_config,
)

Connector Modes

PegaKVConnector defaults to read_write: it queries PegaFlow for reusable KV blocks, loads matched blocks into vLLM, and saves newly computed full blocks back to PegaFlow.

Set pegaflow.mode to save_only when another vLLM connector is responsible for reads and PegaFlow should only persist KV blocks for later reuse. This is intended for MultiConnector decode-side setups where an upstream connector owns the external hit/load path, while PegaFlow records the resulting KV cache. In save_only mode, PegaFlow does not query or load KV blocks.

vllm serve Qwen/Qwen3-0.6B \
  --kv-transfer-config '{
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "connectors": [
        {
          "kv_connector": "<external-read-connector>",
          "kv_role": "kv_both"
        },
        {
          "kv_connector": "PegaKVConnector",
          "kv_role": "kv_both",
          "kv_connector_module_path": "pegaflow.connector",
          "kv_connector_extra_config": {
            "pegaflow.mode": "save_only"
          }
        }
      ]
    }
  }'

Valid values are read_write and save_only.
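Hand-writing nested JSON inside shell quotes is easy to get wrong, so one option is to generate the --kv-transfer-config argument from Python. The following is a sketch that uses only the keys shown in the example above. The function names and the early mode check are illustrative assumptions, not part of PegaFlow or vLLM.

```python
import json
import shlex

VALID_MODES = ("read_write", "save_only")  # the two values listed above


def pegaflow_connector_config(mode: str = "read_write") -> dict:
    """Build the PegaKVConnector entry, rejecting unknown modes early."""
    if mode not in VALID_MODES:
        raise ValueError(f"pegaflow.mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "kv_connector": "PegaKVConnector",
        "kv_role": "kv_both",
        "kv_connector_module_path": "pegaflow.connector",
        "kv_connector_extra_config": {"pegaflow.mode": mode},
    }


def multi_connector_config(read_connector: str, mode: str = "save_only") -> dict:
    """Wrap an upstream read connector and PegaFlow in a MultiConnector."""
    return {
        "kv_connector": "MultiConnector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {
            "connectors": [
                {"kv_connector": read_connector, "kv_role": "kv_both"},
                pegaflow_connector_config(mode),
            ]
        },
    }


# The placeholder name is kept as-is; substitute the connector that owns reads.
config = multi_connector_config("<external-read-connector>")

# shlex.quote makes the JSON safe to paste after --kv-transfer-config.
print("vllm serve Qwen/Qwen3-0.6B --kv-transfer-config", shlex.quote(json.dumps(config)))
```

Validating the mode before launch reports a typo such as "saveonly" immediately, without waiting for the server to start.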

Development

See the examples directory for more usage examples.

Testing

Running Unit Tests

The test suite includes integration tests that verify the EngineRpcClient can correctly communicate with a running pegaflow-server instance.

Prerequisites

  1. Build the Rust extension:

    cd python
    maturin develop --release
    
  2. Build the server binary:

    cd ..
    cargo build --release --bin pegaflow-server
    
  3. Ensure CUDA is available (tests require GPU):

    python -c "import torch; assert torch.cuda.is_available()"
    
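The CUDA check in step 3 can also be wrapped in a small helper so that test code can skip cleanly on machines without a GPU. This is a sketch under the assumption that PyTorch is the only way the tests detect CUDA. cuda_available is a hypothetical helper name, not something the test suite is known to provide.

```python
def cuda_available() -> bool:
    """Return True only if PyTorch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so the GPU tests cannot run here.
        return False
    return torch.cuda.is_available()


print("CUDA available:", cuda_available())
```

In a test module, the result could feed pytest.mark.skipif(not cuda_available(), reason="requires a CUDA GPU") so GPU-only tests are reported as skipped, not failed.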

Running Tests

cd python

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_engine_client.py -v

# Run with coverage
pytest tests/ --cov=pegaflow --cov-report=html

Test Structure

  • tests/conftest.py: Contains pytest fixtures for:

    • pega_server: Automatically starts/stops pegaflow-server for integration tests
    • engine_client: Creates an EngineRpcClient connected to the test server
    • client_context: Provides a ClientContext representing a vLLM instance with GPU KV cache tensors
    • registered_instance: Provides a registered instance ID for query tests
  • tests/test_engine_client.py: Integration tests for:

    • Server connectivity
    • Query operations with various inputs
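A fixture such as pega_server has to start a process before the tests and stop it afterwards even when a test fails. The sketch below shows that pattern with a plain context manager. It is not the code in tests/conftest.py: running_process is a hypothetical name, and a sleeping Python process stands in for the pegaflow-server binary, whose path and flags are not shown here.

```python
import contextlib
import subprocess
import sys
import time


@contextlib.contextmanager
def running_process(cmd, startup_delay=0.5, shutdown_timeout=10.0):
    """Start `cmd`, yield the process, and always stop it on exit."""
    proc = subprocess.Popen(cmd)
    try:
        # Crude readiness wait; a real fixture would poll the server's port.
        time.sleep(startup_delay)
        if proc.poll() is not None:
            raise RuntimeError(f"process exited early with code {proc.returncode}")
        yield proc
    finally:
        if proc.poll() is None:
            proc.terminate()  # ask the process to stop
            try:
                proc.wait(timeout=shutdown_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # force-stop if it ignores the request
                proc.wait()


# Stand-in command so the sketch runs anywhere.
demo_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
with running_process(demo_cmd) as proc:
    print("running:", proc.poll() is None)
print("stopped:", proc.poll() is not None)
```

Inside a pytest fixture, the same with block would wrap a yield, so the teardown in finally runs after the last test that uses the fixture.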

Test Fixtures

The ClientContext class abstracts a vLLM instance and provides:

  • register_kv_caches(): Register GPU KV cache tensors with the server
  • query(block_hashes): Query available blocks
  • unregister_context(): Unregister context from server

Example test usage:

def test_query(client_context):
    """Test query operation."""
    result = client_context.query([])
    assert result is not None

License

MIT
