PegaFlow Python Package

High-performance key-value storage engine with Python bindings, built with Rust and PyO3.

Features

  • PegaEngine: Fast Rust-based key-value storage with Python bindings
  • PegaKVConnector: vLLM KV connector for distributed inference with KV cache transfer

Installation

From Source

# Install maturin if you haven't already
pip install maturin

# Build and install in development mode
cd python
maturin develop

# Or build a wheel
maturin build --release

From PyPI

# The distribution is published as pegaflow-llm; the import name remains pegaflow
pip install pegaflow-llm

Usage

Basic KV Storage

from pegaflow import PegaEngine

# Create a new engine
engine = PegaEngine()

# Store key-value pairs
engine.put("name", "PegaFlow")
engine.put("version", "0.1.0")

# Retrieve values
name = engine.get("name")  # Returns "PegaFlow"
missing = engine.get("nonexistent")  # Returns None

# Remove keys
removed = engine.remove("name")  # Returns "PegaFlow"

vLLM KV Connector

from vllm import LLM
# KVTransferConfig is exposed by vllm.config
from vllm.config import KVTransferConfig

# Configure vLLM to use PegaKVConnector
kv_transfer_config = KVTransferConfig(
    kv_connector="PegaKVConnector",
    kv_role="kv_both",
    kv_connector_module_path="pegaflow.connector",
)

# Create LLM with KV transfer enabled
llm = LLM(
    model="gpt2",
    kv_transfer_config=kv_transfer_config,
)

Connector Modes

PegaKVConnector defaults to read_write: it queries PegaFlow for reusable KV blocks, loads matched blocks into vLLM, and saves newly computed full blocks back to PegaFlow.

Set pegaflow.mode to save_only when another vLLM connector is responsible for reads and PegaFlow should only persist KV blocks for later reuse. This is intended for MultiConnector decode-side setups where an upstream connector owns the external hit/load path, while PegaFlow records the resulting KV cache. In save_only mode, PegaFlow does not query or load KV blocks.

vllm serve Qwen/Qwen3-0.6B \
  --kv-transfer-config '{
    "kv_connector": "MultiConnector",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
      "connectors": [
        {
          "kv_connector": "<external-read-connector>",
          "kv_role": "kv_both"
        },
        {
          "kv_connector": "PegaKVConnector",
          "kv_role": "kv_both",
          "kv_connector_module_path": "pegaflow.connector",
          "kv_connector_extra_config": {
            "pegaflow.mode": "save_only"
          }
        }
      ]
    }
  }'

Valid values for pegaflow.mode are read_write (the default) and save_only.
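The same MultiConnector configuration can be generated programmatically, which helps avoid quoting mistakes in the shell. The following is a minimal sketch using only the standard library; the helper name build_kv_transfer_config and the VALID_MODES tuple are hypothetical and not part of PegaFlow or vLLM. The keys and values mirror the JSON shown above.

```python
import json
import shlex

# The two modes documented for pegaflow.mode.
VALID_MODES = ("read_write", "save_only")


def build_kv_transfer_config(read_connector: str, mode: str = "save_only") -> dict:
    """Build a MultiConnector config with PegaKVConnector as the second connector.

    Hypothetical helper: it only assembles the dictionary shown in the README
    and checks that the mode is one of the documented values.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"pegaflow.mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "kv_connector": "MultiConnector",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {
            "connectors": [
                # Upstream connector that owns the external hit/load path.
                {"kv_connector": read_connector, "kv_role": "kv_both"},
                # PegaFlow connector, configured with the requested mode.
                {
                    "kv_connector": "PegaKVConnector",
                    "kv_role": "kv_both",
                    "kv_connector_module_path": "pegaflow.connector",
                    "kv_connector_extra_config": {"pegaflow.mode": mode},
                },
            ]
        },
    }


config = build_kv_transfer_config("<external-read-connector>")
# shlex.quote makes the JSON safe to paste into a POSIX shell.
print("vllm serve Qwen/Qwen3-0.6B --kv-transfer-config " + shlex.quote(json.dumps(config)))
```

Replace the <external-read-connector> placeholder with the name of the connector that handles reads in your deployment.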

Development

See the examples directory for more usage examples.

Testing

Running Unit Tests

The test suite includes integration tests that verify the EngineRpcClient can correctly communicate with a running pegaflow-server instance.

Prerequisites

  1. Build the Rust extension:

    cd python
    maturin develop --release
    
  2. Build the server binary:

    cd ..
    cargo build --release --bin pegaflow-server
    
  3. Ensure CUDA is available (tests require GPU):

    python -c "import torch; assert torch.cuda.is_available()"
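
The CUDA check in step 3 can also be expressed as a helper that reports False instead of failing when torch is not installed. This is a sketch only; the function name cuda_available is hypothetical and is not part of the PegaFlow test suite.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if torch is importable and reports a usable CUDA device."""
    # find_spec avoids an ImportError when torch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return bool(torch.cuda.is_available())


print("CUDA available:", cuda_available())
```

A test module could pass the result to pytest.mark.skipif(not cuda_available(), reason="requires GPU") so that GPU-only tests are skipped rather than reported as errors on machines without CUDA.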
    

Running Tests

cd python

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_engine_client.py -v

# Run with coverage (requires the pytest-cov plugin)
pytest tests/ --cov=pegaflow --cov-report=html

Test Structure

  • tests/conftest.py: Contains pytest fixtures for:

    • pega_server: Automatically starts/stops pegaflow-server for integration tests
    • engine_client: Creates an EngineRpcClient connected to the test server
    • client_context: Provides a ClientContext representing a vLLM instance with GPU KV cache tensors
    • registered_instance: Provides a registered instance ID for query tests
  • tests/test_engine_client.py: Integration tests for:

    • Server connectivity
    • Query operations with various inputs
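
A fixture such as pega_server typically wraps a start/stop lifecycle around a subprocess. The sketch below illustrates that general pattern with the standard library only; it is not the actual conftest.py implementation. The name managed_server, the fixed startup delay, and the stand-in command are all assumptions made for illustration.

```python
import contextlib
import subprocess
import sys
import time


@contextlib.contextmanager
def managed_server(cmd, startup_delay=0.2, stop_timeout=5.0):
    """Start a process, yield it, and make sure it is stopped afterwards.

    Hypothetical helper: a real fixture would likely poll the server's port
    for readiness instead of sleeping for a fixed delay.
    """
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(startup_delay)
        if proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        yield proc
    finally:
        if proc.poll() is None:
            proc.terminate()  # ask the process to shut down
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # force shutdown if it does not exit in time
                proc.wait()


if __name__ == "__main__":
    # Stand-in command; a real run would launch the pegaflow-server binary.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    with managed_server(stand_in) as proc:
        print("running:", proc.poll() is None)
    print("stopped:", proc.poll() is not None)
```

In a conftest.py, a session-scoped pytest fixture could enter this context manager and yield the process, so that the server is started once and torn down when the test session ends.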

Test Fixtures

The ClientContext class abstracts a vLLM instance and provides:

  • register_kv_caches(): Register GPU KV cache tensors with the server
  • query(block_hashes): Query available blocks
  • unregister_context(): Unregister context from server

Example test usage:

def test_query(client_context):
    """Test query operation."""
    result = client_context.query([])
    assert result is not None

License

MIT

Built Distributions

Release 0.22.4 provides no source distribution. Prebuilt wheels (9.2 MB each) target manylinux with glibc 2.34+ on x86-64:

  • pegaflow_llm-0.22.4-cp314-cp314-manylinux_2_34_x86_64.whl (CPython 3.14)
  • pegaflow_llm-0.22.4-cp313-cp313-manylinux_2_34_x86_64.whl (CPython 3.13)
  • pegaflow_llm-0.22.4-cp312-cp312-manylinux_2_34_x86_64.whl (CPython 3.12)
  • pegaflow_llm-0.22.4-cp311-cp311-manylinux_2_34_x86_64.whl (CPython 3.11)
  • pegaflow_llm-0.22.4-cp310-cp310-manylinux_2_34_x86_64.whl (CPython 3.10)
