Lightweight Prefill-Decode proxy for disaggregated LLM serving

MicroPDProxy

MicroPDProxyServer – a lightweight PD (Prefill-Decode) proxy implementation.

This project provides dummy prefill and decode nodes for local development and debugging of a PD-separated proxy without any GPU or model dependencies.

The dummy nodes expose the minimum compatibility surface required by the validated proxy implementation under core/, including:

  • /v1/models
  • /v1/completions
  • /v1/chat/completions
  • /health
  • /ping
  • /metrics (Prometheus format)

Architecture

MicroPDProxy implements a Prefill-Decode (PD) separated serving architecture. Incoming requests are routed through two phases:

  1. Prefill — sent to a prefill node for KV cache preparation (max_tokens=1, stream=False)
  2. Decode — forwarded to a decode node for autoregressive token generation
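The two phases above can be sketched as a pure transformation on an OpenAI-style request body. This is a minimal illustration, not the code under core/. The function names `build_prefill_request` and `build_decode_request` are hypothetical, and forwarding the client's original parameters unchanged to the decode node is an assumption.

```python
import copy


def build_prefill_request(request: dict) -> dict:
    """Return a copy of the request tuned for the prefill phase.

    Per the description above, prefill runs with max_tokens=1 and
    stream=False so the node only prepares the KV cache.
    """
    prefill = copy.deepcopy(request)
    prefill["max_tokens"] = 1
    prefill["stream"] = False
    return prefill


def build_decode_request(request: dict) -> dict:
    """Return the request for the decode phase.

    Assumption: the decode node receives the client's original
    parameters unchanged; the real proxy may add fields.
    """
    return copy.deepcopy(request)


if __name__ == "__main__":
    original = {"model": "/path/to/model", "prompt": "Hello",
                "max_tokens": 64, "stream": True}
    print(build_prefill_request(original))
    print(build_decode_request(original))
```

Copying the request rather than mutating it keeps the client's original `max_tokens` and `stream` values available for the decode phase.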

The proxy handles scheduling (Round Robin or Load Balanced), health monitoring, and dynamic instance management. See docs/architecture.md for the full architecture overview.
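As a rough sketch of how the two scheduling policies differ, the classes below pick an instance per request. The class names are hypothetical, and treating "load" as the number of in-flight requests is an assumption; the proxy's actual load signal is described in docs/architecture.md, not here.

```python
import itertools


class RoundRobinScheduler:
    """Cycle through instances in a fixed order."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self) -> str:
        return next(self._cycle)


class LeastLoadedScheduler:
    """Pick the instance with the fewest in-flight requests.

    Assumption: "load" means in-flight request count; the real
    proxy may use a different signal.
    """

    def __init__(self, instances):
        self._in_flight = {inst: 0 for inst in instances}

    def pick(self) -> str:
        # min() returns the first minimal key, so ties go to the
        # earliest-listed instance.
        inst = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[inst] += 1
        return inst

    def release(self, inst: str) -> None:
        # Call when a request finishes so the instance's load drops.
        self._in_flight[inst] -= 1
```

Round robin needs no feedback from the nodes, while a load-aware policy has to be told when a request completes, which is why the second class has a `release` method.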

Quick Start

# Install as a CLI tool
pip install .

# Or install in dev mode
pip install -e .

# Start with a YAML config
xpyd --config examples/proxy.yaml

# Or use the traditional way
pip install -r requirements.txt
python core/MicroPDProxyServer.py --config examples/proxy.yaml

Installation

# Install the xpyd CLI
pip install .

# Verify
xpyd --version
xpyd --help

# Validate a config without starting the server
xpyd --validate-config examples/proxy.yaml

Usage

Option 1: YAML Configuration (recommended)

Create a YAML config file (see examples/proxy.yaml):

model: /path/to/model
port: 8868

prefill:
  nodes:
    - "10.0.0.1:8100"
    - "10.0.0.2:8100"
  tp_size: 8
  dp_size: 2
  world_size_per_node: 8

decode:
  nodes:
    - "10.0.0.3:8200"
    - "10.0.0.4:8200"
  tp_size: 1
  dp_size: 16
  world_size_per_node: 8

scheduling: loadbalanced

Start the proxy:

xpyd --config proxy.yaml
# or
python core/MicroPDProxyServer.py --config proxy.yaml

The proxy looks for its config file in this order:

  1. --config / -c CLI argument
  2. XPYD_CONFIG environment variable
  3. ./xpyd.yaml in the current directory
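The lookup order above could be expressed as follows. This is a sketch under assumptions, not the proxy's own code: the function name `resolve_config_path` is hypothetical, and returning `None` when nothing is found is a choice made for the example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_config: Optional[str],
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first config path found, or None if there is none."""
    # 1. --config / -c CLI argument
    if cli_config:
        return Path(cli_config)
    # 2. XPYD_CONFIG environment variable
    env_value = env.get("XPYD_CONFIG")
    if env_value:
        return Path(env_value)
    # 3. ./xpyd.yaml in the current directory
    candidate = (cwd or Path.cwd()) / "xpyd.yaml"
    return candidate if candidate.is_file() else None
```

Passing `env` and `cwd` as parameters keeps the function easy to test without touching the real environment.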

Startup Node Discovery

The proxy starts listening immediately but returns 503 on the business endpoints (/v1/completions, /v1/chat/completions) until at least one prefill node and one decode node report healthy. The health, status, and metrics endpoints are always available.

Configure in YAML:

startup:
  wait_timeout_seconds: 600   # exit if nodes not ready after 10 min
  probe_interval_seconds: 10  # probe /health every 10s
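The startup behaviour configured above amounts to a polling loop with a deadline. The sketch below is illustrative only: `wait_for_nodes` and the `is_healthy` callback are hypothetical names, and in a real deployment `is_healthy` would issue a GET to the node's /health endpoint.

```python
import time
from typing import Callable, Iterable


def wait_for_nodes(
    prefill: Iterable[str],
    decode: Iterable[str],
    is_healthy: Callable[[str], bool],
    wait_timeout_seconds: float = 600,
    probe_interval_seconds: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once >=1 prefill and >=1 decode node are healthy.

    Returns False if the timeout elapses first; the caller would
    then exit, as the wait_timeout_seconds comment describes.
    """
    prefill, decode = list(prefill), list(decode)
    deadline = clock() + wait_timeout_seconds
    while True:
        prefill_ok = any(is_healthy(n) for n in prefill)
        decode_ok = any(is_healthy(n) for n in decode)
        if prefill_ok and decode_ok:
            return True
        if clock() >= deadline:
            return False
        sleep(probe_interval_seconds)
```

The `sleep` and `clock` parameters exist so the loop can be exercised with a fake clock instead of waiting in real time.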

In the topology format shown under Option 1, the parameters expand into instance addresses automatically. Each node hosts world_size_per_node / tp_size instances:

  • Prefill: 2 nodes × (8 / 8) = 1 instance/node = 2 instances
  • Decode: 2 nodes × (8 / 1) = 8 instances/node = 16 instances
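The arithmetic in the two bullets can be checked with a few lines of Python. The function names are hypothetical, and the sketch only covers the case where tp_size divides world_size_per_node evenly; how the proxy assigns a port to each instance is not covered here.

```python
def instances_per_node(world_size_per_node: int, tp_size: int) -> int:
    """Number of instances one node hosts: world_size_per_node / tp_size."""
    if tp_size <= 0 or world_size_per_node % tp_size != 0:
        # Assumption: uneven splits are rejected rather than rounded.
        raise ValueError("tp_size must be positive and divide world_size_per_node")
    return world_size_per_node // tp_size


def total_instances(num_nodes: int, world_size_per_node: int, tp_size: int) -> int:
    """Total instances across all nodes of one role (prefill or decode)."""
    return num_nodes * instances_per_node(world_size_per_node, tp_size)


if __name__ == "__main__":
    # Values from the Option 1 example config.
    print("prefill:", total_instances(2, world_size_per_node=8, tp_size=8))
    print("decode:", total_instances(2, world_size_per_node=8, tp_size=1))
```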

A simple flat-list format is also supported (see examples/proxy-simple.yaml):

model: /path/to/model
prefill:
  - "10.0.0.1:8100"
decode:
  - "10.0.0.2:8200"
  - "10.0.0.3:8200"
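Because `prefill` and `decode` accept either a flat list or a topology mapping, code that reads the parsed YAML has to handle both shapes. A minimal sketch, assuming the YAML has already been parsed into Python objects; `node_list` is a hypothetical helper, not part of the xpyd API.

```python
from typing import Any, Dict, List, Union


def node_list(section: Union[List[str], Dict[str, Any], None]) -> List[str]:
    """Return the node addresses from either config shape."""
    if section is None:
        return []
    if isinstance(section, list):
        # Flat-list format: the section is the node list itself.
        return [str(n) for n in section]
    if isinstance(section, dict):
        # Topology format: addresses live under the "nodes" key.
        return [str(n) for n in section.get("nodes", [])]
    raise TypeError(f"unsupported node section: {type(section).__name__}")
```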

Option 2: CLI Arguments

python core/MicroPDProxyServer.py \
  --model /path/to/model \
  --prefill 10.0.0.1:8100 10.0.0.2:8100 \
  --decode 10.0.0.3:8200 10.0.0.4:8200 \
  --port 8868 \
  --roundrobin

Option 3: Parameterized Shell Script

For topology-driven deployments with TP/DP parameters:

bash core/xpyd_start_proxy.sh \
  --model /path/to/model \
  --prefill-nodes 2 --prefill-tp-size 8 --prefill-dp-size 2 --prefill-world-size-per-node 8 \
  --decode-nodes 2 --decode-tp-size 1 --decode-dp-size 16 --decode-world-size-per-node 8 \
  --prefill-base-port 8100 --decode-base-port 8200

CLI Arguments Reference

| Argument | Short | Default | Description |
|---|---|---|---|
| --config | -c | | Path to YAML configuration file |
| --model | -m | | Model name / path (required unless in YAML) |
| --prefill | -p | | Prefill node URLs (host:port) |
| --decode | -d | | Decode node URLs (host:port) |
| --port | | 8000 | Proxy listen port |
| --roundrobin | | false | Use round-robin scheduling |
| --generator_on_p_node | | false | Generate first token on prefill node |

When both --config and CLI arguments are provided, CLI arguments take precedence.
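One way to picture this precedence rule is a dictionary overlay. The sketch is hypothetical (`merge_settings` is not an xpyd function) and assumes that an argument the user did not pass shows up as `None`.

```python
from typing import Any, Dict


def merge_settings(yaml_cfg: Dict[str, Any], cli_args: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay CLI values on top of YAML values.

    Assumption: a CLI value of None means "flag not given", so the
    YAML value is kept in that case.
    """
    merged = dict(yaml_cfg)
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    yaml_cfg = {"model": "/path/to/model", "port": 8868}
    cli_args = {"model": None, "port": 9000}
    print(merge_settings(yaml_cfg, cli_args))
```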

YAML Config Fields

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | | Model name / path (required) |
| port | int | 8000 | Proxy listen port |
| log_level | string | warning | Log level: debug, info, warning, error |
| prefill | list or topology | [] | Prefill node config |
| decode | list or topology | | Decode node config (required) |
| scheduling | string | loadbalanced | Scheduling policy: roundrobin, loadbalanced |
| generator_on_p_node | bool | false | Generate first token on prefill node |
| admin_api_key | string | | Admin API key (env ADMIN_API_KEY overrides) |
| openai_api_key | string | | OpenAI API key (env OPENAI_API_KEY overrides) |
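The defaults and environment overrides in the table can be modelled with a small dataclass. This is a sketch, not the proxy's config loader: `ProxySettings` and `load_settings` are hypothetical names, the `prefill` and `decode` sections are left out for brevity, and the input is assumed to be an already-parsed YAML mapping.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ProxySettings:
    model: str
    port: int = 8000
    log_level: str = "warning"
    scheduling: str = "loadbalanced"
    generator_on_p_node: bool = False
    admin_api_key: Optional[str] = None
    openai_api_key: Optional[str] = None


def load_settings(raw: Mapping[str, Any],
                  env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Build settings from a parsed YAML mapping plus the environment."""
    if not raw.get("model"):
        raise ValueError("'model' is required")
    if raw.get("scheduling", "loadbalanced") not in ("roundrobin", "loadbalanced"):
        raise ValueError("scheduling must be 'roundrobin' or 'loadbalanced'")
    # Keep only the keys this sketch models; others (e.g. decode) are ignored.
    known = {k: raw[k] for k in ProxySettings.__dataclass_fields__ if k in raw}
    settings = ProxySettings(**known)
    # Environment variables override the YAML values for both keys.
    settings.admin_api_key = env.get("ADMIN_API_KEY", settings.admin_api_key)
    settings.openai_api_key = env.get("OPENAI_API_KEY", settings.openai_api_key)
    return settings
```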

Docker Deployment

# Build and run the full local topology (2 prefill + 2 decode + proxy)
docker compose up --build

# Or run just the proxy against existing GPU nodes
docker build -t microxpyd .
docker run -p 8868:8868 microxpyd \
  python3 core/MicroPDProxyServer.py \
  --model tokenizers/DeepSeek-R1 \
  --prefill 10.0.0.1:8100 --decode 10.0.0.3:8200 \
  --port 8868

See docs/deployment.md for production deployment details.

Benchmark

Use vLLM's benchmark tool to test proxy throughput:

python -m vllm bench serve \
  --base-url http://localhost:8868 \
  --model DeepSeek-R1 \
  --dataset-name sonnet \
  --sonnet-input-len 1024 \
  --sonnet-output-len 128 \
  --num-prompts 100 \
  --request-rate 10

Configuration

| Environment Variable | Default | Description |
|---|---|---|
| PREFILL_DELAY_PER_TOKEN | 0.001 | Simulated per-prompt-token prefill latency (seconds) |
| DECODE_DELAY_PER_TOKEN | 0.01 | Simulated per-decode-token generation latency (seconds) |
| ADMIN_API_KEY | | API key for admin endpoints (overrides YAML) |
| OPENAI_API_KEY | | Bearer token for backend nodes (overrides YAML) |
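To get a feel for the two delay variables, the sketch below estimates the simulated latency of one request. It assumes the delay scales linearly with token count, which the per-token wording suggests but the table does not spell out; `simulated_delays` is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def simulated_delays(
    prompt_tokens: int,
    output_tokens: int,
    env: Mapping[str, str] = os.environ,
) -> Tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) for one simulated request.

    Assumption: delay = token count * per-token delay.
    """
    prefill_rate = float(env.get("PREFILL_DELAY_PER_TOKEN", "0.001"))
    decode_rate = float(env.get("DECODE_DELAY_PER_TOKEN", "0.01"))
    return prompt_tokens * prefill_rate, output_tokens * decode_rate


if __name__ == "__main__":
    # Same input/output lengths as the benchmark command above.
    prefill_s, decode_s = simulated_delays(1024, 128, env={})
    print(f"prefill ~ {prefill_s:.3f}s, decode ~ {decode_s:.3f}s")
```

Under that assumption and the default values, a 1024-token prompt with 128 output tokens spends roughly 1.0 s in prefill and 1.3 s in decode.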

Running Tests

pip install -r requirements.txt

# Run the full test suite
PYTHONPATH=core:tests python -m pytest tests/ -v

# Run specific test groups
PYTHONPATH=core:tests python -m pytest tests/test_prefill_node.py tests/test_decode_node.py -v  # Node tests
PYTHONPATH=core:tests python -m pytest tests/test_proxy_matrix.py -v                            # Topology matrix
PYTHONPATH=core:tests python -m pytest tests/test_yaml_integration.py -v                        # YAML config integration
PYTHONPATH=core:tests python -m pytest tests/test_config.py tests/test_yaml_config.py -v        # Config validation
PYTHONPATH=core:tests python -m pytest tests/test_topology.py -v                                # Topology expansion
PYTHONPATH=core:tests python -m pytest tests/test_scheduler.py -v                               # Scheduler unit tests
PYTHONPATH=core:tests python -m pytest tests/test_metrics.py -v                                 # Prometheus metrics

Documentation

| Document | Description |
|---|---|
| Architecture | System architecture overview |
| API Reference | HTTP API endpoints |
| Configuration | YAML config file reference |
| CLI | xpyd command-line tool (planned) |
| Scheduling | Load balancing strategies |
| Resilience | Health checks, circuit breakers, retry (planned) |
| Metrics | Prometheus metrics endpoint |
| Deployment | Deployment and Docker guide |
| Quick Start | Terminal-by-terminal setup |
| One-Click Setup | Quick dummy environment |
| Proxy Script | xpyd_start_proxy.sh usage |
| Contributing | Contribution guidelines |
