Lightweight Prefill-Decode proxy for disaggregated LLM serving
MicroPDProxy
MicroPDProxyServer – a lightweight PD (Prefill-Decode) proxy implementation.
This project provides dummy prefill and decode nodes for local development and debugging of a PD-separated proxy without any GPU or model dependencies.
The dummy nodes expose the minimum compatibility surface required by the
validated proxy implementation under core/, including:
- /v1/models
- /v1/completions
- /v1/chat/completions
- /health
- /ping
- /metrics (Prometheus format)
Architecture
MicroPDProxy implements a Prefill-Decode (PD) separated serving architecture. Incoming requests are routed through two phases:
- Prefill: sent to a prefill node for KV cache preparation (max_tokens=1, stream=False)
- Decode: forwarded to a decode node for autoregressive token generation
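The two phases can be sketched as a pair of request payloads. This is a minimal illustration, not the proxy's actual code: the function names `build_prefill_payload` and `build_decode_payload` are hypothetical, and forwarding the client request unchanged to the decode node is an assumption. The only behaviour taken from the description above is that the prefill request carries `max_tokens=1` and `stream=False`.

```python
import copy


def build_prefill_payload(request: dict) -> dict:
    """Copy the client request and force the prefill-phase settings."""
    payload = copy.deepcopy(request)
    payload["max_tokens"] = 1    # prefill only needs to prepare the KV cache
    payload["stream"] = False    # no streaming during prefill
    return payload


def build_decode_payload(request: dict) -> dict:
    """Copy the client request unchanged (assumed) for the decode phase."""
    return copy.deepcopy(request)


client_request = {
    "model": "/path/to/model",
    "prompt": "Hello",
    "max_tokens": 64,
    "stream": True,
}
prefill_payload = build_prefill_payload(client_request)
decode_payload = build_decode_payload(client_request)

print(prefill_payload["max_tokens"], prefill_payload["stream"])  # 1 False
print(decode_payload["max_tokens"], decode_payload["stream"])    # 64 True
```

Working on copies keeps the original client request intact, so the decode phase still sees the caller's own `max_tokens` and `stream` values.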
The proxy handles scheduling (Round Robin or Load Balanced), health monitoring,
and dynamic instance management. See docs/architecture.md
for the full architecture overview.
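To make the two scheduling policies concrete, here is a small sketch of each. The class names are hypothetical, and reading "Load Balanced" as "pick the instance with the fewest in-flight requests" is an assumption; the metric the proxy actually uses is described in the scheduling documentation, not here.

```python
import itertools


class RoundRobinScheduler:
    """Hand out instances in a fixed rotating order."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)


class LeastInFlightScheduler:
    """Pick the instance with the fewest requests in flight (assumed policy)."""

    def __init__(self, instances):
        self._in_flight = {instance: 0 for instance in instances}

    def pick(self):
        # Ties resolve to the earliest-registered instance.
        instance = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[instance] += 1
        return instance

    def release(self, instance):
        # Call when the request on this instance has finished.
        self._in_flight[instance] -= 1


decode_nodes = ["10.0.0.3:8200", "10.0.0.4:8200"]

rr = RoundRobinScheduler(decode_nodes)
rr_picks = [rr.pick() for _ in range(3)]

lb = LeastInFlightScheduler(decode_nodes)
first = lb.pick()    # both idle, so the first node is chosen
second = lb.pick()   # the first node is busy, so the second is chosen
lb.release(first)    # the first node finishes its request
third = lb.pick()    # the first node is the least loaded again

print(rr_picks)
print(first, second, third)
```

Round robin ignores how long each request takes, while the least-in-flight variant adapts when some requests run much longer than others.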
Quick Start
# Install as a CLI tool
pip install .
# Or install in dev mode
pip install -e .
# Start with a YAML config
xpyd --config examples/proxy.yaml
# Or use the traditional way
pip install -r requirements.txt
python core/MicroPDProxyServer.py --config examples/proxy.yaml
Installation
# Install the xpyd CLI
pip install .
# Verify
xpyd --version
xpyd --help
# Validate a config without starting the server
xpyd --validate-config examples/proxy.yaml
Usage
Option 1: YAML Configuration (recommended)
Create a YAML config file (see examples/proxy.yaml):
model: /path/to/model
port: 8868
prefill:
  nodes:
    - "10.0.0.1:8100"
    - "10.0.0.2:8100"
  tp_size: 8
  dp_size: 2
  world_size_per_node: 8
decode:
  nodes:
    - "10.0.0.3:8200"
    - "10.0.0.4:8200"
  tp_size: 1
  dp_size: 16
  world_size_per_node: 8
scheduling: loadbalanced
Start the proxy:
xpyd --config proxy.yaml
# or
python core/MicroPDProxyServer.py --config proxy.yaml
The proxy searches for a config file in this order:
1. --config / -c CLI argument
2. XPYD_CONFIG environment variable
3. ./xpyd.yaml in the current directory
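The lookup order can be expressed as a short function. This is a sketch of the documented precedence only; `resolve_config_path` is a hypothetical name, and returning `None` when nothing is found is an assumption about the fallback behaviour.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_value: Optional[str],
    env: Mapping[str, str],
    cwd: Path,
) -> Optional[Path]:
    """Return the config path using the documented search order."""
    # 1. An explicit --config / -c argument wins.
    if cli_value:
        return Path(cli_value)
    # 2. Otherwise fall back to the XPYD_CONFIG environment variable.
    env_value = env.get("XPYD_CONFIG")
    if env_value:
        return Path(env_value)
    # 3. Finally look for xpyd.yaml in the current directory.
    default = cwd / "xpyd.yaml"
    if default.is_file():
        return default
    return None  # assumed: the caller relies on CLI arguments alone


from_cli = resolve_config_path("a.yaml", {"XPYD_CONFIG": "b.yaml"}, Path("."))
from_env = resolve_config_path(None, {"XPYD_CONFIG": "b.yaml"}, Path("."))
print(from_cli, from_env)
```

Passing the environment and working directory as parameters, rather than reading `os.environ` and `Path.cwd()` directly, keeps the function easy to test.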
Startup Node Discovery
The proxy starts listening immediately but returns 503 on business
endpoints (/v1/completions, /v1/chat/completions) until at least
1 prefill + 1 decode node respond healthy. Health/status/metrics
endpoints are always available.
Configure in YAML:
startup:
  wait_timeout_seconds: 600   # exit if nodes not ready after 10 min
  probe_interval_seconds: 10  # probe /health every 10s
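The gating rule described above (503 on business endpoints until at least one prefill and one decode node are healthy) might look roughly like this. `ReadinessGate` and its methods are hypothetical names for illustration; the real proxy's internal structure is not shown in this README.

```python
BUSINESS_ENDPOINTS = {"/v1/completions", "/v1/chat/completions"}


class ReadinessGate:
    """Track healthy nodes and decide whether business traffic is allowed."""

    def __init__(self):
        self._healthy = {"prefill": set(), "decode": set()}

    def record_probe(self, role: str, address: str, healthy: bool) -> None:
        # Called after each /health probe of a node.
        if healthy:
            self._healthy[role].add(address)
        else:
            self._healthy[role].discard(address)

    def is_ready(self) -> bool:
        # Ready means at least 1 prefill and 1 decode node are healthy.
        return bool(self._healthy["prefill"]) and bool(self._healthy["decode"])

    def status_for(self, path: str) -> int:
        # Only business endpoints are gated; other paths always pass.
        if path in BUSINESS_ENDPOINTS and not self.is_ready():
            return 503
        return 200


gate = ReadinessGate()
print(gate.status_for("/v1/completions"))  # 503
print(gate.status_for("/health"))          # 200

gate.record_probe("prefill", "10.0.0.1:8100", True)
gate.record_probe("decode", "10.0.0.3:8200", True)
print(gate.status_for("/v1/completions"))  # 200
```

A probe loop would call `record_probe` every `probe_interval_seconds` and exit the process if `is_ready()` is still false once `wait_timeout_seconds` has elapsed.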
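The topology parameters in the prefill and decode sections of the YAML config expand into instance addresses automatically. Each node hosts world_size_per_node / tp_size instances, so for the example config in Option 1:
- Prefill: 2 nodes × (8 / 8) = 2 nodes × 1 instance/node = 2 instances
- Decode: 2 nodes × (8 / 1) = 2 nodes × 8 instances/node = 16 instances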
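The arithmetic above can be checked with a few lines of code. The instance count follows the formula in the list; the port numbering (consecutive ports starting from each node's listed port) is purely an assumption for illustration, and `expand_topology` is a hypothetical name rather than the proxy's own function.

```python
def instances_per_node(world_size_per_node: int, tp_size: int) -> int:
    """Number of instances one node hosts: world_size_per_node / tp_size."""
    if tp_size <= 0 or world_size_per_node % tp_size != 0:
        raise ValueError("world_size_per_node must be a multiple of tp_size")
    return world_size_per_node // tp_size


def expand_topology(nodes, tp_size: int, world_size_per_node: int):
    """Expand node addresses into per-instance addresses.

    Assumption: instance i on a node listens on (listed port + i).
    """
    per_node = instances_per_node(world_size_per_node, tp_size)
    addresses = []
    for node in nodes:
        host, port = node.rsplit(":", 1)
        for offset in range(per_node):
            addresses.append(f"{host}:{int(port) + offset}")
    return addresses


prefill = expand_topology(["10.0.0.1:8100", "10.0.0.2:8100"], tp_size=8, world_size_per_node=8)
decode = expand_topology(["10.0.0.3:8200", "10.0.0.4:8200"], tp_size=1, world_size_per_node=8)

print(len(prefill))  # 2
print(len(decode))   # 16
```

In both cases the total matches the dp_size from the example config (2 for prefill, 16 for decode), which is a useful sanity check when writing a topology by hand.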
A simple flat-list format is also supported (see examples/proxy-simple.yaml):
model: /path/to/model
prefill:
  - "10.0.0.1:8100"
decode:
  - "10.0.0.2:8200"
  - "10.0.0.3:8200"
Option 2: CLI Arguments
python core/MicroPDProxyServer.py \
  --model /path/to/model \
  --prefill 10.0.0.1:8100 10.0.0.2:8100 \
  --decode 10.0.0.3:8200 10.0.0.4:8200 \
  --port 8868 \
  --roundrobin
Option 3: Parameterized Shell Script
For topology-driven deployments with TP/DP parameters:
bash core/xpyd_start_proxy.sh \
  --model /path/to/model \
  --prefill-nodes 2 --prefill-tp-size 8 --prefill-dp-size 2 --prefill-world-size-per-node 8 \
  --decode-nodes 2 --decode-tp-size 1 --decode-dp-size 16 --decode-world-size-per-node 8 \
  --prefill-base-port 8100 --decode-base-port 8200
CLI Arguments Reference
| Argument | Short | Default | Description |
|---|---|---|---|
| --config | -c | - | Path to YAML configuration file |
| --model | -m | - | Model name / path (required unless in YAML) |
| --prefill | -p | - | Prefill node URLs (host:port) |
| --decode | -d | - | Decode node URLs (host:port) |
| --port | - | 8000 | Proxy listen port |
| --roundrobin | - | false | Use round-robin scheduling |
| --generator_on_p_node | - | false | Generate first token on prefill node |
When both --config and CLI arguments are provided, CLI arguments take precedence.
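A minimal sketch of that precedence rule follows. It assumes that CLI options the user did not pass are represented as `None`, which is a common argparse convention but not confirmed for this project; `merge_settings` is a hypothetical helper name.

```python
def merge_settings(yaml_config: dict, cli_args: dict) -> dict:
    """Overlay CLI arguments on YAML values; CLI wins when it is set."""
    merged = dict(yaml_config)
    for key, value in cli_args.items():
        if value is not None:  # assumed: None means "not given on the CLI"
            merged[key] = value
    return merged


yaml_config = {"model": "/path/to/model", "port": 8868, "scheduling": "loadbalanced"}
cli_args = {"model": None, "port": 9000}

settings = merge_settings(yaml_config, cli_args)
print(settings["port"])   # 9000
print(settings["model"])  # /path/to/model
```

Here the port comes from the command line, while the model and scheduling policy fall back to the YAML file.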
YAML Config Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model | string | - | Model name / path (required) |
| port | int | 8000 | Proxy listen port |
| log_level | string | warning | Log level: debug, info, warning, error |
| prefill | list or topology | [] | Prefill node config |
| decode | list or topology | - | Decode node config (required) |
| scheduling | string | loadbalanced | Scheduling policy: roundrobin, loadbalanced |
| generator_on_p_node | bool | false | Generate first token on prefill node |
| admin_api_key | string | - | Admin API key (env ADMIN_API_KEY overrides) |
| openai_api_key | string | - | OpenAI API key (env OPENAI_API_KEY overrides) |
Docker Deployment
# Build and run the full local topology (2 prefill + 2 decode + proxy)
docker compose up --build
# Or run just the proxy against existing GPU nodes
docker build -t microxpyd .
docker run -p 8868:8868 microxpyd \
  python3 core/MicroPDProxyServer.py \
    --model tokenizers/DeepSeek-R1 \
    --prefill 10.0.0.1:8100 --decode 10.0.0.3:8200 \
    --port 8868
See docs/deployment.md for production deployment details.
Benchmark
Use vLLM's benchmark tool to test proxy throughput:
python -m vllm bench serve \
  --base-url http://localhost:8868 \
  --model DeepSeek-R1 \
  --dataset-name sonnet \
  --sonnet-input-len 1024 \
  --sonnet-output-len 128 \
  --num-prompts 100 \
  --request-rate 10
Configuration
| Environment Variable | Default | Description |
|---|---|---|
| PREFILL_DELAY_PER_TOKEN | 0.001 | Simulated per-prompt-token prefill latency (seconds) |
| DECODE_DELAY_PER_TOKEN | 0.01 | Simulated per-decode-token generation latency (seconds) |
| ADMIN_API_KEY | - | API key for admin endpoints (overrides YAML) |
| OPENAI_API_KEY | - | Bearer token for backend nodes (overrides YAML) |
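To get a feel for the two delay settings, the sketch below estimates how long a dummy node would stall for a given request. It assumes the delay scales linearly with the token count, which is what "per-token" suggests but is not spelled out here; `simulated_delay` is a hypothetical helper, not part of the dummy nodes.

```python
import os
from typing import Mapping


def simulated_delay(
    variable: str,
    default: float,
    token_count: int,
    env: Mapping[str, str] = os.environ,
) -> float:
    """Return the assumed total delay in seconds: per-token delay x tokens."""
    per_token = float(env.get(variable, default))
    return per_token * token_count


# With the defaults from the table: 1024 prompt tokens, 128 generated tokens.
prefill_seconds = simulated_delay("PREFILL_DELAY_PER_TOKEN", 0.001, 1024, env={})
decode_seconds = simulated_delay("DECODE_DELAY_PER_TOKEN", 0.01, 128, env={})

print(f"{prefill_seconds:.3f}")  # 1.024
print(f"{decode_seconds:.3f}")   # 1.280
```

Setting either variable to 0 removes the artificial latency, which is handy when the tests should run as fast as possible.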
Running Tests
pip install -r requirements.txt
# Run the full test suite
PYTHONPATH=core:tests python -m pytest tests/ -v
# Run specific test groups
PYTHONPATH=core:tests python -m pytest tests/test_prefill_node.py tests/test_decode_node.py -v # Node tests
PYTHONPATH=core:tests python -m pytest tests/test_proxy_matrix.py -v # Topology matrix
PYTHONPATH=core:tests python -m pytest tests/test_yaml_integration.py -v # YAML config integration
PYTHONPATH=core:tests python -m pytest tests/test_config.py tests/test_yaml_config.py -v # Config validation
PYTHONPATH=core:tests python -m pytest tests/test_topology.py -v # Topology expansion
PYTHONPATH=core:tests python -m pytest tests/test_scheduler.py -v # Scheduler unit tests
PYTHONPATH=core:tests python -m pytest tests/test_metrics.py -v # Prometheus metrics
Documentation
| Document | Description |
|---|---|
| Architecture | System architecture overview |
| API Reference | HTTP API endpoints |
| Configuration | YAML config file reference |
| CLI | xpyd command-line tool (planned) |
| Scheduling | Load balancing strategies |
| Resilience | Health checks, circuit breakers, retry (planned) |
| Metrics | Prometheus metrics endpoint |
| Deployment | Deployment and Docker guide |
| Quick Start | Terminal-by-terminal setup |
| One-Click Setup | Quick dummy environment |
| Proxy Script | xpyd_start_proxy.sh usage |
| Contributing | Contribution guidelines |