Lightweight Prefill-Decode proxy for disaggregated LLM serving

MicroPDProxy

MicroPDProxyServer – a lightweight PD (Prefill-Decode) proxy implementation.

This project provides dummy prefill and decode nodes for local development and debugging of a PD-separated proxy without any GPU or model dependencies.

The dummy nodes expose the minimum compatibility surface required by the validated proxy implementation under core/, including:

/v1/models
/v1/completions
/v1/chat/completions
/health
/ping
/metrics (Prometheus format)

Architecture

MicroPDProxy implements a Prefill-Decode (PD) separated serving architecture. Each incoming request passes through two phases in order:

1. Prefill: the request is sent to a prefill node with max_tokens=1 and stream=False, so that the node prepares the KV cache.
2. Decode: the request is then forwarded to a decode node for autoregressive token generation.
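
As a rough illustration of the two phases, the sketch below derives a prefill request and a decode request from one incoming payload. The function name `split_request` is hypothetical and not part of the project. The sketch covers only the request parameters named above (max_tokens=1, stream=False), not how the KV cache reaches the decode node.

```python
import copy


def split_request(payload: dict) -> tuple[dict, dict]:
    """Return (prefill_request, decode_request) for one incoming request.

    Hypothetical helper: the prefill copy is capped at a single
    non-streamed token, the decode copy keeps the caller's settings.
    """
    prefill = copy.deepcopy(payload)
    prefill["max_tokens"] = 1   # prefill only needs to build the KV cache
    prefill["stream"] = False   # no streaming during prefill
    decode = copy.deepcopy(payload)  # decode keeps the original parameters
    return prefill, decode


if __name__ == "__main__":
    incoming = {"model": "m", "prompt": "Hello", "max_tokens": 64, "stream": True}
    prefill_req, decode_req = split_request(incoming)
    print(prefill_req)
    print(decode_req)
```

The incoming payload is copied rather than mutated, so the decode phase still sees the caller's original max_tokens and stream values.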

The proxy handles scheduling (Round Robin or Load Balanced), health monitoring, and dynamic instance management. See docs/architecture.md for the full architecture overview.
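
The two scheduling policies can be pictured with the minimal sketch below. It is an illustration only: the class names are hypothetical, and "load balanced" is assumed here to mean "pick the instance with the fewest in-flight requests", which may differ from the metric the proxy actually uses.

```python
import itertools


class RoundRobinScheduler:
    """Hand out instances in a fixed rotating order."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self) -> str:
        return next(self._cycle)


class LoadBalancedScheduler:
    """Pick the instance with the fewest in-flight requests (assumed metric)."""

    def __init__(self, instances):
        self._in_flight = {inst: 0 for inst in instances}

    def acquire(self) -> str:
        # min() returns the first instance among ties (dict insertion order).
        inst = min(self._in_flight, key=self._in_flight.get)
        self._in_flight[inst] += 1
        return inst

    def release(self, inst: str) -> None:
        # Call when the request to `inst` has finished.
        self._in_flight[inst] -= 1


if __name__ == "__main__":
    rr = RoundRobinScheduler(["10.0.0.3:8200", "10.0.0.4:8200"])
    print([rr.pick() for _ in range(3)])

    lb = LoadBalancedScheduler(["10.0.0.3:8200", "10.0.0.4:8200"])
    first = lb.acquire()
    second = lb.acquire()
    lb.release(second)
    print(first, second, lb.acquire())
```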

Quick Start

# Install as a CLI tool
pip install .

# Or install in dev mode
pip install -e .

# Start with a YAML config
xpyd --config examples/proxy.yaml

# Or use the traditional way
pip install -r requirements.txt
python core/MicroPDProxyServer.py --config examples/proxy.yaml

Installation

# Install the xpyd CLI
pip install .

# Verify
xpyd --version
xpyd --help

# Validate a config without starting the server
xpyd --validate-config examples/proxy.yaml

Usage

Option 1: YAML Configuration (recommended)

Create a YAML config file (see examples/proxy.yaml):

model: /path/to/model
port: 8868

prefill:
  nodes:
    - "10.0.0.1:8100"
    - "10.0.0.2:8100"
  tp_size: 8
  dp_size: 2
  world_size_per_node: 8

decode:
  nodes:
    - "10.0.0.3:8200"
    - "10.0.0.4:8200"
  tp_size: 1
  dp_size: 16
  world_size_per_node: 8

scheduling: loadbalanced

Start the proxy:

xpyd --config proxy.yaml
# or
python core/MicroPDProxyServer.py --config proxy.yaml

The proxy looks for its config in the following order and uses the first one it finds:

1. --config / -c CLI argument
2. XPYD_CONFIG environment variable
3. ./xpyd.yaml in the current directory
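
The lookup order above can be sketched as a small function. `resolve_config_path` is a hypothetical name used for illustration; only the three sources (CLI argument, XPYD_CONFIG, ./xpyd.yaml) come from this README, and returning None when nothing is found is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_path: Optional[str],
    env: Mapping[str, str] = os.environ,
    cwd: Path = Path("."),
) -> Optional[Path]:
    """Return the first config location that applies, or None."""
    if cli_path:                       # 1. --config / -c
        return Path(cli_path)
    env_path = env.get("XPYD_CONFIG")  # 2. environment variable
    if env_path:
        return Path(env_path)
    default = Path(cwd) / "xpyd.yaml"  # 3. file in the current directory
    return default if default.is_file() else None


if __name__ == "__main__":
    print(resolve_config_path("examples/proxy.yaml"))
    print(resolve_config_path(None, {"XPYD_CONFIG": "/etc/xpyd.yaml"}))
```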

Startup Node Discovery

The proxy starts listening immediately but returns 503 on business endpoints (/v1/completions, /v1/chat/completions) until at least one prefill node and one decode node report healthy. Health, status, and metrics endpoints are always available.

Configure in YAML:

startup:
  wait_timeout_seconds: 600   # exit if nodes not ready after 10 min
  probe_interval_seconds: 10  # probe /health every 10s
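
The startup behaviour can be approximated by the sketch below. All function names are hypothetical, and the `probe` callable stands in for whatever the proxy does to query /health on each node. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

# (prefill node health flags, decode node health flags)
HealthSnapshot = Tuple[Sequence[bool], Sequence[bool]]


def cluster_ready(prefill_healthy: Sequence[bool], decode_healthy: Sequence[bool]) -> bool:
    """True once at least one prefill and one decode node are healthy."""
    return any(prefill_healthy) and any(decode_healthy)


def business_endpoint_status(ready: bool) -> Optional[int]:
    """Return 503 while the cluster is not ready, else None (handle normally)."""
    return None if ready else 503


def wait_until_ready(
    probe: Callable[[], HealthSnapshot],
    wait_timeout_seconds: float = 600.0,
    probe_interval_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe until ready; return False if the timeout elapses first."""
    deadline = clock() + wait_timeout_seconds
    while True:
        prefill, decode = probe()
        if cluster_ready(prefill, decode):
            return True
        if clock() >= deadline:
            return False  # the caller would exit here
        sleep(probe_interval_seconds)


if __name__ == "__main__":
    print(business_endpoint_status(cluster_ready([False, False], [True])))
    print(wait_until_ready(lambda: ([True], [True])))
```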

The topology parameters from the Option 1 config expand into instance addresses automatically. In that example, each node hosts world_size_per_node / tp_size instances:

Prefill: 2 nodes × (8 / 8) = 1 instance/node = 2 instances
Decode: 2 nodes × (8 / 1) = 8 instances/node = 16 instances
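
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and rejecting a tp_size that does not evenly divide world_size_per_node is an assumption of this sketch rather than documented proxy behaviour. How ports are assigned to the expanded instances is not shown here.

```python
def instances_per_node(world_size_per_node: int, tp_size: int) -> int:
    """Number of instances one node hosts: world_size_per_node / tp_size."""
    if tp_size <= 0 or world_size_per_node % tp_size != 0:
        # Assumption: uneven splits are treated as a config error.
        raise ValueError("tp_size must evenly divide world_size_per_node")
    return world_size_per_node // tp_size


def total_instances(num_nodes: int, tp_size: int, world_size_per_node: int) -> int:
    """Total instances across all nodes of one role (prefill or decode)."""
    return num_nodes * instances_per_node(world_size_per_node, tp_size)


if __name__ == "__main__":
    # Values from the example config: 2 nodes per role, 8 devices per node.
    print("prefill:", total_instances(num_nodes=2, tp_size=8, world_size_per_node=8))
    print("decode:", total_instances(num_nodes=2, tp_size=1, world_size_per_node=8))
```

For the example config, the totals (2 and 16) coincide with the dp_size values given for prefill and decode.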

A simple flat-list format is also supported (see examples/proxy-simple.yaml):

model: /path/to/model
prefill:
  - "10.0.0.1:8100"
decode:
  - "10.0.0.2:8200"
  - "10.0.0.3:8200"
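
Since `prefill` and `decode` may be either a flat list or a topology mapping with a `nodes` key, a loader has to accept both shapes. The sketch below shows one way to normalize them after the YAML has been parsed into Python objects. `node_addresses` is a hypothetical helper, not the project's API.

```python
from typing import Any, List


def node_addresses(section: Any) -> List[str]:
    """Return the host:port list from either config shape.

    - flat list:  ["10.0.0.1:8100", ...]
    - topology:   {"nodes": [...], "tp_size": ..., ...}
    """
    if isinstance(section, dict):
        return list(section.get("nodes", []))
    return list(section or [])  # a missing section yields an empty list


if __name__ == "__main__":
    flat = ["10.0.0.2:8200", "10.0.0.3:8200"]
    topology = {"nodes": ["10.0.0.3:8200", "10.0.0.4:8200"], "tp_size": 1}
    print(node_addresses(flat))
    print(node_addresses(topology))
```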

Option 2: CLI Arguments

python core/MicroPDProxyServer.py \
  --model /path/to/model \
  --prefill 10.0.0.1:8100 10.0.0.2:8100 \
  --decode 10.0.0.3:8200 10.0.0.4:8200 \
  --port 8868 \
  --roundrobin

Option 3: Parameterized Shell Script

For topology-driven deployments with TP/DP parameters:

bash core/xpyd_start_proxy.sh \
  --model /path/to/model \
  --prefill-nodes 2 --prefill-tp-size 8 --prefill-dp-size 2 --prefill-world-size-per-node 8 \
  --decode-nodes 2 --decode-tp-size 1 --decode-dp-size 16 --decode-world-size-per-node 8 \
  --prefill-base-port 8100 --decode-base-port 8200

CLI Arguments Reference

Argument	Short	Default	Description
`--config`	`-c`	—	Path to YAML configuration file
`--model`	`-m`	—	Model name / path (required unless in YAML)
`--prefill`	`-p`	—	Prefill node URLs (host:port)
`--decode`	`-d`	—	Decode node URLs (host:port)
`--port`	—	8000	Proxy listen port
`--roundrobin`	—	false	Use round-robin scheduling
`--generator_on_p_node`	—	false	Generate first token on prefill node

When both --config and CLI arguments are provided, CLI arguments take precedence.
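
The precedence rule can be sketched as a dictionary merge. This assumes that CLI options the user did not pass are represented as None (for example via argparse defaults of None). `merge_settings` is a hypothetical name.

```python
def merge_settings(yaml_cfg: dict, cli_args: dict) -> dict:
    """Overlay explicitly passed CLI values on top of the YAML config."""
    merged = dict(yaml_cfg)
    for key, value in cli_args.items():
        if value is not None:  # None means "not given on the command line"
            merged[key] = value
    return merged


if __name__ == "__main__":
    yaml_cfg = {"model": "/path/to/model", "port": 8868}
    cli_args = {"model": None, "port": 9000}
    print(merge_settings(yaml_cfg, cli_args))
```

Under this assumption, table defaults such as port 8000 would be applied after the merge, so that a YAML value is not shadowed by a CLI default.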

YAML Config Fields

Field	Type	Default	Description
`model`	string	—	Model name / path (required)
`port`	int	8000	Proxy listen port
`log_level`	string	warning	Log level: debug, info, warning, error
`prefill`	list or topology	[]	Prefill node config
`decode`	list or topology	—	Decode node config (required)
`scheduling`	string	loadbalanced	Scheduling policy: roundrobin, loadbalanced
`generator_on_p_node`	bool	false	Generate first token on prefill node
`admin_api_key`	string	—	Admin API key (env `ADMIN_API_KEY` overrides)
`openai_api_key`	string	—	OpenAI API key (env `OPENAI_API_KEY` overrides)
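
To make the table concrete, the sketch below applies the listed defaults, the two environment overrides, and the required-field checks to an already parsed config dictionary. It is a sketch under assumptions: `finalize_config` and the error messages are hypothetical, and the real validator may check more than this.

```python
import copy
import os
from typing import Mapping

DEFAULTS = {
    "port": 8000,
    "log_level": "warning",
    "prefill": [],
    "scheduling": "loadbalanced",
    "generator_on_p_node": False,
}
ENV_OVERRIDES = {
    "admin_api_key": "ADMIN_API_KEY",
    "openai_api_key": "OPENAI_API_KEY",
}
REQUIRED = ("model", "decode")
SCHEDULING_POLICIES = ("roundrobin", "loadbalanced")


def finalize_config(raw: dict, env: Mapping[str, str] = os.environ) -> dict:
    """Fill defaults, apply env overrides, and check required fields."""
    cfg = copy.deepcopy(DEFAULTS)
    cfg.update(raw)
    for field, var in ENV_OVERRIDES.items():
        if env.get(var):  # the environment wins over the YAML value
            cfg[field] = env[var]
    missing = [name for name in REQUIRED if not cfg.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    if cfg["scheduling"] not in SCHEDULING_POLICIES:
        raise ValueError(f"unknown scheduling policy: {cfg['scheduling']}")
    return cfg


if __name__ == "__main__":
    raw = {"model": "/path/to/model", "decode": ["10.0.0.2:8200"]}
    print(finalize_config(raw, env={}))
```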

Docker Deployment

# Build and run the full local topology (2 prefill + 2 decode + proxy)
docker compose up --build

# Or run just the proxy against existing GPU nodes
docker build -t microxpyd .
docker run -p 8868:8868 microxpyd \
  python3 core/MicroPDProxyServer.py \
  --model tokenizers/DeepSeek-R1 \
  --prefill 10.0.0.1:8100 --decode 10.0.0.3:8200 \
  --port 8868

See docs/deployment.md for production deployment details.

Benchmark

Use vLLM's benchmark tool to test proxy throughput:

python -m vllm bench serve \
  --base-url http://localhost:8868 \
  --model DeepSeek-R1 \
  --dataset-name sonnet \
  --sonnet-input-len 1024 \
  --sonnet-output-len 128 \
  --num-prompts 100 \
  --request-rate 10

Configuration

Environment Variable	Default	Description
`PREFILL_DELAY_PER_TOKEN`	`0.001`	Simulated per-prompt-token prefill latency (seconds)
`DECODE_DELAY_PER_TOKEN`	`0.01`	Simulated per-decode-token generation latency (seconds)
`ADMIN_API_KEY`	—	API key for admin endpoints (overrides YAML)
`OPENAI_API_KEY`	—	Bearer token for backend nodes (overrides YAML)
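
As a worked example of the two delay variables, the sketch below estimates the simulated latency of one request. It assumes the dummy nodes scale the delay linearly with token count, which is what "per-token" suggests but is not confirmed here. `simulated_latency` is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def simulated_latency(
    prompt_tokens: int,
    output_tokens: int,
    env: Mapping[str, str] = os.environ,
) -> Tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) under a linear model."""
    prefill_delay = float(env.get("PREFILL_DELAY_PER_TOKEN", "0.001"))
    decode_delay = float(env.get("DECODE_DELAY_PER_TOKEN", "0.01"))
    return prompt_tokens * prefill_delay, output_tokens * decode_delay


if __name__ == "__main__":
    # Same shape as the benchmark above: 1024 input tokens, 128 output tokens.
    prefill_s, decode_s = simulated_latency(1024, 128, env={})
    print(f"prefill ~{prefill_s:.3f}s, decode ~{decode_s:.3f}s")
```

With the default values, a 1024-token prompt and 128 output tokens give roughly 1.02 s of simulated prefill and 1.28 s of simulated decode.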

Running Tests

pip install -r requirements.txt

# Run the full test suite
PYTHONPATH=core:tests python -m pytest tests/ -v

# Run specific test groups
PYTHONPATH=core:tests python -m pytest tests/test_prefill_node.py tests/test_decode_node.py -v  # Node tests
PYTHONPATH=core:tests python -m pytest tests/test_proxy_matrix.py -v                            # Topology matrix
PYTHONPATH=core:tests python -m pytest tests/test_yaml_integration.py -v                        # YAML config integration
PYTHONPATH=core:tests python -m pytest tests/test_config.py tests/test_yaml_config.py -v        # Config validation
PYTHONPATH=core:tests python -m pytest tests/test_topology.py -v                                # Topology expansion
PYTHONPATH=core:tests python -m pytest tests/test_scheduler.py -v                               # Scheduler unit tests
PYTHONPATH=core:tests python -m pytest tests/test_metrics.py -v                                 # Prometheus metrics

Documentation

Document	Description
Architecture	System architecture overview
API Reference	HTTP API endpoints
Configuration	YAML config file reference
CLI	xpyd command-line tool (planned)
Scheduling	Load balancing strategies
Resilience	Health checks, circuit breakers, retry (planned)
Metrics	Prometheus metrics endpoint
Deployment	Deployment and Docker guide
Quick Start	Terminal-by-terminal setup
One-Click Setup	Quick dummy environment
Proxy Script	xpyd_start_proxy.sh usage
Contributing	Contribution guidelines

Project details

Release history

1.4.0 (Apr 6, 2026)
1.3.0 (Apr 6, 2026)
1.2.0 (Apr 5, 2026)
1.1.0 (Apr 3, 2026)
1.0.0 (Apr 1, 2026, this version)
0.0.1 (Mar 31, 2026)

Download files

Source Distribution

xpyd-1.0.0.tar.gz (38.3 kB)

Uploaded Apr 1, 2026 Source

Built Distribution

xpyd-1.0.0-py3-none-any.whl (45.4 kB)

Uploaded Apr 1, 2026 Python 3

File details

Details for the file xpyd-1.0.0.tar.gz.

File metadata

Download URL: xpyd-1.0.0.tar.gz
Upload date: Apr 1, 2026
Size: 38.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for xpyd-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`d8baf52c5d06e31f167eed1cfad8561f18348f5ec73aca168e2a8ff52c7501c2`
MD5	`65fb0bdcd7d040c95454de8c2b8c28fc`
BLAKE2b-256	`714de7b39b36fb06c2d96fec5bbe5284f482e3f55cf72afa07eb74270a0a2e0b`

Provenance

The following attestation bundles were made for xpyd-1.0.0.tar.gz:

Publisher: release.yml on xPyD-hub/xPyD-proxy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: xpyd-1.0.0.tar.gz
- Subject digest: d8baf52c5d06e31f167eed1cfad8561f18348f5ec73aca168e2a8ff52c7501c2
- Sigstore transparency entry: 1206601845
- Sigstore integration time: Apr 1, 2026
Source repository:
- Permalink: xPyD-hub/xPyD-proxy@a21ead63ea9a942865debce93e4a253987685406
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/xPyD-hub
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a21ead63ea9a942865debce93e4a253987685406
- Trigger Event: push

File details

Details for the file xpyd-1.0.0-py3-none-any.whl.

File metadata

Download URL: xpyd-1.0.0-py3-none-any.whl
Upload date: Apr 1, 2026
Size: 45.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for xpyd-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`433aef3d811cfdd5ede700430b92c729191a4d52bd22a3038c6fee4dd30a5c50`
MD5	`58e16de011fe8fd8303704df1b6b26fb`
BLAKE2b-256	`e0fc1b374b64c4498404f9c2b35889fd9adedac60d1bbc1ba0e4ec1319499188`

Provenance

The following attestation bundles were made for xpyd-1.0.0-py3-none-any.whl:

Publisher: release.yml on xPyD-hub/xPyD-proxy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: xpyd-1.0.0-py3-none-any.whl
- Subject digest: 433aef3d811cfdd5ede700430b92c729191a4d52bd22a3038c6fee4dd30a5c50
- Sigstore transparency entry: 1206601849
- Sigstore integration time: Apr 1, 2026
Source repository:
- Permalink: xPyD-hub/xPyD-proxy@a21ead63ea9a942865debce93e4a253987685406
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/xPyD-hub
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a21ead63ea9a942865debce93e4a253987685406
- Trigger Event: push
