
gRPC frontend server exposing vLLM's V1 engine over protobuf/gRPC

vllm-grpc-frontend

The gRPC frontend server for vllm-grpc. It wraps vLLM's V1 AsyncLLM engine and serves the project's protobuf ChatService, CompletionsService, and HealthService over gRPC.

Affiliation: vllm-grpc is an independent, community project and is not affiliated with, endorsed by, or sponsored by the vLLM project. "vLLM" is used here only to identify the inference engine this frontend works with.

Install

pip install vllm-grpc-frontend

The base install does not depend on vLLM, so it succeeds on any platform, including those with no vLLM wheel. vLLM is needed only to run the engine.

vLLM prerequisite (V1 engine)

The frontend drives vLLM's V1 AsyncLLM API, so it requires vllm>=0.20. Install it via the opt-in engine extra:

pip install "vllm-grpc-frontend[engine]"     # pulls vllm>=0.20

Alternatively, install vLLM yourself, which is useful on platforms where the stock wheel does not build (for example macOS, which uses vllm-metal). If vLLM is missing at runtime, the server raises an ImportError at startup; the package itself still installs without it.
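To catch a missing or outdated vLLM before the server starts, a pre-flight check along the following lines may help. This is a minimal sketch, not part of the package: the helper names are hypothetical, it looks up a distribution named exactly "vllm" (a self-provided build published under another name would be reported as missing), and it compares only the major and minor numbers of a plain "N.M..." version string, so pre-releases are not ordered as PEP 440 would order them.

```python
from importlib import metadata

MIN_VLLM = (0, 20)  # from the stated requirement vllm>=0.20


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '0.20.1'."""
    parts = version.split(".")
    major = int(parts[0])
    minor_text = parts[1] if len(parts) > 1 else "0"
    # Keep only the leading digits of the minor part, so '20rc1' parses as 20.
    digits = ""
    for ch in minor_text:
        if not ch.isdigit():
            break
        digits += ch
    return major, int(digits or 0)


def vllm_status(min_version: tuple[int, int] = MIN_VLLM) -> str:
    """Return 'missing', 'too old', or 'ok' for the installed vLLM."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "missing"
    return "ok" if parse_major_minor(installed) >= min_version else "too old"


if __name__ == "__main__":
    print(vllm_status())
```

A result of "missing" corresponds to the case where the server would raise ImportError at startup.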

Run

The package installs a vllm-grpc-frontend console script:

vllm-grpc-frontend        # serves gRPC on 0.0.0.0:50051 by default
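Once the script is running, a quick way to confirm that something is listening on the default port is a plain TCP probe. The sketch below uses only the standard library and assumes the client runs on the same machine (hence 127.0.0.1). It checks that the port accepts connections, not that the listener speaks gRPC or that HealthService reports healthy.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 50051, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("listening" if port_is_open() else "not reachable")
```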

Configure with environment variables:

  • MODEL_NAME — model to load (default Qwen/Qwen3-0.6B)
  • FRONTEND_HOST / FRONTEND_PORT — bind address (default 0.0.0.0:50051)
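The way these variables and their defaults combine can be sketched as follows. This is an illustration of the documented behaviour, not the package's actual code: `FrontendConfig` and `load_config` are hypothetical names, and only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FrontendConfig:  # hypothetical name, for illustration only
    model_name: str
    host: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> FrontendConfig:
    """Resolve settings from env, falling back to the documented defaults."""
    return FrontendConfig(
        model_name=env.get("MODEL_NAME", "Qwen/Qwen3-0.6B"),
        host=env.get("FRONTEND_HOST", "0.0.0.0"),
        port=int(env.get("FRONTEND_PORT", "50051")),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dict, for example `load_config({"FRONTEND_PORT": "6000"})`.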

License

MIT

Download files

Source Distribution

vllm_grpc_frontend-0.1.0.tar.gz (16.9 kB)

Built Distribution

vllm_grpc_frontend-0.1.0-py3-none-any.whl (13.2 kB)

File details

Details for the file vllm_grpc_frontend-0.1.0.tar.gz.

File metadata

  • Download URL: vllm_grpc_frontend-0.1.0.tar.gz
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for vllm_grpc_frontend-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        3f2c86c58f68f51fc277fffa12d1eb49ecd6fd482d3b04f29dfadf3ad2eed7c1
MD5           d35796607baba4f83121f09271d97069
BLAKE2b-256   246932a936d3178b15fabfaf2642bb3a5fa17acf28f772ca39381a379bc70b3d
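A downloaded file can be checked against the published SHA256 digest before installing it. The sketch below is one way to do that with the standard library. The digest is the one listed above for the source distribution; the local path is an assumption about where the file was saved, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 listed above for vllm_grpc_frontend-0.1.0.tar.gz
EXPECTED_SHA256 = "3f2c86c58f68f51fc277fffa12d1eb49ecd6fd482d3b04f29dfadf3ad2eed7c1"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("vllm_grpc_frontend-0.1.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("digest matches" if matches(sdist) else "digest MISMATCH")
    else:
        print(f"{sdist} not found")
```

The wheel can be checked the same way by passing its own SHA256 digest as `expected`.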

File details

Details for the file vllm_grpc_frontend-0.1.0-py3-none-any.whl.

File hashes

Hashes for vllm_grpc_frontend-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        ce3afc41f91c803fbce2863f445ce1ea380f010210ea318cbd90cb4e22cfc39e
MD5           92674ee2d8f0a6e66ed2a6d85a3c0e35
BLAKE2b-256   980353d7ff802d3df3cb0e44daed6b3f5f2159258e23e52de43e4bf048ee2dac
