# vLLM Client

## Overview

Client for the vLLM API with minimal dependencies.
## Examples
See `example.py` for the following:

- Single generation
- Streaming
- Batch inference

It should work out of the box with a vLLM API server running a Llama-2 model (any parameter count).
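As a rough sketch of what a single-generation request looks like at the HTTP level: the vLLM demo API server accepts a JSON body containing the prompt plus sampling parameters. The exact field names below (`prompt`, `stream`, `max_tokens`, `temperature`) are assumptions based on vLLM's demo server, not a description of this client's own interface; the snippet only builds the payload and does not contact a server.

```python
import json

def build_generate_payload(prompt, stream=False, **sampling_params):
    """Build the JSON body for a generation request.

    Field names here are assumptions modeled on vLLM's demo API
    server; check this client's code for the real request shape.
    """
    body = {"prompt": prompt, "stream": stream}
    # Sampling parameters (e.g. max_tokens, temperature, top_p) are
    # passed through as top-level JSON fields.
    body.update(sampling_params)
    return json.dumps(body)

payload = build_generate_payload(
    "Hello, my name is", max_tokens=16, temperature=0.8
)
print(payload)
```

For streaming, the same payload would be sent with `stream=True` and the response consumed incrementally; for batch inference, one such request is issued per prompt.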
## Notes
`sampling_params.py` is a copy of the file of the same name from the vLLM repository and needs to be kept in sync with it.
## Download files
### Source Distribution

- vllm_client-0.1.7.tar.gz (9.9 kB)
### Built Distributions
Hashes for vllm_client-0.1.7-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fdda6d75d8f95b0dd703458b9169fba04815ff6e4819a5244fb1c0a5d3df8cc |
| MD5 | 9638222708b66e3e9d52067eaffb85c5 |
| BLAKE2b-256 | 3999b2b151229f2f64c26df4833a03caafa4a8705dbb64ade3244473292c0a2f |
Hashes for vllm_client-0.1.7-py2.py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 90644ae1f1fe595cd7c763d00cfffb2857f75c7b490bfacdcce5a1bf463b7454 |
| MD5 | c6fbc29d77be050761116dee04cef970 |
| BLAKE2b-256 | 2938d64ce9b6493789b18e0c5401d19f0d095f5a4c4794ceee295c8a771a6427 |
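To check a manually downloaded artifact against the digests listed above, you can compute its SHA256 locally with the standard library. This is a generic sketch, not part of this package; the commented filename is the source distribution from this page.

```python
import hashlib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA256 hex digest of a file, reading in chunks
    so large distributions need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Example usage: compare against the table above before installing.
# sha256_of_file("vllm_client-0.1.7-py3-none-any.whl")
```

Alternatively, `pip install --require-hashes` with a pinned requirements file performs the same check automatically at install time.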