Reusable LLM clients and speculative decode workflows for FlexInfer services
flexinfer-llm-kit
Reusable LLM client helpers and speculative decoding workflows for FlexInfer Python services.
Features
- OpenAI-compatible client configuration (LiteLLM / vLLM / OpenAI)
- LangChain `ChatOpenAI` factories for common workloads
- Speculative decode workflow (draft → verify → revise) via LangGraph
Installation
From PyPI:
pip install flexinfer-llm-kit
From GitLab:
pip install git+https://gitlab.flexinfer.ai/libs/py-llm-kit.git
Usage
Speculative decode
# spec_decode is awaited, so run this inside an async function (or a REPL with top-level await).
from llm_kit.spec_decode import spec_decode
result = await spec_decode("Write 3 storyboard panel prompts in JSON.")
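The draft → verify → revise workflow listed under Features can be pictured as a small loop. What follows is a minimal sketch in plain Python, not the package's LangGraph implementation: `run_draft_verify_revise`, the `Verdict` type, and the three stub callables are hypothetical names introduced for illustration, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of the verify step (hypothetical type, for illustration only)."""
    ok: bool
    feedback: str = ""


def run_draft_verify_revise(
    prompt: str,
    draft: Callable[[str], str],
    verify: Callable[[str, str], Verdict],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 2,
) -> str:
    """Draft an answer, then alternate verify/revise until accepted or out of rounds."""
    answer = draft(prompt)
    for _ in range(max_rounds):
        verdict = verify(prompt, answer)
        if verdict.ok:
            break
        # Feed the verifier's feedback into the next revision.
        answer = revise(prompt, answer, verdict.feedback)
    return answer


# Stub steps standing in for model calls (assumptions, not the library's API).
def fake_draft(prompt: str) -> str:
    return "panel one"


def fake_verify(prompt: str, answer: str) -> Verdict:
    # Accept only answers that look like a JSON array.
    if answer.startswith("["):
        return Verdict(ok=True)
    return Verdict(ok=False, feedback="wrap the panels in a JSON array")


def fake_revise(prompt: str, answer: str, feedback: str) -> str:
    return f'["{answer}"]'


final = run_draft_verify_revise(
    "Write 3 storyboard panel prompts in JSON.", fake_draft, fake_verify, fake_revise
)
print(final)
```

Passing the three steps in as callables keeps the control flow testable without a model endpoint; in a real service each step would presumably wrap a model call, and the round limit bounds cost when the verifier keeps rejecting.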
Model factories
from llm_kit.clients import get_textgen_model
model = get_textgen_model()
response = model.invoke([{"role": "user", "content": "Hello"}])
print(response.content)
Configuration
| Env Var | Description | Default |
|---|---|---|
| `LLM_BASE_URL` | OpenAI-compatible base URL (e.g. http://litellm.../v1) | http://litellm.ai.svc:8000/v1 |
| `LLM_API_KEY` | API key for the endpoint | sk-local |
| `LLM_TEXTGEN_MODEL` | LiteLLM model id/alias | textgen |
| `LLM_AGENT_MODEL` | LiteLLM model id/alias | agent |
| `LLM_VISION_MODEL` | LiteLLM model id/alias | vision |
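As a rough illustration of how the variables in the table might be consumed, here is a standard-library sketch that reads them with the documented defaults. `LLMSettings` and `load_settings` are hypothetical names, not part of the package's API, and the package's own loading logic may differ; `my-textgen-alias` is a made-up value.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the documented environment variables."""
    base_url: str
    api_key: str
    textgen_model: str
    agent_model: str
    vision_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Read the documented variables, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    return LLMSettings(
        base_url=env.get("LLM_BASE_URL", "http://litellm.ai.svc:8000/v1"),
        api_key=env.get("LLM_API_KEY", "sk-local"),
        textgen_model=env.get("LLM_TEXTGEN_MODEL", "textgen"),
        agent_model=env.get("LLM_AGENT_MODEL", "agent"),
        vision_model=env.get("LLM_VISION_MODEL", "vision"),
    )


# Override one variable; everything else falls back to its default.
settings = load_settings({"LLM_TEXTGEN_MODEL": "my-textgen-alias"})
print(settings.base_url)
print(settings.textgen_model)
```

Accepting an explicit mapping instead of always reading `os.environ` makes the defaults easy to check in tests without mutating the process environment.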
License
MIT
Publishing (GitLab PyPI)
- Bump `version` in `pyproject.toml`.
- Tag and push:
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0
- In GitLab CI for `libs/py-llm-kit`, run the manual `publish` job for that tag pipeline.
Publishing (Public PyPI)
- Set GitLab CI variables: `PYPI_API_TOKEN` and `PUBLISH_PUBLIC_PYPI=true`.
- Run the manual `publish:pypi` job for the tag pipeline.