# Pyxis

High-performance, vendor-agnostic LLM inference library.
## Status snapshot (2026-02-21)

- Sprints 1-5 are completed (see `docs/SPRINT_CHECKLIST.md`).
- Core worker now supports pluggable executor backends (`hf`, `echo`).
## Quick start (local, 3 processes)

1. Install deps (in a venv):

   ```
   pip install -e .
   ```

   Model downloads follow your local HuggingFace/Transformers cache settings.

2. Start the core worker:

   ```
   python scripts/run_core.py --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --backend hf
   ```

3. Start the tokenizer + detokenizer worker:

   ```
   python scripts/run_tokenizer.py --model TinyLlama/TinyLlama-1.1B-Chat-v1.0
   ```

4. Start the HTTP API:

   ```
   python scripts/run_api.py --host 127.0.0.1 --port 8000
   ```

5. Verify streaming:

   ```
   python scripts/verify_api_real.py
   ```
Installed CLI commands are also available: `pyxislm-core`, `pyxislm-tokenizer`, `pyxislm-api`.

List executor backends:

```
python -m pyxis.cli.core --list-backends
```
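Once the three processes are up, the API can also be exercised directly over HTTP. The sketch below builds an OpenAI-style streaming chat request using only the standard library; the endpoint path and `[DONE]` sentinel follow the `POST /v1/chat/completions` contract described under Architecture, but the exact payload field names are assumptions.

```python
import json
import urllib.request

# Default host/port from the quick start above.
API_URL = "http://127.0.0.1:8000/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0") -> dict:
    """Build an OpenAI-style streaming chat payload (field names assumed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }


def stream_completion(prompt: str) -> None:
    """POST the request and print streamed delta content as it arrives."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            if not line.startswith("data:"):
                continue  # skip blank keep-alive lines
            data = line[len("data:"):].strip()
            if data == "[DONE]":  # end-of-stream sentinel
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {})
            print(delta.get("content", ""), end="", flush=True)
```

Call `stream_completion("Hello!")` with the three services running; `scripts/verify_api_real.py` remains the supported check.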
## Useful environment variables

- `PYXIS_MODEL_PATH`: model name/path for `scripts/run_core.py` (defaults to `TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
- `PYXIS_MODEL_BACKEND`: executor backend for core worker (`hf` default, `echo` built-in)
- `PYXIS_TOKENIZER_PATH`: tokenizer name/path for `scripts/run_tokenizer.py` (defaults to `TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
- `PYXIS_TOKENIZER_INGRESS`: API → tokenizer IPC address override
- `PYXIS_DETOK_TO_API`: detok → API IPC address override
- `PYXIS_CORE_REQUEST_QUEUE_SIZE`: max queued generation requests inside core worker (default `1024`)
- `PYXIS_MAX_INFLIGHT_REQUESTS`: max concurrent API streaming requests before `429 overloaded` (default `128`)
- `PYXIS_PER_REQUEST_QUEUE_MAXSIZE`: per-request detok queue bound in API (default `128`)
- `PYXIS_STREAM_IDLE_TIMEOUT_S`: stream idle timeout before API emits an error chunk (default `30`)
- `PYXIS_TOKENIZER_READY_WAIT_S`: wait for tokenizer ingress readiness per enqueue (default `1.0`)
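These are plain environment overrides, so a launcher can resolve them with `os.environ.get` plus the documented default. The helper below is illustrative (not Pyxis's internal API), but the variable names and defaults match the list above.

```python
import os

# Documented defaults for the core worker (values from the list above).
DEFAULTS = {
    "PYXIS_MODEL_PATH": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "PYXIS_MODEL_BACKEND": "hf",
    "PYXIS_CORE_REQUEST_QUEUE_SIZE": "1024",
}


def resolve_setting(name: str) -> str:
    """Return the environment override if set, else the documented default."""
    return os.environ.get(name, DEFAULTS[name])


# Example: select the dependency-light echo backend for a smoke run.
os.environ["PYXIS_MODEL_BACKEND"] = "echo"
print(resolve_setting("PYXIS_MODEL_BACKEND"))        # echo
print(resolve_setting("PYXIS_CORE_REQUEST_QUEUE_SIZE"))  # 1024
```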
## Smoke / integration scripts

- `python scripts/verify_ingestion.py`: API-less ingestion test (tokenizer → core request)
- `python scripts/verify_api_ingress.py`: API streaming test with a mocked core
- `python scripts/verify_api_real.py`: API streaming test with real core + tokenizer
## Realtime usage

- Interactive chat REPL:

  ```
  python scripts/chat_repl.py
  ```

- End-to-end realtime harness (starts services, checks streaming/cancel/backpressure):

  ```
  powershell -ExecutionPolicy Bypass -File scripts/test_realtime.ps1 -SkipInstall
  ```
## Developer workflow

- Quick checks: `python scripts/dev.py test-quick`
- Full checks: `python scripts/dev.py test-all`
- Optional lint/type checks: `python scripts/dev.py lint`

Contributor docs: `CONTRIBUTING.md`, `docs/HACKING.md`

Benchmark harness:

```
python benchmarks/api_stream_bench.py --requests 20 --concurrency 4
```

See `benchmarks/README.md`.
## Architecture (high level)

```
HTTP API → TokenizerWorker → CoreWorker → TokenizerWorker (detok) → HTTP streaming response
```

- `POST /v1/chat/completions` streams SSE (`text/event-stream`) with OpenAI-style `chat.completion.chunk` payloads and a final `[DONE]`.
- `GET /health` includes readiness and API stage latency snapshots (`stage_latency_ms`).

See `docs/ARCHITECTURE.md` for details.
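For consumers parsing the stream themselves: each SSE event is a `data:` line carrying either a JSON `chat.completion.chunk` or the literal `[DONE]`. A minimal line parser (helper and sentinel names are illustrative; the chunk shape shown is the assumed OpenAI style):

```python
import json

DONE = object()  # sentinel returned for the final [DONE] event


def parse_sse_line(line: str):
    """Parse one SSE line: a chunk dict, DONE, or None for non-data lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # comments, blank keep-alives, etc.
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return DONE
    return json.loads(payload)


# Events as they appear on the wire:
chunk = parse_sse_line(
    'data: {"object": "chat.completion.chunk",'
    ' "choices": [{"delta": {"content": "Hi"}}]}'
)
print(chunk["choices"][0]["delta"]["content"])  # Hi
print(parse_sse_line("data: [DONE]") is DONE)   # True
```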
Session notes and recent implementation memory are tracked in `docs/MEMORY.md`.