6 projects
vllm-swift
vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon.
longctx-svc
Local retrieval companion for inference servers — scoped, session-aware, file-watching.
longctx
Open long-context inference stack: retrieval + open weights, no closed parts.
tqkit
Unified toolkit for benchmarking and integrating TurboQuant+ KV-cache compression across inference engines (llama.cpp, vLLM, MLX).
refract-llm
REFRACT — Reference-anchored Robust Acid-test for Compressed Transformers. Multi-axis KV-cache fidelity scoring for LLMs across llama.cpp, MLX, vLLM, and SGLang.
usbinfo
Module for introspecting USB devices on a system.