A lightweight LLM inference framework
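light-llm-hp - Lightweight LLM Inference Framework
A simplified inference framework that runs on the CPU and can serve models over a REST API.
🚀 Apple Silicon optimization: an MLX backend that is 2-5x faster than PyTorch MPS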
Quick Start
```python
from hllm import HLLM

# Pick the best backend automatically (MLX is used automatically on Apple Silicon)
model = HLLM(model_path="microsoft/Phi-3-mini-4k-instruct")

# Generate text
result = model.generate("Write a short story about a robot.")
print(result)
```
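The comment above says HLLM selects the best backend on its own. The package's actual selection logic is not documented in this README, so the following is only a hypothetical sketch of one plausible rule, assuming MLX is preferred on Apple Silicon when installed, then PyTorch MPS, then the CPU. The helpers `choose_backend` and `detect` are illustrative names, not part of the hllm API.

```python
import importlib.util
import platform


def choose_backend(system: str, machine: str, has_mlx: bool, has_mps: bool) -> tuple[str, str]:
    """Return a hypothetical (backend, device) pair.

    Assumed priority (not taken from the hllm source):
    MLX on Apple Silicon > PyTorch MPS > PyTorch CPU.
    """
    apple_silicon = system == "Darwin" and machine == "arm64"
    if apple_silicon and has_mlx:
        return ("mlx", "mlx")
    if has_mps:
        return ("pytorch", "mps")
    return ("pytorch", "cpu")


def detect() -> tuple[str, str]:
    """Probe the current machine; torch is imported only if it is installed."""
    has_mlx = importlib.util.find_spec("mlx") is not None
    has_mps = False
    if importlib.util.find_spec("torch") is not None:
        import torch
        has_mps = torch.backends.mps.is_available()
    return choose_backend(platform.system(), platform.machine(), has_mlx, has_mps)
```

Calling `print(detect())` on an M-series Mac with MLX installed would print `('mlx', 'mlx')` under these assumptions. Keeping the decision in a pure function makes the priority order easy to test without the heavy dependencies.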
Apple Silicon Optimization (MLX)
On M1/M2/M3 Macs, the MLX backend gives the best performance:
```bash
# Install MLX support (quotes keep zsh, the macOS default shell, from expanding the brackets)
pip install "light-llm-hp[mlx]"
```
```python
from hllm import HLLM

# Use the MLX backend explicitly (recommended)
mlx_model = HLLM(model_path="mlx-community/Llama-3.2-1B-Instruct-4bit", backend="mlx")

# Or use PyTorch with MPS
mps_model = HLLM(model_path="microsoft/Phi-3-mini-4k-instruct", backend="pytorch", device="mps")

# Inspect backend info
print(mlx_model.get_info())
# {'name': 'mlx', 'device': 'mlx', ...}
```
Performance Comparison
Typical performance on an M1 MacBook Pro (Llama-3.2-1B, 100 tokens):
| Backend | Time to first token | Throughput | Memory usage |
|---|---|---|---|
| MLX | ~50 ms | ~45 tok/s | ~800 MB |
| PyTorch MPS | ~150 ms | ~15 tok/s | ~1200 MB |
| PyTorch CPU | ~500 ms | ~5 tok/s | ~1200 MB |
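As a rough way to read the table, assume that the end-to-end time for n tokens is approximately the time to first token plus n divided by throughput. This is a simplification; the table does not say exactly how throughput was measured. Plugging in the approximate figures:

```python
# Approximate figures copied from the table (M1 MacBook Pro, Llama-3.2-1B).
BACKENDS = {
    "MLX": (0.050, 45.0),          # (time to first token in s, tokens/s)
    "PyTorch MPS": (0.150, 15.0),
    "PyTorch CPU": (0.500, 5.0),
}


def estimate_latency(ttft_s: float, tok_per_s: float, n_tokens: int = 100) -> float:
    """Assumed latency model: TTFT plus decode time for n_tokens."""
    return ttft_s + n_tokens / tok_per_s


totals = {name: estimate_latency(*vals) for name, vals in BACKENDS.items()}
for name, total in totals.items():
    print(f"{name:12s} ~{total:.2f} s for 100 tokens")

speedup = totals["PyTorch MPS"] / totals["MLX"]
print(f"MLX vs MPS: ~{speedup:.1f}x")
```

Under this model, MLX finishes about 3x sooner than MPS (about 2.27 s vs 6.82 s), which falls inside the 2-5x range quoted at the top. The real ratio is likely to vary with prompt length, model, and output length.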
Run the benchmark:
```bash
python examples/benchmark.py
```
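The contents of examples/benchmark.py are not reproduced here. The following hedged illustration shows how the two headline metrics can be measured: it times any iterator that yields tokens one at a time. `fake_stream` is a hypothetical stand-in for a real streaming generator, because this README does not document a streaming method on `HLLM`. Throughput here uses one common convention: tokens after the first, divided by the time after the first token.

```python
import time
from typing import Iterable, Iterator


def measure(stream: Iterable[str]) -> dict:
    """Time a token stream: time to first token (TTFT) and decode throughput."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in stream:
        count += 1
        if first is None:
            first = time.perf_counter()
    end = time.perf_counter()
    if first is None:
        return {"tokens": 0, "ttft_s": None, "tok_per_s": None}
    # Exclude the first token so prefill time does not skew decode throughput.
    decode_s = end - first
    tps = (count - 1) / decode_s if count > 1 and decode_s > 0 else None
    return {"tokens": count, "ttft_s": first - start, "tok_per_s": tps}


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


print(measure(fake_stream(20, 0.01)))
```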
REST API Server (OpenAI-compatible)
Install the API dependencies
```bash
pip install "light-llm-hp[api]"
```
Start the server
```bash
python -m hllm.server --model ./TinyLlama-1.1B-Chat-v1.0 --port 8000
```
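Loading a model can take some time after the server process starts. The following minimal standard-library sketch polls the `/v1/models` endpoint until it answers. `wait_for_server` is a hypothetical helper, and the default URL assumes the `--port 8000` from the command above. Proxies are bypassed for the same reason the client example sets `trust_env=False`.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://localhost:8000/v1/models",
                    timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Poll url until it answers with HTTP 200, or give up after timeout_s."""
    # An empty ProxyHandler makes urllib ignore HTTP_PROXY/HTTPS_PROXY env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # refused, reset, timed out, or HTTP error: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```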
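Using the official OpenAI client
```python
import httpx
from openai import OpenAI

# trust_env=False makes httpx ignore proxy environment variables (HTTP_PROXY etc.),
# so requests to localhost are not sent through a proxy, which can cause 502 errors
http_client = httpx.Client(trust_env=False)

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # placeholder; the OpenAI client requires some value
    http_client=http_client,
)

# Chat
response = client.chat.completions.create(
    model="hllm-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```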
Full example: examples/test_openai_client.py
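You can send the same request without the OpenAI SDK. The following standard-library sketch assumes the server accepts the standard OpenAI chat-completions JSON body, as the client example implies. `build_chat_request` and `send` are hypothetical helpers, and `hllm-model` is the model name used in the example above.

```python
import json
import urllib.request


def build_chat_request(content: str,
                       base_url: str = "http://localhost:8000/v1",
                       model: str = "hllm-model") -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send with proxies disabled (like trust_env=False) and return the reply text."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(send(build_chat_request("Hello!")))` should behave like the SDK call, assuming the response follows the OpenAI schema.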
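OpenAI-compatible endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/models | GET | List models |
| /v1/chat/completions | POST | Chat completions (streaming supported) |
| /v1/completions | POST | Text completions (streaming supported) |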
See docs/api.md for the full API documentation.
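Both completion endpoints support streaming. This sketch assumes the server follows the OpenAI streaming format: server-sent events in which each `data:` line carries a JSON chunk, and the stream ends with `data: [DONE]`. Under that format, a chat client only needs to concatenate the `delta.content` fields. The following minimal parser handles chat-completion chunks only (text completions stream `choices[].text` instead). It runs on hand-written sample lines, not captured hllm output.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join delta.content from OpenAI-style chat.completion.chunk SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hand-written sample in the OpenAI chunk format (not captured from hllm).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello!
```

With the official client, passing `stream=True` to `client.chat.completions.create` performs this parsing for you and yields chunk objects that expose the same `choices[0].delta.content` field.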
Directory Structure
```
hllm/
├── hllm/              # Core modules
│   ├── __init__.py
│   ├── model.py       # Model loading and inference
│   ├── tokenizer.py   # Tokenizer wrapper
│   ├── generate.py    # Generation logic
│   ├── server.py      # REST API server
│   └── client.py      # REST API client
├── tests/             # Tests
├── examples/          # Examples
└── docs/              # Documentation
```
Download files
Source Distribution
light_llm_hp-0.4.4.tar.gz (26.6 kB)
Built Distribution
light_llm_hp-0.4.4-py3-none-any.whl (19.4 kB)
File details
Details for the file light_llm_hp-0.4.4.tar.gz.
File metadata
- Download URL: light_llm_hp-0.4.4.tar.gz
- Upload date:
- Size: 26.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 049510486e627ca01d529ee50ebb7b0735f260e089e32773594beee54ce0a94a |
| MD5 | b29083281109113bb65f1001689904df |
| BLAKE2b-256 | fd8b70d40537415c64fb161dfc0e94a2188878ec41009838c548ef45940f284c |
File details
Details for the file light_llm_hp-0.4.4-py3-none-any.whl.
File metadata
- Download URL: light_llm_hp-0.4.4-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60ece0e95f5503f592f5e93374b430bf0946f096178bbac19ed3a4348dad6232 |
| MD5 | d9d56059724a7b3b911ac552e8384c85 |
| BLAKE2b-256 | 579c5a251892e35b464870ad2a12c2884591ab6a6bd4a789e6757d902c3a5fb3 |