A lightweight LLM inference framework
# light-llm-hp - Lightweight LLM Inference Framework

A simplified inference framework that runs on CPU, with REST API serving support.

🚀 **Apple Silicon optimized**: supports the MLX backend, 2-5x faster than PyTorch MPS.
## Quick Start

```python
from hllm import HLLM

# Automatically selects the best backend (MLX on Apple Silicon)
model = HLLM(model_path="microsoft/Phi-3-mini-4k-instruct")

# Generate text
result = model.generate("Write a short story about a robot.")
print(result)
```
## Apple Silicon Optimization (MLX)

On M1/M2/M3 Macs, the MLX backend gives the best performance:

```shell
# Install MLX support
pip install light-llm-hp[mlx]
```

```python
from hllm import HLLM

# Explicitly use the MLX backend (recommended)
model = HLLM(model_path="mlx-community/Llama-3.2-1B-Instruct-4bit", backend="mlx")

# Or use PyTorch MPS
model = HLLM(model_path="microsoft/Phi-3-mini-4k-instruct", backend="pytorch", device="mps")

# Inspect backend info
print(model.get_info())
# {'name': 'mlx', 'device': 'mlx', ...}
```
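The docs above say the backend is chosen automatically when none is given. hllm's actual selection code is not shown here; a plausible policy, sketched with illustrative names (this is an assumption, not the library's implementation), is to prefer MLX on Apple Silicon when the `mlx` package is importable, and otherwise fall back to PyTorch:

```python
import importlib.util
import platform
import sys


def pick_backend() -> str:
    """Illustrative auto-selection policy (an assumption, not hllm's code):
    prefer MLX on Apple Silicon when the mlx package is installed,
    otherwise fall back to PyTorch."""
    on_apple_silicon = sys.platform == "darwin" and platform.machine() == "arm64"
    if on_apple_silicon and importlib.util.find_spec("mlx") is not None:
        return "mlx"
    return "pytorch"


print(pick_backend())
```

A real implementation would additionally probe `torch.backends.mps` to pick between the `mps` and `cpu` devices within the PyTorch backend.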
## Performance Comparison

Typical performance on an M1 MacBook Pro (Llama-3.2-1B, 100 tokens):

| Backend | First-token latency | Throughput | Memory usage |
|---|---|---|---|
| MLX | ~50 ms | ~45 tok/s | ~800 MB |
| PyTorch MPS | ~150 ms | ~15 tok/s | ~1200 MB |
| PyTorch CPU | ~500 ms | ~5 tok/s | ~1200 MB |
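The speedups implied by these figures can be read off directly, and they fall inside the "2-5x faster than PyTorch MPS" claim made above:

```python
# Figures from the table above (M1 MacBook Pro, Llama-3.2-1B, 100 tokens)
throughput = {"mlx": 45, "mps": 15, "cpu": 5}       # tok/s
first_token_ms = {"mlx": 50, "mps": 150, "cpu": 500}

speedup_vs_mps = throughput["mlx"] / throughput["mps"]
speedup_vs_cpu = throughput["mlx"] / throughput["cpu"]
latency_ratio = first_token_ms["mps"] / first_token_ms["mlx"]

print(speedup_vs_mps, speedup_vs_cpu, latency_ratio)  # 3.0 9.0 3.0
```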
Run the benchmark:

```shell
python examples/benchmark.py
```
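The contents of `examples/benchmark.py` are not reproduced here; a minimal harness in the same spirit times one generation call and derives throughput. This is a sketch with illustrative names, where the model call is passed in as a plain function so it runs standalone:

```python
import time


def benchmark(generate, prompt: str, n_tokens: int = 100) -> dict:
    """Time a single generation call and derive token throughput.
    `generate` is any callable taking (prompt, max_tokens) and returning text."""
    start = time.perf_counter()
    generate(prompt, max_tokens=n_tokens)
    elapsed = time.perf_counter() - start
    return {"seconds": round(elapsed, 3), "tok_per_s": round(n_tokens / elapsed, 1)}


# Stub model call so the harness runs without a model;
# with hllm you would pass e.g. model.generate instead.
stats = benchmark(lambda p, max_tokens: "x" * max_tokens, "Hello", n_tokens=100)
print(stats)
```

Note this measures wall-clock throughput only; first-token latency requires a streaming API to observe separately.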
## REST API Service (OpenAI-compatible)

### Install the API dependencies

```shell
pip install light-llm-hp[api]
```

### Start the server

```shell
python -m hllm.server --model ./TinyLlama-1.1B-Chat-v1.0 --port 8000
```
### Using the official OpenAI client

```python
import httpx
from openai import OpenAI

# Disable proxies to avoid 502 errors
http_client = httpx.Client(trust_env=False)

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    http_client=http_client,
)

# Chat
response = client.chat.completions.create(
    model="hllm-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Full example: examples/test_openai_client.py
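Both completion endpoints advertise streaming. When consuming a stream without the SDK, OpenAI-style servers frame chunks as SSE `data: {...}` lines terminated by `data: [DONE]`; assuming hllm follows that framing (not verified here), the deltas can be joined with a small helper:

```python
import json


def collect_stream_text(sse_lines) -> str:
    """Concatenate delta content from OpenAI-style streaming SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and comments
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)


# Two content chunks followed by the terminator
demo = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(demo))  # Hello
```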
### OpenAI-compatible endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completion (streaming supported) |
| `/v1/completions` | POST | Text completion (streaming supported) |

See docs/api.md for the full API reference.
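The chat endpoint can also be exercised without any third-party client, using only the standard library. A sketch, assuming the server started above is listening on localhost:8000:

```python
import json
import urllib.request


def build_chat_request(content: str,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": "hllm-model",
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running:
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```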
## Directory Layout

```
hllm/
├── hllm/             # Core package
│   ├── __init__.py
│   ├── model.py      # Model loading and inference
│   ├── tokenizer.py  # Tokenizer wrapper
│   ├── generate.py   # Generation logic
│   ├── server.py     # REST API server
│   └── client.py     # REST API client
├── tests/            # Tests
├── examples/         # Examples
└── docs/             # Documentation
```
## Download Files

- Source distribution: light_llm_hp-0.4.1.tar.gz (19.0 kB)
- Built distribution: light_llm_hp-0.4.1-py3-none-any.whl (18.7 kB)
### File details: light_llm_hp-0.4.1.tar.gz

- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | 987ab94063d4c846d14206aa1b492e469f2210c18c3dd121538f41605d8ab8a0 |
| MD5 | 14795a4f9a3d7bce8956f62e2976e186 |
| BLAKE2b-256 | 60d78a880943ab3f49dcea8d88e170e21eb942345c54743fe8ce91a9c3fb85b9 |
### File details: light_llm_hp-0.4.1-py3-none-any.whl

- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7010f33a109b890b265236f3cb0e1af7b0c8ea7279320e00963b897310b0d020 |
| MD5 | a89b86a804a7b10f9fca73cfeed7b3ba |
| BLAKE2b-256 | f0ba1f6bdc42c4d3e49330fb0e2907dd5df635015084e117dae060068c8e0e63 |