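A lightweight inference tool for large language models, focused on low inference latency, with an emphasis on framework usability and extensibility.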
OSC-LLM
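A lightweight inference tool for large language models, focused on inference latency.
Features
- CUDA Graph: graph-capture optimization that reduces inference latency
- PagedAttention: efficient KV cache management with support for long-sequence inference
- Continuous batching: dynamic batching of requests during inference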
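The PagedAttention entry in the feature list refers to managing the KV cache in fixed-size blocks rather than one contiguous buffer per sequence. The following is a minimal, CPU-only sketch of that bookkeeping idea. The class `PagedKVCache`, its fields, and the block sizes are hypothetical names chosen for illustration; they are not osc-llm's internals.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block allocator; names and sizes are illustrative, not osc-llm's."""

    num_blocks: int
    block_size: int = 16  # tokens stored per physical block
    free_blocks: list[int] = field(default_factory=list)
    block_tables: dict[int, list[int]] = field(default_factory=dict)
    seq_lens: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every physical block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or last block is full
            if not self.free_blocks:
                raise MemoryError("KV cache is out of free blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token(seq_id=0)  # 3 tokens occupy 2 blocks
cache.append_token(seq_id=1)      # 1 token occupies 1 block
blocks_in_use = cache.num_blocks - len(cache.free_blocks)
cache.free(seq_id=0)              # blocks become reusable by other sequences
```

Because blocks are handed out on demand and returned when a sequence finishes, memory is not reserved up front for the maximum sequence length, which is what makes this layout a natural fit for continuous batching.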
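Installation
- Install the latest version of PyTorch
- Install flash-attn: downloading an officially prebuilt wheel (.whl) is recommended to avoid compilation problems
- Install osc-llm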
pip install osc-llm --upgrade
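After the installation steps, it can help to confirm that the three packages are importable before loading a model. The sketch below uses only the Python standard library. The helper `check_install` is a hypothetical name, and the distribution names `torch`, `flash-attn`, and `osc-llm` are assumed to match what pip installed.

```python
import importlib.util
from importlib import metadata


def check_install(packages: dict[str, str]) -> dict[str, str]:
    """Map each import name to its installed version, or 'missing'."""
    report = {}
    for import_name, dist_name in packages.items():
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = "missing"
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under that name.
            report[import_name] = "installed (version unknown)"
    return report


# import name -> distribution name (assumed to match the pip package names)
required = {"torch": "torch", "flash_attn": "flash-attn", "osc_llm": "osc-llm"}
for name, status in check_install(required).items():
    print(f"{name}: {status}")
```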
Quick start
Basic usage
from osc_llm import LLM, SamplingParams
# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")
# Chat
messages = [
    {"role": "user", "content": "Hello, what is your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)
# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
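Since the project focuses on latency, one way to evaluate the streaming mode is to time the first chunk and the whole response. The sketch below is a generic helper that works on any iterable of strings. `measure_stream` and `fake_stream` are hypothetical names, and the assumption that the streaming call yields plain string chunks is inferred from the loop above rather than from documented behavior.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a token stream and record time to first token and total time."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in chunks:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        parts.append(chunk)
    end = time.perf_counter()
    return {
        "text": "".join(parts),
        "num_chunks": len(parts),
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "total_s": end - start,
    }


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming chat call so the sketch runs without a GPU."""
    for piece in ["Hello", ", ", "world"]:
        time.sleep(0.01)  # simulate per-token decode time
        yield piece


stats = measure_stream(fake_stream())
print(stats["text"], stats["num_chunks"])
```

With a loaded model, the same helper could be given the iterator returned by `llm.chat(..., stream=True)` in place of `fake_stream()`, assuming that call yields strings.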
Supported models
- Qwen3ForCausalLM
- Qwen2ForCausalLM
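The names in this list match the `architectures` field that Hugging Face style checkpoints store in `config.json`. As a sketch, a checkpoint directory could be checked against the list before loading. The helper `is_supported` is hypothetical, and whether osc-llm performs a similar check internally is not stated in this document.

```python
import json
import tempfile
from pathlib import Path

SUPPORTED_ARCHITECTURES = {"Qwen3ForCausalLM", "Qwen2ForCausalLM"}


def is_supported(checkpoint_dir: str) -> bool:
    """Return True if config.json names at least one supported architecture."""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return bool(SUPPORTED_ARCHITECTURES & set(config.get("architectures") or []))


# Demo with a throwaway directory instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps({"architectures": ["Qwen3ForCausalLM"]}))
    qwen3_ok = is_supported(tmp)
    config_file.write_text(json.dumps({"architectures": ["LlamaForCausalLM"]}))
    llama_ok = is_supported(tmp)

print(qwen3_ok, llama_ok)
```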