A lightweight LLM inference tool focused on inference latency, with an emphasis on ease of use and extensibility.
OSC-LLM
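A lightweight inference tool for large language models, focused on low inference latency.
Features
🚀 High-performance inference
- CUDA Graph: graph-level optimization that reduces inference latency
- PagedAttention: efficient KV cache management with support for long-sequence inference
- Continuous batching: dynamic batching of requests during inference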
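To make the PagedAttention bullet more concrete, here is a minimal sketch of the bookkeeping idea behind a paged KV cache: the cache is split into fixed-size blocks, and each sequence keeps a table of the blocks it owns. This is a toy illustration in plain Python, not OSC-LLM's implementation; the class name `PagedKVCache`, its methods, and the block sizes are all hypothetical.

```python
class PagedKVCache:
    """Toy block allocator (hypothetical, for illustration only).

    Tracks which physical blocks each sequence owns; it stores no tensors.
    """

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size                 # tokens per block
        self.free_blocks = list(range(num_blocks))   # ids of unused blocks
        self.block_tables = {}                       # seq_id -> [block ids]
        self.seq_lens = {}                           # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (block_id, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the last one is full
        # (or when the sequence has no block yet).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are allocated on demand and returned as soon as a sequence finishes, memory is not reserved up front for the maximum sequence length, which is what makes long sequences and changing batches cheaper to hold in memory.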
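🛠️ Ease of use
- Lightweight design: focused on inference performance, with few dependencies
- Simple API: a concise Python interface
- Model management: built-in download and management tools
Installation
- Install the latest version of PyTorch
- Install flash-attn: downloading an official prebuilt wheel is recommended to avoid compilation problems
- Install osc-llm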
pip install osc-llm --upgrade
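After the three installation steps, a quick check that the packages can be found may save a failed first run. The sketch below uses only the standard library; the module names `torch` and `flash_attn` are the usual import names of PyTorch and flash-attn, and `osc_llm` is taken from the usage example in this README. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names of the three prerequisites listed above.
REQUIRED = ["torch", "flash_attn", "osc_llm"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

This only checks that the modules can be located. It does not verify that the installed versions are compatible with each other or with the local CUDA setup.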
Quick start
Download a model
llm download Qwen/Qwen3-0.6B
Basic usage
from osc_llm import Qwen3ForCausalLM
# Initialize the model
llm = Qwen3ForCausalLM("checkpoints/Qwen/Qwen3-0.6B")
llm.setup(device="cuda:0", gpu_memory_utilization=0.9)
# Chat
chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
assistant_content = llm.generate(prompts=[prompt])[0]
chat_template.add_assistant_message(assistant_content)
print(chat_template.messages)
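With `enable_thinking=True`, Qwen3 models typically emit their reasoning inside `<think>...</think>` tags before the final answer. Assuming `llm.generate` returns that raw text (not confirmed by this README), a small helper like the hypothetical `split_thinking` below could separate the reasoning from the answer before it is shown to the user.

```python
import re

# Matches the first <think>...</think> block, including newlines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Split generated text into (thinking, answer).

    Returns an empty thinking string if no <think> block is present.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    # The answer is everything outside the matched block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer
```

For example, `thinking, answer = split_thinking(assistant_content)` could be called right after `llm.generate`. If the library already strips the reasoning, the helper simply returns the text unchanged as the answer.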
Streaming generation
chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
for token in llm.stream(prompt=prompt):
    print(token, end="", flush=True)
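Since the project focuses on latency, it can be useful to record the time to first token while streaming. The sketch below works with any iterable of text chunks; it assumes `llm.stream` yields strings, which the `print` call above suggests but this README does not state. The function name `consume_stream` is hypothetical.

```python
import time


def consume_stream(tokens, echo=True):
    """Collect streamed chunks; return (full_text, first_token_latency).

    first_token_latency is in seconds, or None if nothing was produced.
    """
    start = time.perf_counter()
    first_token_latency = None
    pieces = []
    for token in tokens:
        if first_token_latency is None:
            # Time from the start of iteration to the first chunk.
            first_token_latency = time.perf_counter() - start
        if echo:
            print(token, end="", flush=True)
        pieces.append(token)
    return "".join(pieces), first_token_latency
```

A call such as `text, ttft = consume_stream(llm.stream(prompt=prompt))` would print the chunks as they arrive and also return the full reply, ready to pass to `chat_template.add_assistant_message`.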
Supported models
- Qwen3ForCausalLM
- Qwen2ForCausalLM
CLI tools
llm download <repo_id> [--endpoint hf-mirror|modelscope]
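When downloads are scripted, the command line above can be assembled in Python. The sketch below only builds the argument list from the syntax shown (`llm download <repo_id> [--endpoint hf-mirror|modelscope]`); the helper name `build_download_command` is hypothetical, and the behaviour of each endpoint is not described in this README.

```python
# Endpoint values taken from the CLI synopsis above.
ENDPOINTS = ("hf-mirror", "modelscope")


def build_download_command(repo_id, endpoint=None):
    """Build the argv list for `llm download`, validating the endpoint."""
    if not repo_id:
        raise ValueError("repo_id must be a non-empty string")
    command = ["llm", "download", repo_id]
    if endpoint is not None:
        if endpoint not in ENDPOINTS:
            raise ValueError(f"endpoint must be one of {ENDPOINTS}")
        command += ["--endpoint", endpoint]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with the repository id.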