
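A lightweight inference tool for large language models, focused on low inference latency, with an emphasis on ease of use and extensibility.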


OSC-LLM

A lightweight inference tool for large language models, focused on inference latency.

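Features

🚀 High-performance inference

  • CUDA Graph: compilation optimization that reduces inference latency
  • PagedAttention: efficient KV cache management with support for long-sequence inference
  • Continuous batching: supports dynamic batching to optimize inference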
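
To make the PagedAttention bullet more concrete, here is a minimal sketch of the general idea behind paged KV cache management: each sequence owns a table of fixed-size blocks that are allocated only as its length grows. This is a toy illustration, not OSC-LLM's implementation; the class name `PagedKVAllocator`, its fields, and the block size of 16 are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVAllocator:
    """Toy block allocator (hypothetical): maps each sequence to fixed-size cache blocks."""

    num_blocks: int                 # total physical blocks in the pool
    block_size: int = 16            # tokens stored per block (illustrative value)
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> list of block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored so far

    def __post_init__(self):
        # Initially every block in the pool is free.
        self.free_blocks = list(range(self.num_blocks))

    def append_tokens(self, seq_id, n_tokens):
        """Reserve room for n_tokens more tokens, allocating blocks only on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + n_tokens
        # Ceiling division gives the total blocks required for the new length.
        blocks_needed = -(-new_len // self.block_size) - len(table)
        if blocks_needed > len(self.free_blocks):
            raise MemoryError("KV cache pool exhausted")
        for _ in range(blocks_needed):
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = new_len

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are handed out per sequence and returned as soon as a sequence finishes, a scheduler that adds and removes sequences between decoding steps (the continuous batching case) can reuse freed blocks immediately instead of reserving worst-case memory per request.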

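🛠️ Ease of use

  • Lightweight design: focused on inference performance, with few dependencies
  • Simple API: a concise Python interface
  • Model management: built-in download and management tools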

Installation

pip install osc-llm --upgrade

Quick start

Download a model

llm download Qwen/Qwen3-0.6B

Basic usage

from osc_llm import Qwen3ForCausalLM

# Initialize the model
llm = Qwen3ForCausalLM("checkpoints/Qwen/Qwen3-0.6B")
llm.setup(device="cuda:0", gpu_memory_utilization=0.9)

# Chat
chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
assistant_content = llm.generate(prompts=[prompt])[0]
chat_template.add_assistant_message(assistant_content)
print(chat_template.messages)
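
For multi-turn conversations, the steps above can be wrapped in a small helper. This is a sketch under assumptions: `chat_turn` is a hypothetical name, and it assumes the chat template keeps its message history between calls and that `apply` can be called once per turn, which the README example suggests but does not state. It uses only the methods shown above.

```python
def chat_turn(llm, chat_template, user_text, enable_thinking=True):
    """Run one user/assistant exchange and record both sides in the template.

    Hypothetical helper: `llm` and `chat_template` are expected to provide the
    methods used in the README example (generate, add_user_message, apply,
    add_assistant_message).
    """
    chat_template.add_user_message(user_text)
    prompt = chat_template.apply(enable_thinking=enable_thinking)
    # generate() takes a list of prompts and returns one result per prompt.
    reply = llm.generate(prompts=[prompt])[0]
    chat_template.add_assistant_message(reply)
    return reply


# Usage with an initialized model (requires a GPU and downloaded weights):
#   template = llm.get_chat_template()
#   chat_turn(llm, template, "Tell me about Beijing")
#   chat_turn(llm, template, "What is the best season to visit?")
```

Passing the template in, rather than creating it inside the helper, keeps the conversation state with the caller, so several independent conversations can share one loaded model.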

Streaming generation

chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
for token in llm.stream(prompt=prompt):
    print(token, end="", flush=True)
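
The streaming loop above prints tokens but discards them afterwards. If the full reply is also needed, for example to add it to the chat history, the pieces can be collected while they are displayed. A minimal sketch, assuming `llm.stream` yields string fragments as the `print` call above implies; `stream_and_collect` is a hypothetical helper name.

```python
import sys


def stream_and_collect(llm, prompt, out=sys.stdout):
    """Write streamed pieces to `out` as they arrive and return the full text.

    Hypothetical helper: assumes llm.stream(prompt=...) yields str fragments.
    """
    pieces = []
    for token in llm.stream(prompt=prompt):
        out.write(token)
        out.flush()          # show each fragment immediately
        pieces.append(token)
    return "".join(pieces)


# Usage with an initialized model:
#   reply = stream_and_collect(llm, prompt)
#   chat_template.add_assistant_message(reply)
```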

Supported models

  • Qwen3ForCausalLM

CLI tool

llm download <repo_id> [--endpoint hf-mirror|modelscope]
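
When downloads are scripted, the command line above can be built and run from Python. This is a sketch only: `build_download_cmd` and `download` are hypothetical helper names, and the only CLI syntax assumed is what the usage line above shows (an optional `--endpoint` flag taking `hf-mirror` or `modelscope`).

```python
import subprocess

# Endpoint values taken from the usage line in this README.
ALLOWED_ENDPOINTS = ("hf-mirror", "modelscope")


def build_download_cmd(repo_id, endpoint=None):
    """Return the argv list for `llm download`, validating the endpoint name."""
    cmd = ["llm", "download", repo_id]
    if endpoint is not None:
        if endpoint not in ALLOWED_ENDPOINTS:
            raise ValueError(f"endpoint must be one of {ALLOWED_ENDPOINTS}")
        cmd += ["--endpoint", endpoint]
    return cmd


def download(repo_id, endpoint=None):
    """Run the CLI; raises CalledProcessError if the command exits non-zero."""
    subprocess.run(build_download_cmd(repo_id, endpoint), check=True)


# Usage (requires osc-llm to be installed so that `llm` is on PATH):
#   download("Qwen/Qwen3-0.6B", endpoint="modelscope")
```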

Source Distribution

osc_llm-0.2.1.tar.gz (11.4 kB)

Uploaded Source

Built Distribution

osc_llm-0.2.1-py3-none-any.whl (14.9 kB)

Uploaded Python 3

