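A lightweight inference tool for large language models, focused on low inference latency, with an emphasis on ease of use and extensibility.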

OSC-LLM

A lightweight inference tool for large language models, focused on inference latency.

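Features

🚀 High-Performance Inference

  • CUDA Graph: compiled execution that reduces inference latency
  • PagedAttention: efficient KV cache management with support for long-sequence inference
  • Continuous batching: dynamic batching of requests during inference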
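
To make the PagedAttention bullet concrete, here is a minimal sketch of the bookkeeping behind a paged KV cache: the cache is split into fixed-size blocks, and each sequence keeps a table of the physical blocks it owns. This is a conceptual illustration only, not OSC-LLM's implementation; the class `PagedKVCache` and all of its fields and methods are hypothetical names, and no tensors are involved.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (hypothetical, no tensors)."""

    num_blocks: int   # total physical blocks in the pool
    block_size: int   # tokens stored per block
    free_blocks: list[int] = field(init=False)
    block_tables: dict[int, list[int]] = field(default_factory=dict)
    seq_lens: dict[int, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.free_blocks = list(range(self.num_blocks))

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        offset = length % self.block_size
        if offset == 0:  # the last block is full, or the sequence is new
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], offset

    def free(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are allocated only when a sequence actually grows into them, memory is not reserved up front for the maximum sequence length, and blocks released by finished sequences can be reused immediately by others.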

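🛠️ Ease of Use

  • Lightweight design: focused on inference performance, with few dependencies
  • Simple API: a concise Python interface
  • Model management: built-in download and management tools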

Installation

pip install osc-llm --upgrade

Quick Start

Download a Model

llm download Qwen/Qwen3-0.6B

Basic Usage

from osc_llm import Qwen3ForCausalLM

# Initialize the model
llm = Qwen3ForCausalLM("checkpoints/Qwen/Qwen3-0.6B")
llm.setup(device="cuda:0", gpu_memory_utilization=0.9)

# Chat
chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
assistant_content = llm.generate(prompts=[prompt])[0]
chat_template.add_assistant_message(assistant_content)
print(chat_template.messages)
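
For multi-turn conversations, the steps above can be wrapped in a small helper. The sketch below uses only the calls shown in the example (`add_user_message`, `apply`, `generate`, `add_assistant_message`); the function name `chat_turn` is hypothetical, and it assumes the chat template keeps accumulating messages across calls, as `chat_template.messages` suggests.

```python
def chat_turn(llm, chat_template, user_text, enable_thinking=True):
    """Run one user/assistant exchange and record both messages in the template.

    `llm` and `chat_template` are expected to behave like the objects in the
    basic-usage example; this helper is an illustration, not part of osc_llm.
    """
    chat_template.add_user_message(user_text)
    prompt = chat_template.apply(enable_thinking=enable_thinking)
    reply = llm.generate(prompts=[prompt])[0]
    chat_template.add_assistant_message(reply)
    return reply
```

With `llm` set up as above, calling `chat_turn(llm, chat_template, "...")` repeatedly on the same `chat_template` should carry the earlier turns into each new prompt.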

Streaming Generation

chat_template = llm.get_chat_template()
chat_template.add_user_message("Tell me about Beijing")
prompt = chat_template.apply(enable_thinking=True)
for token in llm.stream(prompt=prompt):
    print(token, end="", flush=True)
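
The loop above prints tokens but discards them. If the full reply is also needed, for example to pass to `add_assistant_message`, the chunks can be collected while they are echoed. This is a minimal sketch assuming `llm.stream` yields plain `str` chunks, which the `print(token, ...)` call suggests but the source does not state; `stream_and_collect` is a hypothetical helper name.

```python
import sys


def stream_and_collect(chunks, out=sys.stdout):
    """Echo streamed text chunks as they arrive and return the full text.

    `chunks` is any iterable of str, e.g. llm.stream(prompt=prompt)
    (assumed to yield strings).
    """
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show each chunk immediately instead of buffering
        parts.append(chunk)
    return "".join(parts)
```

Usage would then look like `reply = stream_and_collect(llm.stream(prompt=prompt))`, followed by `chat_template.add_assistant_message(reply)`.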

Supported Models

  • Qwen3ForCausalLM
  • Qwen2ForCausalLM

CLI Tools

llm download <repo_id> [--endpoint hf-mirror|modelscope]
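
When models need to be downloaded from a script, the CLI can be driven through `subprocess`. The sketch below builds the argument list from the usage line above and nothing more; the helper names `build_download_command` and `download` are hypothetical, and the only flag used is the documented `--endpoint` with its two listed values.

```python
import subprocess

# Endpoint values taken from the usage line above.
ENDPOINTS = ("hf-mirror", "modelscope")


def build_download_command(repo_id, endpoint=None):
    """Return the argv list for `llm download <repo_id> [--endpoint ...]`."""
    cmd = ["llm", "download", repo_id]
    if endpoint is not None:
        if endpoint not in ENDPOINTS:
            raise ValueError(f"endpoint must be one of {ENDPOINTS}, got {endpoint!r}")
        cmd += ["--endpoint", endpoint]
    return cmd


def download(repo_id, endpoint=None):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(repo_id, endpoint), check=True)
```

For example, `download("Qwen/Qwen3-0.6B", endpoint="modelscope")` would run the same command as typing it in a shell, provided the `llm` executable is on `PATH`.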
