
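A lightweight large language model inference tool focused on low inference latency, with an emphasis on framework usability and extensibility.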

OSC-LLM

A lightweight large language model inference tool focused on inference latency.

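Features

  • CUDA Graph: compilation-level optimization that reduces inference latency
  • PagedAttention: efficient KV cache management with support for long-sequence inference
  • Continuous batching: dynamic batching of requests during inference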
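
To make the PagedAttention bullet concrete, here is a minimal sketch of the bookkeeping behind a paged KV cache: a pool of fixed-size blocks plus a per-sequence block table. It illustrates the general technique only and is not OSC-LLM's implementation. The class name `PagedKVCache`, its methods, and the block size of 16 are hypothetical, and no tensors are involved.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (hypothetical, no tensors)."""

    num_blocks: int        # total physical blocks in the pool
    block_size: int = 16   # tokens stored per block (assumed value)
    free_blocks: list = field(default_factory=list)
    block_tables: dict = field(default_factory=dict)  # seq_id -> physical block ids
    seq_lens: dict = field(default_factory=dict)      # seq_id -> tokens stored

    def __post_init__(self) -> None:
        # Every block starts out free.
        self.free_blocks = list(range(self.num_blocks))

    def append_tokens(self, seq_id: int, num_tokens: int) -> None:
        """Reserve room for num_tokens more tokens, allocating blocks on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.seq_lens.get(seq_id, 0) + num_tokens
        # Ceiling division gives the blocks needed for new_len tokens.
        needed = -(-new_len // self.block_size) - len(table)
        if needed > len(self.free_blocks):
            raise MemoryError("KV cache pool exhausted")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = new_len

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
cache.append_tokens(seq_id=0, num_tokens=17)  # 17 tokens span two blocks
cache.append_tokens(seq_id=0, num_tokens=15)  # 32 tokens still fit in two blocks
cache.append_tokens(seq_id=1, num_tokens=1)   # a second sequence takes one block
cache.free(seq_id=0)                          # sequence 0 finished, blocks reusable
```

Because blocks are allocated only as a sequence grows and are returned as soon as it finishes, memory is not reserved up front for the maximum sequence length. This is also what lets a continuous-batching scheduler admit new requests while others are still running.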

Installation

pip install osc-llm --upgrade
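
After installing, it can be useful to confirm which version is active in the current environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and is not part of OSC-LLM.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "osc-llm") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is not installed.
print(installed_version())
```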

Quick Start

Basic usage

from osc_llm import LLM, SamplingParams

# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")

# Chat
messages = [
    {"role": "user", "content": "Hello, what is your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)

# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
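
The streaming loop above prints fragments as they arrive. If the full reply is also needed afterwards, the fragments can be accumulated while printing. The following is a minimal sketch that assumes each item yielded by `llm.chat(..., stream=True)` is a string fragment, which the example above suggests but does not confirm. The helper `collect_stream` and the stand-in generator `fake_stream` are hypothetical and not part of OSC-LLM.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Consume an iterable of text fragments, optionally echoing them, and return the full text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    if echo:
        print()  # finish the echoed line
    return "".join(parts)


def fake_stream():
    """Stand-in for a streaming call so the sketch runs without a GPU or model."""
    yield from ["Hello", ", ", "world"]


text = collect_stream(fake_stream())
```

With a real model, `fake_stream()` would be replaced by the `llm.chat(..., stream=True)` call shown above, provided the assumption about string fragments holds.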

Supported models

  • Qwen3ForCausalLM
  • Qwen2ForCausalLM
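
The names above match the `architectures` field that Hugging Face-style checkpoints store in `config.json`. As a hedged sketch, a checkpoint directory could be pre-checked against this list before loading. Whether OSC-LLM itself reads `config.json` this way is an assumption, and `is_supported_checkpoint` is a hypothetical helper, not part of the library.

```python
import json
import tempfile
from pathlib import Path

# The two architectures listed above.
SUPPORTED_ARCHITECTURES = {"Qwen3ForCausalLM", "Qwen2ForCausalLM"}


def is_supported_checkpoint(checkpoint_dir: str) -> bool:
    """Return True if the directory's config.json names a supported architecture."""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text(encoding="utf-8"))
    architectures = config.get("architectures") or []
    return any(arch in SUPPORTED_ARCHITECTURES for arch in architectures)


# Demonstration with a throwaway directory instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    config = {"architectures": ["Qwen3ForCausalLM"]}
    (Path(tmp) / "config.json").write_text(json.dumps(config), encoding="utf-8")
    demo_supported = is_supported_checkpoint(tmp)
    demo_missing = is_supported_checkpoint(str(Path(tmp) / "does-not-exist"))
```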
