A lightweight large-model inference tool focused on inference latency, with an emphasis on framework usability and extensibility.

OSC-LLM

A lightweight LLM inference toolkit focused on minimizing inference latency.

Features

  • CUDA Graph: captures GPU work once and replays it, cutting kernel-launch overhead and reducing inference latency
  • PagedAttention: efficient KV-cache management that enables long-sequence inference
  • Continuous batching: requests join and leave the running batch dynamically instead of waiting for a fixed batch to finish
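
To make the PagedAttention bullet concrete, here is a minimal sketch of the bookkeeping behind a paged KV cache: each sequence owns a list of fixed-size blocks, and a new block is taken from a free pool only when the previous one is full. The class `BlockAllocator` and its methods are hypothetical names for illustration; this is not osc-llm's implementation, and it tracks block ids only, with no tensors.

```python
class BlockAllocator:
    """Toy paged KV-cache bookkeeping (hypothetical, not osc-llm's code)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one more token; return (block_id, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        # A new block is needed only when the sequence is empty
        # or its last block is completely filled.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("KV cache is full")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)
```

Because blocks are returned to the pool as soon as a sequence finishes, memory freed by one request can be reused immediately by another, which is also what makes continuous batching practical.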

Installation

  • Install the latest PyTorch
  • Install flash-attn; the official prebuilt wheel is recommended to avoid build issues
  • Install osc-llm
pip install osc-llm --upgrade
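
As a quick sanity check after the steps above, the following sketch reports which of the three packages cannot be found on the import path. The helper `missing_modules` is a hypothetical name introduced here; the module names `torch`, `flash_attn`, and `osc_llm` are the import names of the packages listed above. It only checks that the modules are discoverable, not that they work with your GPU.

```python
from importlib.util import find_spec

# Import names corresponding to the three installation steps.
REQUIRED_MODULES = ["torch", "flash_attn", "osc_llm"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```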

Quick Start

Basic Usage

from osc_llm import LLM, SamplingParams

# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")

# Chat
messages = [
    {"role": "user", "content": "Hello! What's your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)

# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
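
If you want the full reply as a single string in addition to the live output, one option is to accumulate the streamed pieces while printing them. The sketch below assumes the streaming call yields plain strings, as the loop above suggests; `collect_stream` and `fake_stream` are hypothetical helpers, and `fake_stream` stands in for `llm.chat(..., stream=True)` so the example runs without a GPU.

```python
def collect_stream(chunks, echo=True):
    """Print each streamed chunk as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


def fake_stream():
    """Stand-in generator used in place of a real streaming call."""
    yield from ["Hello", ", ", "world"]


text = collect_stream(fake_stream(), echo=False)
```

With a loaded model, the same helper could be passed the iterator returned by the streaming call in place of `fake_stream()`.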

Supported Models

  • Qwen3ForCausalLM
  • Qwen2ForCausalLM
