A lightweight LLM inference toolkit that focuses on model inference latency, with an emphasis on ease of use and extensibility.

Project description

OSC-LLM

A lightweight LLM inference toolkit focused on minimizing inference latency.

Chinese README

Features

  • CUDA Graphs: captures and replays GPU kernel launch sequences to reduce inference latency
  • PagedAttention: block-based KV-cache management that enables efficient long-sequence inference
  • Continuous batching: dynamically merges incoming requests into batches to improve throughput
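The PagedAttention idea above can be sketched in simplified pure Python: the KV cache is split into fixed-size blocks, and each sequence maps its logical token positions to physical blocks through a block table. The class and names below are illustrative, not osc-llm's actual API:

```python
# Toy sketch of paged KV-cache management (illustrative, not osc-llm's API):
# the cache is a pool of fixed-size blocks; each sequence holds a block table
# mapping logical positions to physical blocks, allocated on demand.
BLOCK_SIZE = 16  # tokens per cache block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))     # pool of physical blocks
        self.block_tables: dict[int, list[int]] = {}   # seq_id -> block ids
        self.lengths: dict[int, int] = {}              # seq_id -> token count

    def append_token(self, seq_id: int) -> None:
        """Reserve cache space for one more token of a sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:       # current block full (or none yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id: int) -> None:
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4)
for _ in range(20):                 # 20 tokens -> ceil(20 / 16) = 2 blocks
    cache.append_token(seq_id=0)
print(len(cache.block_tables[0]))   # 2
cache.free(seq_id=0)
print(len(cache.free_blocks))       # 4: all blocks returned to the pool
```

Because memory is handed out block by block instead of as one contiguous span per sequence, long sequences and many concurrent sequences can share the cache without fragmentation.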

Installation

  • Install the latest PyTorch
  • Install flash-attn: the official prebuilt wheel is recommended to avoid build issues
  • Install osc-llm:

    pip install osc-llm --upgrade

Quick Start

Basic Usage

from osc_llm import LLM, SamplingParams

# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")

# Chat
messages = [
    {"role": "user", "content": "Hello! What's your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)

# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
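The sampling parameters above control how the next token is drawn from the model's output distribution. A simplified pure-Python illustration of what temperature, top_k, and top_p do (not osc-llm's internal implementation):

```python
import math
import random

def sample(logits: list[float], temperature: float, top_p: float, top_k: int) -> int:
    """Toy next-token sampler: temperature scaling, then top-k, then top-p."""
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Softmax to probabilities.
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    weights = [probs[i] for i in kept]
    return random.choices(kept, weights=weights, k=1)[0]

random.seed(0)
token = sample([2.0, 1.0, 0.5, -1.0], temperature=0.5, top_p=0.95, top_k=40)
```

With top_k=1 the sampler degenerates to greedy decoding (always the argmax), while higher temperature and larger top_p admit more of the distribution's tail.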

Supported Models

  • Qwen3ForCausalLM
  • Qwen2ForCausalLM

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

osc_llm-0.2.4.tar.gz (12.6 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

osc_llm-0.2.4-py3-none-any.whl (17.0 kB)


File details

Details for the file osc_llm-0.2.4.tar.gz.

File metadata

  • Download URL: osc_llm-0.2.4.tar.gz
  • Upload date:
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for osc_llm-0.2.4.tar.gz
Algorithm Hash digest
SHA256 965bea7def2b96fae087eba04b634ce6ce73f455d20ebaf8e1ce86334512ae4d
MD5 ba382fa455133423d0b35378e00d03fc
BLAKE2b-256 cdc05b230d1e147d7a2772c6fc49e24f05e3b01d743bef9e3a494bd90a6ed0a8

See the PyPI documentation for more details on using hashes.
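After downloading, the published digests can be checked locally, for example with Python's standard hashlib (the filename below is the sdist from this page):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the SHA256 listed above:
# expected = "965bea7def2b96fae087eba04b634ce6ce73f455d20ebaf8e1ce86334512ae4d"
# assert sha256_of("osc_llm-0.2.4.tar.gz") == expected
```

A mismatch means the file was corrupted in transit or is not the file that was published.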

File details

Details for the file osc_llm-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: osc_llm-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for osc_llm-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 8586e25cdc20dcf6bacfbe357f624a8b304f21da39651ec29b2001a06902287f
MD5 6192a258e6d2bbc5c126c66595962f44
BLAKE2b-256 035816bdbaba5aeb28390d6c88db687d6bf46bac452901cca065817438311361

See the PyPI documentation for more details on using hashes.
