A lightweight LLM inference tool focused on inference latency, with an emphasis on framework usability and extensibility.
OSC-LLM
A lightweight LLM inference toolkit focused on minimizing inference latency.
Features
- CUDA Graph: Captures GPU kernel launches once and replays them, reducing launch overhead and inference latency
- PagedAttention: Efficient KV-cache management enabling long-sequence inference
- Continuous batching: Dynamically schedules requests into the running batch for batched inference
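To illustrate the idea behind paged KV-cache management, here is a minimal pure-Python sketch. It is not osc-llm's implementation: `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical names chosen for this example, and no tensors are involved. It only shows how a per-sequence block table maps token positions to fixed-size physical blocks that are allocated on demand.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV-cache block (illustrative value, an assumption)


class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a free pool."""

    def __init__(self, num_blocks):
        self.free = deque(range(num_blocks))

    def allocate(self):
        if not self.free:
            raise MemoryError("no free KV-cache blocks")
        return self.free.popleft()

    def release(self, blocks):
        # Return a finished sequence's blocks to the pool for reuse.
        self.free.extend(blocks)


@dataclass
class Sequence:
    seq_id: int
    num_tokens: int = 0
    # block_table[i] is the physical block holding logical block i.
    block_table: list = field(default_factory=list)

    def append_token(self, allocator):
        # A new block is needed only when all current blocks are full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(allocator.allocate())
        self.num_tokens += 1

    def slot(self, pos):
        # Physical (block id, offset) where the KV entry of token `pos` lives.
        return self.block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=3)
seq = Sequence(seq_id=0)
for _ in range(5):
    seq.append_token(allocator)
print(seq.block_table, seq.slot(4))
```

Because blocks are allocated only as a sequence grows, memory is not reserved up front for the maximum sequence length, which is what makes long-sequence inference practical.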
Installation
- Install the latest PyTorch
- Install flash-attn; the official prebuilt wheel is recommended to avoid build issues
- Install osc-llm
pip install osc-llm --upgrade
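After the steps above, a quick check can confirm that the three packages are importable. This is a small sketch using only the standard library; it assumes the import names `torch` (PyTorch), `flash_attn` (flash-attn), and `osc_llm`, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names are assumptions: torch (PyTorch), flash_attn (flash-attn), osc_llm.
missing = missing_packages(["torch", "flash_attn", "osc_llm"])
print("missing:", ", ".join(missing) if missing else "none")
```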
Quick Start
Basic Usage
from osc_llm import LLM, SamplingParams
# Initialize the model
llm = LLM("checkpoints/Qwen/Qwen3-0.6B", gpu_memory_utilization=0.5, device="cuda:0")
# Chat
messages = [
    {"role": "user", "content": "Hello! What's your name?"}
]
sampling_params = SamplingParams(temperature=0.5, top_p=0.95, top_k=40)
result = llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=False)
print(result)
# Streaming generation
for token in llm.chat(messages=messages, sampling_params=sampling_params, enable_thinking=True, stream=True):
    print(token, end="", flush=True)
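When streaming, it is often useful to keep the full reply as well as print it incrementally. The helper below is a sketch under the assumption that `llm.chat(..., stream=True)` yields text fragments as strings, as the loop above suggests. `collect_stream` is a hypothetical name and not part of osc-llm.

```python
def collect_stream(chunks, echo=True):
    """Consume an iterable of text fragments and return the joined reply."""
    parts = []
    for chunk in chunks:
        if echo:
            # Show each fragment as soon as it arrives.
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


# Stand-in generator so the sketch runs without a model or GPU.
def fake_stream():
    yield from ["Hello", ", ", "world"]


reply = collect_stream(fake_stream(), echo=False)
print(reply)
```

With a real model, the generator returned by `llm.chat(messages=messages, sampling_params=sampling_params, stream=True)` would take the place of `fake_stream()`, provided it yields strings.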
Supported Models
- Qwen3ForCausalLM
- Qwen2ForCausalLM
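Hugging Face checkpoints usually record their model class in the `architectures` field of `config.json`, so a checkpoint can be checked against the list above before loading. This is a sketch only: whether osc-llm performs its own lookup this way is an assumption, and `SUPPORTED`, `architectures_supported`, and `checkpoint_supported` are hypothetical names.

```python
import json
from pathlib import Path

# Architectures taken from the "Supported Models" list above.
SUPPORTED = {"Qwen3ForCausalLM", "Qwen2ForCausalLM"}


def architectures_supported(config):
    """Check a parsed config dict for at least one supported architecture."""
    return any(arch in SUPPORTED for arch in config.get("architectures", []))


def checkpoint_supported(checkpoint_dir):
    """Read config.json from a checkpoint directory and check its architectures."""
    config_path = Path(checkpoint_dir) / "config.json"
    return architectures_supported(json.loads(config_path.read_text()))


print(architectures_supported({"architectures": ["Qwen3ForCausalLM"]}))
```

For the checkpoint used in Basic Usage, the call would be `checkpoint_supported("checkpoints/Qwen/Qwen3-0.6B")`, assuming that directory contains a `config.json`.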