A lightweight LLM inference tool focused on inference latency, with an emphasis on ease of use and extensibility.
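Introduction
osc-llm is a lightweight model inference framework focused on ease of use and multi-task inference.
Features
- Uses torch.compile to reduce inference time, with speedups of up to 4x or more.
- Uses int8 and int4 quantization to reduce GPU memory usage.
- Uses speculative decoding to reduce inference time.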
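The idea behind speculative decoding can be sketched in a few lines. This is a minimal illustration of the greedy variant, not osc-llm's implementation: `target` and `draft` are hypothetical toy functions standing in for a large model and a small, cheap draft model. The draft proposes `k` tokens, the target verifies them, and the output is guaranteed to match plain greedy decoding with the target alone.

```python
from typing import Callable, List

# Toy stand-in for a language model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: ordinary greedy decoding, one model call per new token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(model(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    seq = list(prompt)
    goal = len(prompt) + n
    while len(seq) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every proposed position. A real engine does this in
        #    one batched forward pass; this sketch calls it once per position.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != tok:
                # First disagreement: keep the accepted prefix plus the target's token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            # Every proposal was accepted. A real verification pass already yields
            # the next token too; here it costs one more target call.
            seq.extend(proposal)
            if len(seq) < goal:
                seq.append(target(seq))
    return seq


if __name__ == "__main__":
    def target(seq: List[int]) -> int:
        return (sum(seq) * 7 + 3) % 10

    def draft(seq: List[int]) -> int:
        # Agrees with the target except when the sequence length is a multiple of 3.
        return target(seq) if len(seq) % 3 else (target(seq) + 1) % 10

    out = speculative_decode(target, draft, [1, 2], n=10)
    print(out == greedy_decode(target, [1, 2], n=10))  # True
```

The speedup comes from the target verifying several draft tokens per forward pass instead of producing one token per pass; every appended token is either a draft token the target agreed with or the target's own choice, which is why the result is identical to greedy decoding.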
Documentation:
Installation
- Install the latest version of PyTorch.
- Install osc-llm:
```bash
pip install osc-llm
```
Quick Start
```bash
# The following uses llama3 as an example: convert it to the osc-llm format, then chat with it.
# Assumes the Hugging Face llama3 model has already been downloaded into checkpoints/meta-llama.
# 1. Convert
llm convert --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
# 2. Quantize
llm quantize int8 --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct --save_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8
# 3. Chat (--compile speeds up inference; compilation takes a few minutes)
llm chat --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8 --compile true
# 4. Serve
llm serve --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8
```
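This README does not say which quantization scheme `llm quantize int8` uses. As an assumption, the sketch below shows symmetric per-output-channel (per-row) absmax weight quantization, a common weight-only scheme; osc-llm's actual scheme, storage layout, and kernels may differ. The helper names `quantize_symmetric` and `dequantize` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_symmetric(weight: Matrix, bits: int = 8) -> Tuple[List[List[int]], List[float]]:
    """Per-row symmetric absmax quantization.

    bits=8 maps values into [-127, 127]; bits=4 maps them into [-7, 7].
    """
    qmax = 2 ** (bits - 1) - 1
    q_rows, scales = [], []
    for row in weight:
        absmax = max((abs(x) for x in row), default=0.0)
        # One float scale per row; an all-zero row gets scale 1.0 to avoid dividing by 0.
        scale = absmax / qmax if absmax > 0 else 1.0
        q_rows.append([max(-qmax, min(qmax, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w is approximately q * scale."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.1]]
    for bits in (8, 4):
        q, s = quantize_symmetric(w, bits)
        deq = dequantize(q, s)
        err = max(abs(a - b) for ra, rb in zip(w, deq) for a, b in zip(ra, rb))
        print(f"int{bits}: q={q} max_abs_error={err:.4f}")
```

The memory saving follows from the storage width: fp16 weights take 2 bytes each, int8 takes 1 byte and int4 half a byte (once two values are packed per byte, which this sketch does not do), plus one scale per row. The rounding error per weight is at most half that row's scale, which is why int4 loses more precision than int8.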
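Supported Models
The following Hugging Face model architectures (as listed in the model's config.json) can be converted to the osc-llm format:
- LlamaForCausalLM: llama2, llama3, chinese-alpaca2, etc.
- Qwen2ForCausalLM: the qwen1.5 series.
- Qwen2MoeForCausalLM: the qwen2-moe series (compilation currently fails, so inference is very slow).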
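Before running `llm convert`, the architecture can be checked against the list above by reading the `architectures` field of a Hugging Face `config.json`. This is a hypothetical helper, not part of the osc-llm CLI; the `SUPPORTED` set simply mirrors the list in this section.

```python
import json
import sys
from pathlib import Path

# Mirrors the architectures listed under "Supported Models".
SUPPORTED = {"LlamaForCausalLM", "Qwen2ForCausalLM", "Qwen2MoeForCausalLM"}


def check_architecture(checkpoint_dir: str) -> str:
    """Return the first architecture in config.json that osc-llm lists as convertible."""
    config_path = Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    archs = config.get("architectures") or []
    for arch in archs:
        if arch in SUPPORTED:
            return arch
    raise ValueError(f"unsupported architecture(s) in {config_path}: {archs}")


if __name__ == "__main__":
    # e.g. python check_arch.py checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
    if len(sys.argv) > 1:
        print(check_architecture(sys.argv[1]))
```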
Acknowledgements
This project draws on many open-source projects, in particular the following:
Download files
Source Distribution
osc_llm-0.1.6.tar.gz (35.8 kB)
Built Distribution
osc_llm-0.1.6-py3-none-any.whl (53.6 kB)
File details
Details for the file osc_llm-0.1.6.tar.gz.
File metadata
- Download URL: osc_llm-0.1.6.tar.gz
- Upload date:
- Size: 35.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa8e63d2ac8d8f5c94116ad590f7dec2fb66450f1600b81eb9f3d41a2c6a374c |
| MD5 | 282fe7e8289c799dbd2b1d19488e50da |
| BLAKE2b-256 | 344fe10bf26354deba5869c602f63cb198c048d6ece574fc022075641f7393b3 |
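A downloaded `osc_llm-0.1.6.tar.gz` can be checked against its SHA256 digest with Python's standard `hashlib`. The script name and command-line usage below are illustrative only; pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import sys

# SHA256 digest of osc_llm-0.1.6.tar.gz, copied from the hash table for that file.
EXPECTED_SHA256 = "fa8e63d2ac8d8f5c94116ad590f7dec2fb66450f1600b81eb9f3d41a2c6a374c"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since hexdigest() is always lowercase."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # e.g. python verify_sdist.py osc_llm-0.1.6.tar.gz
    if len(sys.argv) > 1:
        print("OK" if verify(sys.argv[1]) else "HASH MISMATCH")
```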
File details
Details for the file osc_llm-0.1.6-py3-none-any.whl.
File metadata
- Download URL: osc_llm-0.1.6-py3-none-any.whl
- Upload date:
- Size: 53.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d98f2432fb64c313673acc0567dd6a5fa64d834981ada341ec9582690c8031ab |
| MD5 | 5e97f93731ed915441e39b5bbe58a54d |
| BLAKE2b-256 | a7963b1f6af82df4bf7128449bdbd8da9672b5773addca00101bcea96b3fe20c |