A lightweight large-model inference tool focused on inference latency, ease of use, and extensibility.

OSC-LLM


Introduction

osc-llm is a lightweight model inference framework focused on ease of use and multi-task inference.

Features

  • Uses torch.compile to cut inference time by as much as 4x.
  • Uses int8 and int4 quantization to reduce GPU memory usage.
  • Uses speculative decoding to reduce inference time.
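To make the int8 bullet concrete, here is a toy sketch of symmetric int8 quantization with a single scale per tensor. It illustrates the general technique only, not osc-llm's actual quantizer; all names are hypothetical.

```python
# Toy symmetric int8 quantization: map floats into [-127, 127]
# using one shared scale factor.

def quantize_int8(weights):
    # The largest magnitude sets the scale, so the extreme value maps to +/-127.
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]  # int8 codes
    return codes, scale

def dequantize_int8(codes, scale):
    # Recover approximate float weights from the int8 codes.
    return [code * scale for code in codes]

weights = [0.02, -1.27, 0.64, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Storage drops from 4 bytes per float32 weight to 1 byte per int8 code plus one scale, at the cost of a small reconstruction error bounded by half the scale.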

Documentation:

Installation
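The package is published on PyPI as osc_llm, so installation via pip should work (an assumption based on the PyPI listing; check the project documentation for other install options):

```shell
pip install osc-llm
```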

Quick Start

# The following uses llama3 as an example of converting a model to the
# osc-llm format and chatting with it.
# Assumes you have already downloaded the huggingface llama3 model into checkpoints/meta-llama
# 1. Convert
llm convert --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct
# 2. Quantize
llm quantize int8 --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct --save_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8
# 3. Chat (compilation speeds up inference but takes a few minutes up front)
llm chat --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8 --compile true
# 4. Serve
llm serve --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B-Instruct-int8

Supported Models

The following huggingface model architectures (see config.json) can be converted to the osc-llm format:

  • LlamaForCausalLM: llama2, llama3, chinese-alpaca2, etc.
  • Qwen2ForCausalLM: the qwen1.5 series.
  • Qwen2MoeForCausalLM: the qwen2-moe series (currently cannot be compiled, so inference is slow).
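The architecture check described above (inspecting config.json) can be sketched as follows. is_convertible is a hypothetical helper, not part of the osc-llm API, but the "architectures" field is the standard huggingface config.json layout:

```python
import json
import os

# Architectures listed above as convertible to the osc-llm format.
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "Qwen2ForCausalLM",
    "Qwen2MoeForCausalLM",
}

def is_convertible(checkpoint_dir):
    """Check a huggingface checkpoint's config.json for a supported architecture."""
    with open(os.path.join(checkpoint_dir, "config.json")) as f:
        config = json.load(f)
    return any(arch in SUPPORTED_ARCHITECTURES
               for arch in config.get("architectures", []))
```

Running this before `llm convert` gives a quick way to tell whether a downloaded checkpoint is worth converting.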

Acknowledgments

This project drew on many open-source projects, in particular the following:
