DeepLink Inference Extension

Intended Audience
- Developers
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3.10
- Python :: 3.11

Project description

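Introduction

dlinfer provides a solution for integrating domestic Chinese hardware into LLM inference frameworks. On the upper side it connects to LLM inference frameworks; on the lower side it calls each vendor's fused operators in eager mode and the vendor's graph engine in graph mode. dlinfer defines a set of fused-operator interfaces for LLM inference, based on the fused-operator granularity of mainstream LLM inference frameworks and mainstream hardware vendors.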

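These fused-operator interfaces serve the following main purposes:

- Effectively decouple the target framework from the vendors' fused operators during adaptation work;
- Support both operator (eager) mode and graph mode;
- Capture graphs in graph mode with more precise matching, improving final end-to-end performance;
- Support both LLM and VLM inference.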

We are currently focused on supporting LMDeploy on domestic Chinese chips, including those from Huawei, MetaX, and Cambricon.

Architecture

Components

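op interface: the operator interface for LLM inference, aligned with the fused-operator granularity of mainstream inference frameworks and of each vendor.
- Operator mode: in PyTorch eager mode, calls are dispatched through the op interface down to vendor kernels. Because vendors prefer different data layouts for their parameters, the interface does not prescribe a data layout; to keep adaptation uniform across hardware, however, it does standardize the dimension information of the parameters.
- Graph mode: when peak performance is the goal, inference on some hardware has to rely on graph mode. dlinfer uses the Dynamo compilation path in PyTorch 2 and, through the unified operator interface, captures a computation graph of relatively coarse-grained operators, converts it through an IR, and hands it to the hardware vendor's graph compiler.
framework adaptor: adds the LLM inference operator interface to the inference framework and aligns the interface's parameters.
kernel adaptor: absorbs the differences between the operator interface's parameters and the parameters of the vendor's fused operators.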
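
The way these three components relate can be sketched with a small registry. This is only an illustration and not dlinfer's actual code: `register_kernel`, `rms_norm`, and the vendor name `"demo"` are hypothetical names chosen for the example, and plain Python lists stand in for tensors.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: (vendor, op name) -> vendor kernel wrapper.
_KERNELS: Dict[Tuple[str, str], Callable] = {}


def register_kernel(vendor: str, op_name: str) -> Callable:
    """Kernel-adaptor side: register a vendor implementation for an op."""
    def decorator(fn: Callable) -> Callable:
        _KERNELS[(vendor, op_name)] = fn
        return fn
    return decorator


def rms_norm(x: List[float], weight: List[float], eps: float, vendor: str) -> List[float]:
    """Op-interface side: one fixed signature, dispatched to a vendor kernel."""
    # The interface fixes the dimension information, not the data layout.
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    try:
        kernel = _KERNELS[(vendor, "rms_norm")]
    except KeyError:
        raise NotImplementedError(f"rms_norm is not registered for {vendor!r}") from None
    return kernel(x, weight, eps)


@register_kernel("demo", "rms_norm")
def _demo_rms_norm(x: List[float], weight: List[float], eps: float) -> List[float]:
    # A real adaptor would convert arguments here and call the vendor's fused kernel.
    scale = (sum(v * v for v in x) / len(x) + eps) ** -0.5
    return [v * scale * w for v, w in zip(x, weight)]
```

A framework adaptor would then call `rms_norm(...)` without knowing which vendor kernel runs underneath, which is the decoupling described above.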

Installation

Docker images for each platform

Atlas 800T A3: docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a3-latest (Atlas 800T A3 currently supports only Qwen-series models, in operator mode)
Atlas 800T A2: docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest
Atlas 300I Duo: docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:300i-duo-latest (Atlas 300I Duo currently supports only non-eager mode)
MetaX C500: docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest
Cambricon cloud accelerator cards: docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest

Install with pip

pip install dlinfer-ascend

Currently only Huawei's Atlas 800T A2 and 300I Duo support pip installation. For other hardware, install from source.
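
Before running pip, it can help to check whether a prebuilt wheel matches the local interpreter. The sketch below assumes that the only published wheels are the ones named on this page for 0.2.7 (cp310 and cp311, manylinux2014_aarch64); `wheel_matches` is a hypothetical helper, not part of dlinfer, and it does not check the hardware or the glibc version.

```python
import platform
import sys

# Assumption: taken from the 0.2.7 wheel file names (cp310 / cp311, manylinux2014_aarch64).
SUPPORTED_PYTHONS = {(3, 10), (3, 11)}
SUPPORTED_MACHINE = "aarch64"


def wheel_matches(py_version: tuple, system: str, machine: str) -> bool:
    """Return True if a prebuilt dlinfer-ascend wheel fits this interpreter and platform."""
    return (
        tuple(py_version[:2]) in SUPPORTED_PYTHONS
        and system == "Linux"
        and machine == SUPPORTED_MACHINE
    )


if __name__ == "__main__":
    ok = wheel_matches(sys.version_info, platform.system(), platform.machine())
    print("prebuilt wheel available" if ok else "no matching wheel; install from source")
```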

Install from source

Huawei Atlas 800T A2/A3/300I Duo

On Atlas 800T A2, dlinfer depends on torch and torch_npu. Run the following command to install torch, torch_npu, and their dependencies.
```
pip3 install -r requirements/ascend/full.txt
```

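Once the preparation above is complete, install dlinfer with the following commands.

cd /path_to_dlinfer
# dicp/AtbGraph is not built by default
DEVICE=ascend python3 setup.py develop

# To build dicp (for example, to use the atbgraph backend), enable it explicitly:
DLINFER_BUILD_DICP=ON DEVICE=ascend python3 setup.py develop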

MetaX C500

For the MetaX software stack, please contact MetaX staff directly.

The MetaX build of dlinfer is installed as follows:

cd /path_to_dlinfer
DEVICE=maca python3 setup.py develop

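Cambricon cloud accelerator cards

For the Cambricon software stack, please contact Cambricon staff directly.

The Cambricon build of dlinfer is installed as follows: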

cd /path_to_dlinfer
DEVICE=camb python3 setup.py develop
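
The three source installs differ only in the environment variables passed to setup.py, so they can be driven from one small helper. This is a sketch under that assumption; `build_install_command` is a hypothetical name, and the DLINFER_BUILD_DICP switch is only documented above for the ascend build.

```python
from typing import Dict, List, Tuple

# Device names used by the DEVICE variable in the commands above.
DEVICES = ("ascend", "maca", "camb")


def build_install_command(device: str, build_dicp: bool = False) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment variables, argv) for a source install of dlinfer."""
    if device not in DEVICES:
        raise ValueError(f"unknown device {device!r}; expected one of {DEVICES}")
    env = {"DEVICE": device}
    if build_dicp:
        # Only shown for DEVICE=ascend in the instructions above.
        env["DLINFER_BUILD_DICP"] = "ON"
    return env, ["python3", "setup.py", "develop"]
```

To run it, merge the returned variables into `os.environ` and pass the argv to `subprocess.run(argv, cwd="/path_to_dlinfer", env=..., check=True)`.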

Supported models and frameworks

LMDeploy

			Atlas 800T A2	Atlas 800T A2	Atlas 800T A2	Atlas 800T A2	Atlas 300I Duo	Atlas 800T A3	Maca C500	Cambricon
Model	Size	Type	FP16/BF16(eager)	FP16/BF16(graph)	W8A8(graph)	W4A16(eager)	FP16(graph)	FP16/BF16(eager)	BF16/FP16	BF16/FP16
Llama2	7B - 70B	LLM	Yes	Yes	Yes	Yes	-	Yes	Yes	Yes
Llama3	8B	LLM	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Llama3.1	8B	LLM	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
InternLM2	7B - 20B	LLM	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
InternLM2.5	7B - 20B	LLM	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
InternLM3	8B	LLM	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Mixtral	8x7B	LLM	Yes	Yes	No	No	Yes	-	Yes	Yes
QWen1.5-MoE	A2.7B	LLM	Yes	-	No	No	-	-	Yes	-
QWen2(.5)	7B	LLM	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes
QWen2-VL	2B, 7B	MLLM	Yes	Yes	-	-	-	-	Yes	No
QWen2.5-VL	3B - 72B	MLLM	Yes	Yes	-	-	Yes	-	Yes	No
QWen2-MoE	A14.57B	LLM	Yes	-	No	No	-	-	Yes	-
QWen3	0.6B-235B	LLM	Yes	Yes	No	No	Yes	Yes	Yes	Yes
DeepSeek-V2	16B	LLM	No	Yes	No	No	-	-	-	-
InternVL(v1.5)	2B-26B	MLLM	Yes	-	Yes	Yes	-	-	Yes	-
InternVL2	1B-40B	MLLM	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes
InternVL2.5	1B-78B	MLLM	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes
InternVL3	1B-78B	MLLM	Yes	Yes	Yes	Yes	Yes	-	Yes	Yes
CogVLM2-chat	19B	MLLM	Yes	No	-	-	-	-	Yes	-
GLM4V	9B	MLLM	Yes	No	-	-	-	-	-	-

'Yes' means tested and passing, 'No' means not supported, and '-' means not tested.

Using LMDeploy

Installing LMDeploy:

cd /path_to_lmdeploy
# Huawei
LMDEPLOY_TARGET_DEVICE=ascend pip3 install -e .
# MetaX
LMDEPLOY_TARGET_DEVICE=maca   pip3 install -e .
# Cambricon
LMDEPLOY_TARGET_DEVICE=camb   pip3 install -e .

Simply set the device type of the PyTorch engine backend to ascend/maca/camb; no other changes are needed. See the LMDeploy documentation for details.

[!CAUTION] On Cambricon, block_size in PytorchEngineConfig must be set to 16.

Example code:

import lmdeploy
from lmdeploy import PytorchEngineConfig

# Offline pipeline on an Ascend device; eager_mode=True disables graph mode.
pipe = lmdeploy.pipeline(
    "/path_to_model",
    backend_config=PytorchEngineConfig(
        tp=1,
        cache_max_entry_count=0.4,
        device_type="ascend",
        eager_mode=True,
    ),
)
question = ["Shanghai is", "Please introduce China", "How are you?"]
response = pipe(question, request_output_len=256, do_preprocess=False)
for q, r in zip(question, response):
    print(f"Q: {q}")
    print(f"A: {r.text}")
    print()
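
For Cambricon, the caution above means the same configuration also needs block_size=16. One way to keep that rule in a single place is a small helper that builds the keyword arguments per device. This is a sketch: `engine_config_kwargs` is a hypothetical helper, and the other values simply mirror the example above.

```python
from typing import Any, Dict

DEVICE_TYPES = ("ascend", "maca", "camb")


def engine_config_kwargs(device_type: str, eager_mode: bool = True, tp: int = 1) -> Dict[str, Any]:
    """Build keyword arguments intended for lmdeploy's PytorchEngineConfig."""
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unknown device_type {device_type!r}")
    kwargs: Dict[str, Any] = {
        "tp": tp,
        "cache_max_entry_count": 0.4,
        "device_type": device_type,
        "eager_mode": eager_mode,
    }
    if device_type == "camb":
        # Required on Cambricon according to the caution above.
        kwargs["block_size"] = 16
    return kwargs
```

The result would be passed on as `PytorchEngineConfig(**engine_config_kwargs("camb"))`.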

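[!TIP] Graph mode is supported on all hardware except Ascend A3.

In offline mode, set eager_mode=False in PytorchEngineConfig to enable graph mode, or eager_mode=True to disable it. In online mode, graph mode is enabled by default; add --eager-mode to disable it.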
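
For online mode, the command line can be assembled as follows. This sketch assumes the server is started through LMDeploy's `lmdeploy serve api_server` entry point with `--backend pytorch` and `--device`; those flags are not stated on this page, so check them against the installed LMDeploy version. `serve_command` is a hypothetical helper.

```python
import shlex
from typing import List


def serve_command(model_path: str, device: str, eager: bool = False) -> List[str]:
    """Build an argv list for serving a model; graph mode stays on unless eager=True."""
    argv = [
        "lmdeploy", "serve", "api_server", model_path,
        "--backend", "pytorch",  # assumption: PyTorch engine selected this way
        "--device", device,      # assumption: device selected this way
    ]
    if eager:
        # Documented above: --eager-mode disables graph mode in online mode.
        argv.append("--eager-mode")
    return argv


print(shlex.join(serve_command("/path_to_model", "ascend", eager=True)))
# prints: lmdeploy serve api_server /path_to_model --backend pytorch --device ascend --eager-mode
```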

[!IMPORTANT] Multi-card inference on Cambricon accelerator cards currently requires starting ray manually.

Here is a two-card example:

 export MLU_VISIBLE_DEVICES=0,1
 ray start --head --resources='{"MLU": 2}'
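
The same two commands generalize to other card counts. The sketch below assumes the cards are numbered contiguously from 0, which the two-card example suggests but this page does not state; `ray_head_setup` is a hypothetical helper.

```python
import json
from typing import Dict, List, Tuple


def ray_head_setup(num_cards: int) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment variables, argv) for starting a ray head node with MLU cards."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    # Assumption: visible cards are 0 .. num_cards - 1.
    env = {"MLU_VISIBLE_DEVICES": ",".join(str(i) for i in range(num_cards))}
    resources = json.dumps({"MLU": num_cards})
    argv = ["ray", "start", "--head", f"--resources={resources}"]
    return env, argv
```

With `num_cards=2` this reproduces the environment variable and the resources argument shown in the example above.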

Release history

Version	Date
0.2.7 (this version)	Apr 2, 2026
0.2.6	Feb 5, 2026
0.2.5	Jan 6, 2026
0.2.4	Dec 4, 2025
0.2.3.post2	Nov 4, 2025
0.2.2	Sep 5, 2025
0.2.1.post3	Jul 2, 2025
0.2.1.post2	Jun 13, 2025
0.2.1.post1	Jun 13, 2025
0.2.1	Jun 11, 2025
0.2.0	May 6, 2025
0.1.8	Apr 16, 2025
0.1.7	Mar 24, 2025
0.1.6	Feb 26, 2025
0.1.5	Jan 16, 2025
0.1.4	Dec 27, 2024
0.1.3.post1	Dec 11, 2024
0.1.3	Dec 8, 2024
0.1.2	Nov 16, 2024
0.1.1.post2	Oct 28, 2024
0.1.0.post1	Sep 10, 2024
0.1.0	Aug 30, 2024

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

dlinfer_ascend-0.2.7-cp311-cp311-manylinux2014_aarch64.whl (107.2 kB), uploaded Apr 2, 2026, CPython 3.11

dlinfer_ascend-0.2.7-cp310-cp310-manylinux2014_aarch64.whl (107.2 kB), uploaded Apr 2, 2026, CPython 3.10

File details

Details for the file dlinfer_ascend-0.2.7-cp311-cp311-manylinux2014_aarch64.whl.

File metadata

Download URL: dlinfer_ascend-0.2.7-cp311-cp311-manylinux2014_aarch64.whl
Upload date: Apr 2, 2026
Size: 107.2 kB
Tags: CPython 3.11
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for dlinfer_ascend-0.2.7-cp311-cp311-manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`665799feb5505517f34045f5908d78232f59c9ee2bb4b443cda356483c450b46`
MD5	`f970959ed4911e45eeff2fbaa10f31ab`
BLAKE2b-256	`c2f8728773f93119242fd44b86687a99e2f3a6b81d160715c8243d314269c05c`

File details

Details for the file dlinfer_ascend-0.2.7-cp310-cp310-manylinux2014_aarch64.whl.

File metadata

Download URL: dlinfer_ascend-0.2.7-cp310-cp310-manylinux2014_aarch64.whl
Upload date: Apr 2, 2026
Size: 107.2 kB
Tags: CPython 3.10
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for dlinfer_ascend-0.2.7-cp310-cp310-manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`176d79b6b5a9c78ed35aa304a123241b4c1ca83a840f5bf6fbe97b73b5d5a504`
MD5	`9238e5aad0d63a81a2e4af9483cbaad6`
BLAKE2b-256	`69cf95fea7be55327c2f7c57630663813338867719eba73c2fe9e09850bfff86`
