DeepSparse

An inference runtime offering GPU-class performance on CPUs and APIs to integrate ML into your application

A CPU runtime that takes advantage of sparsity within neural networks to reduce compute. Read more about sparsification.

Neural Magic's DeepSparse integrates with popular deep learning libraries (e.g., Hugging Face, Ultralytics), so you can use it to load and deploy sparse models with ONNX. ONNX gives you the flexibility to serve your model in a framework-agnostic environment. Supported frameworks include PyTorch, TensorFlow, Keras, and many others.

Installation

Install DeepSparse Community as follows:

pip install deepsparse

DeepSparse is available in two editions:

DeepSparse Community is open-source and free for evaluation, research, and non-production use with our DeepSparse Community License.
DeepSparse Enterprise requires a Trial License or can be fully licensed for production, commercial applications.

🧰 Hardware Support and System Requirements

To confirm that your CPU is compatible with DeepSparse, review the Supported Hardware for DeepSparse documentation.

DeepSparse has been tested on Python versions 3.7-3.10, ONNX versions 1.5.0-1.12.0, ONNX opset version 11 or higher, and manylinux-compliant systems. Using a virtual environment is highly recommended. DeepSparse is supported natively only on Linux; on Mac or Windows, run Linux in Docker or a virtual machine to use DeepSparse.
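
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library. The helper name `check_environment` is hypothetical and not part of DeepSparse, and it checks only the Python version and operating system stated above, not CPU compatibility or ONNX versions.

```python
import platform
import sys

# Tested Python range stated above (3.7 through 3.10, inclusive).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def check_environment(python_version=None, system=None):
    """Return a list of problems found; an empty list means none were found.

    `check_environment` is a hypothetical helper, not a DeepSparse API.
    """
    version = tuple(python_version or sys.version_info[:2])
    system = system or platform.system()
    problems = []
    if not (MIN_PYTHON <= version <= MAX_PYTHON):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the tested range 3.7-3.10"
        )
    if system != "Linux":
        problems.append(
            f"{system} is not supported natively; run Linux in Docker or a VM"
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Both arguments default to the running interpreter and operating system; passing them explicitly is only useful for testing the check itself.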

Features

👩‍💻 Pipelines for NLP, CV Classification, CV Detection, CV Segmentation and more!
🔌 DeepSparse Server
📜 DeepSparse Benchmark
☁️ Cloud Deployments and Demos

👩‍💻 Pipelines

Pipelines are a high-level Python interface for running inference with DeepSparse across select tasks in NLP and CV:

NLP	CV
Text Classification `"text_classification"`	Image Classification `"image_classification"`
Token Classification `"token_classification"`	Object Detection `"yolo"`
Sentiment Analysis `"sentiment_analysis"`	Instance Segmentation `"yolact"`
Question Answering `"question_answering"`	Keypoint Detection `"open_pif_paf"`
MultiLabel Text Classification `"text_classification"`
Document Classification `"text_classification"`
Zero-Shot Text Classification `"zero_shot_text_classification"`

NLP Example | Question Answering

from deepsparse import Pipeline

# Create a question-answering pipeline from a SparseZoo model stub
qa_pipeline = Pipeline.create(
    task="question_answering",
    model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni",
)

# Run inference on a question and the context that contains the answer
inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")

CV Example | Image Classification

from deepsparse import Pipeline

# Create an image-classification pipeline from a SparseZoo model stub
cv_pipeline = Pipeline.create(
    task="image_classification",
    model_path="zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none",
)

# Run inference on an image file on disk
input_image = "my_image.png"
inference = cv_pipeline(images=input_image)

🔌 DeepSparse Server

DeepSparse Server lets you serve your models and pipelines directly from your terminal.

The server is built on the FastAPI web framework and the Uvicorn web server. Install it with this command:

pip install "deepsparse[server]"

Single Model

Once installed, the following CLI command serves a single BERT model for inference:

deepsparse.server \
    task question_answering \
    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"

To look up arguments run: deepsparse.server --help.
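
Once the server is running, a client sends its inputs as JSON over HTTP. The sketch below uses only the standard library. The host, the port `5543`, and the `/predict` route are assumptions about the server's defaults, so confirm them against the server's startup output or `deepsparse.server --help`. The `question` and `context` fields mirror the pipeline arguments in the question-answering example, and `build_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed default address; adjust to match what the server reports on startup.
SERVER_URL = "http://localhost:5543"


def build_request(route, question, context, base_url=SERVER_URL):
    """Build (but do not send) a JSON POST request for a question-answering endpoint."""
    payload = json.dumps({"question": question, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + route,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(route, question, context):
    """Send the request and return the decoded JSON response (needs a running server)."""
    request = build_request(route, question, context)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not require the server to be up.
request = build_request("/predict", "What's my name?", "My name is Snorlax")
print(request.full_url)
```

Calling `predict("/predict", ...)` sends the request. When several models are served from a config file, pass the route configured for the model you want to query.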

Multiple Models

To deploy multiple models, create a config.yaml file. The following example configures two BERT models for the question-answering task:

num_workers: 1
endpoints:
    - task: question_answering
      route: /predict/question_answering/base
      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
      batch_size: 1
    - task: question_answering
      route: /predict/question_answering/pruned_quant
      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
      batch_size: 1

After creating the config.yaml file, start the server by passing the file path as an argument:

deepsparse.server config config.yaml
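
When the list of endpoints comes from a script, the same file can be generated rather than written by hand. The sketch below renders the layout of the example config with plain string formatting, so no YAML library is needed. `render_config` is a hypothetical helper, not part of DeepSparse, and it does no escaping, so it suits only simple values like the ones in this example.

```python
def render_config(endpoints, num_workers=1):
    """Render a server config in the same layout as the example config.yaml.

    `render_config` is a hypothetical helper. Each endpoint is a dict with the
    keys task, route, model, and batch_size.
    """
    lines = [f"num_workers: {num_workers}", "endpoints:"]
    for endpoint in endpoints:
        lines.append(f"    - task: {endpoint['task']}")
        lines.append(f"      route: {endpoint['route']}")
        lines.append(f"      model: {endpoint['model']}")
        lines.append(f"      batch_size: {endpoint['batch_size']}")
    return "\n".join(lines) + "\n"


# Shared prefix of the two SparseZoo stubs used in the example config.
ZOO_PREFIX = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/"

endpoints = [
    {
        "task": "question_answering",
        "route": "/predict/question_answering/base",
        "model": ZOO_PREFIX + "base-none",
        "batch_size": 1,
    },
    {
        "task": "question_answering",
        "route": "/predict/question_answering/pruned_quant",
        "model": ZOO_PREFIX + "12layer_pruned80_quant-none-vnni",
        "batch_size": 1,
    },
]

config_text = render_config(endpoints)
print(config_text)
```

Writing `config_text` to config.yaml gives a file equivalent to the hand-written example.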

Read the DeepSparse Server README for further details.

📜 DeepSparse Benchmark

DeepSparse Benchmark is a command-line (CLI) tool for evaluating the DeepSparse Engine's performance with ONNX models. It processes arguments, downloads and compiles the network into the engine, creates input tensors, and runs the model according to the selected scenario.

Run deepsparse.benchmark -h to look up arguments:

deepsparse.benchmark [-h] [-b BATCH_SIZE] [-i INPUT_SHAPES] [-ncores NUM_CORES] [-s {async,sync,elastic}] [-t TIME]
                     [-w WARMUP_TIME] [-nstreams NUM_STREAMS] [-pin {none,core,numa}] [-e ENGINE] [-q] [-x EXPORT_PATH]
                     model_path

Refer to the Benchmark README for examples of specific inference scenarios.
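
To compare scenarios, it can help to script the command line. The sketch below builds one `deepsparse.benchmark` invocation per scenario, using only flags that appear in the usage text above (`-b`, `-s`, `-t`). `benchmark_command` is a hypothetical helper, and the 10-second duration is an arbitrary illustrative value. The commands are printed rather than executed.

```python
import shlex

MODEL = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"


def benchmark_command(model_path, scenario, batch_size=1, seconds=10):
    """Return the argument list for one deepsparse.benchmark run."""
    if scenario not in {"async", "sync", "elastic"}:
        raise ValueError(f"unknown scenario: {scenario}")
    return [
        "deepsparse.benchmark",
        "-b", str(batch_size),
        "-s", scenario,
        "-t", str(seconds),
        model_path,
    ]


commands = [benchmark_command(MODEL, scenario) for scenario in ("sync", "async")]
for command in commands:
    # To execute instead of print: subprocess.run(command, check=True)
    print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping each command as a list of arguments avoids shell quoting issues if it is later passed to `subprocess.run`.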

🦉 Custom ONNX Model Support

DeepSparse accepts ONNX models from two sources:

SparseZoo ONNX: This is an open-source repository of sparse models available for download. SparseZoo offers inference-optimized models, which are trained using repeatable sparsification recipes and state-of-the-art techniques from SparseML.

Custom ONNX: Users can provide their own ONNX models, whether dense or sparse. By plugging in a custom model, users can compare its performance with other solutions.

> wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx
Saving to: ‘mobilenetv2-7.onnx’

Custom ONNX Benchmark example:

from deepsparse import compile_model
from deepsparse.utils import generate_random_inputs

onnx_filepath = "mobilenetv2-7.onnx"
batch_size = 16

# Generate random sample input
inputs = generate_random_inputs(onnx_filepath, batch_size)

# Compile and run
engine = compile_model(onnx_filepath, batch_size)
outputs = engine.run(inputs)
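
To compare a custom model against other solutions, the call can be timed. This is a minimal sketch and not the method used by `deepsparse.benchmark`: `measure_latency` is a hypothetical helper that times any zero-argument callable, and a stand-in function replaces the engine so the snippet runs without DeepSparse installed. With the example above, `lambda: engine.run(inputs)` would be passed instead.

```python
import statistics
import time


def measure_latency(run_once, warmup=2, iterations=10):
    """Time a zero-argument callable; return per-call latencies in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def stand_in_model():
    """Placeholder workload so the sketch runs without DeepSparse."""
    return sum(i * i for i in range(10_000))


latencies = measure_latency(stand_in_model)
print(
    f"mean: {statistics.mean(latencies):.3f} ms, "
    f"median: {statistics.median(latencies):.3f} ms"
)
```

Reporting the median alongside the mean makes a single slow call easier to spot.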

The GitHub repository contains package APIs and examples that help users quickly begin benchmarking and performing inference on sparse models.

Scheduling Single-Stream, Multi-Stream, and Elastic Inference

DeepSparse offers different inference scenarios based on your use case. Read more details here: Inference Types.

⚡ Single-stream scheduling: the latency/synchronous scenario, in which requests execute serially. [default]

It's highly optimized for minimum per-request latency, using all of the system's resources provided to it on every request it gets.

⚡ Multi-stream scheduling: the throughput/asynchronous scenario, in which requests execute in parallel.

The most common use cases for the multi-stream scheduler are where parallelism is low with respect to core count, and where requests need to be made asynchronously without time to batch them.
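
The difference between the two scenarios can be illustrated from the caller's side. The sketch below is conceptual only and does not use or reproduce DeepSparse's scheduler. `fake_infer` is a stand-in that sleeps briefly; requests are first issued one at a time (the synchronous pattern) and then submitted concurrently through a thread pool (the asynchronous pattern).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_infer(request_id):
    """Stand-in for an inference call; sleeps to simulate work."""
    time.sleep(0.01)
    return f"result-{request_id}"


def run_serially(request_ids):
    """Synchronous pattern: each request finishes before the next one starts."""
    return [fake_infer(request_id) for request_id in request_ids]


def run_concurrently(request_ids, max_workers=4):
    """Asynchronous pattern: several requests are in flight at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in submission order, regardless of finish order
        return list(pool.map(fake_infer, request_ids))


requests_to_send = list(range(8))
serial_results = run_serially(requests_to_send)
concurrent_results = run_concurrently(requests_to_send)
print(serial_results == concurrent_results)
```

Both patterns return the same results; they differ in how many requests are being processed at any moment.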

Resources

Libraries

Versions

DeepSparse | stable
DeepSparse-Nightly | nightly (dev)
GitHub | releases

Info

Community

Be Part of the Future... And the Future is Sparse!

Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! Learn how here.

For user help or questions about DeepSparse, sign up or log in to our Deep Sparse Community Slack. We are growing the community member by member and are happy to see you there. Bugs, feature requests, or additional questions can also be posted to our GitHub Issue Queue. You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by subscribing to the Neural Magic community.

For more general questions about Neural Magic, complete this form.

License

DeepSparse Community is licensed under the Neural Magic DeepSparse Community License. Some source code, example files, and scripts included in the deepsparse GitHub repository or directory are licensed under the Apache License Version 2.0 as noted.

DeepSparse Enterprise requires a Trial License or can be fully licensed for production, commercial applications.

Cite

Find this project useful in your research or other communications? Please consider citing:

@InProceedings{
    pmlr-v119-kurtz20a, 
    title = {Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks}, 
    author = {Kurtz, Mark and Kopinsky, Justin and Gelashvili, Rati and Matveev, Alexander and Carr, John and Goin, Michael and Leiserson, William and Moore, Sage and Nell, Bill and Shavit, Nir and Alistarh, Dan}, 
    booktitle = {Proceedings of the 37th International Conference on Machine Learning}, 
    pages = {5533--5543}, 
    year = {2020}, 
    editor = {Hal Daumé III and Aarti Singh}, 
    volume = {119}, 
    series = {Proceedings of Machine Learning Research}, 
    address = {Virtual}, 
    month = {13--18 Jul}, 
    publisher = {PMLR}, 
    pdf = {http://proceedings.mlr.press/v119/kurtz20a/kurtz20a.pdf},
    url = {http://proceedings.mlr.press/v119/kurtz20a.html}
}

@article{DBLP:journals/corr/abs-2111-13445,
  author    = {Eugenia Iofinova and
               Alexandra Peste and
               Mark Kurtz and
               Dan Alistarh},
  title     = {How Well Do Sparse Imagenet Models Transfer?},
  journal   = {CoRR},
  volume    = {abs/2111.13445},
  year      = {2021},
  url       = {https://arxiv.org/abs/2111.13445},
  eprinttype = {arXiv},
  eprint    = {2111.13445},
  timestamp = {Wed, 01 Dec 2021 15:16:43 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2111-13445.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Release history

1.9.0

Jun 2, 2025

1.8.0

Jul 19, 2024

1.7.1

Mar 19, 2024

1.7.0

Mar 15, 2024

1.6.1

Dec 20, 2023

1.6.0

Dec 4, 2023

This version

1.5.3

Aug 17, 2023

1.5.2

Jul 6, 2023

1.5.1

Jun 21, 2023

1.5.0

Jun 2, 2023

1.4.2

Mar 25, 2023

1.4.1

Mar 18, 2023

1.4.0

Feb 16, 2023

1.3.2

Jan 17, 2023

1.3.1

Jan 4, 2023

1.3.0

Dec 17, 2022

1.2.0

Oct 27, 2022

1.1.0

Aug 25, 2022

1.0.2

Jul 12, 2022

1.0.1

Jul 7, 2022

1.0.0

Jun 28, 2022

0.12.2

Jun 2, 2022

0.12.1

May 5, 2022

0.12.0

Apr 21, 2022

0.11.2

Mar 23, 2022

0.11.1

Mar 21, 2022

0.11.0

Mar 11, 2022

0.10.0

Feb 3, 2022

0.9.1

Dec 14, 2021

0.9.0

Dec 1, 2021

0.8.0

Oct 26, 2021

0.7.0

Sep 11, 2021

0.6.1

Aug 11, 2021

0.6.0

Jul 30, 2021

0.5.1

Jun 30, 2021

0.5.0

Jun 28, 2021

0.4.0

Jun 4, 2021

0.3.1

May 13, 2021

0.3.0

Apr 30, 2021

0.2.0

Mar 31, 2021

0.1.1

Mar 1, 2021

0.1.0

Feb 4, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepsparse-1.5.3.tar.gz (41.5 MB)

Uploaded Aug 17, 2023 | Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepsparse-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.9 MB)

Uploaded Aug 17, 2023 | CPython 3.10 | manylinux: glibc 2.17+ x86-64

deepsparse-1.5.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.9 MB)

Uploaded Aug 17, 2023 | CPython 3.9 | manylinux: glibc 2.17+ x86-64

deepsparse-1.5.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.9 MB)

Uploaded Aug 17, 2023 | CPython 3.8 | manylinux: glibc 2.17+ x86-64

deepsparse-1.5.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.9 MB)

Uploaded Aug 23, 2023 | CPython 3.7m | manylinux: glibc 2.17+ x86-64

File details

Details for the file deepsparse-1.5.3.tar.gz.

File metadata

Download URL: deepsparse-1.5.3.tar.gz
Upload date: Aug 17, 2023
Size: 41.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for deepsparse-1.5.3.tar.gz
Algorithm	Hash digest
SHA256	`248af9e6952ef2d1df95f5b3042363ce9a94e62253120ed303f903fd1e5d095c`
MD5	`5c42969441b8023e55768c66c119b641`
BLAKE2b-256	`18bdd24a290df8454d298c2daac52313ce4dc3030f6fd8b808f884bb5b44698e`

See more details on using hashes here.
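
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library's `hashlib`. `sha256_of` and `verify` are hypothetical helper names, and the file path assumes the archive was saved to the current directory.

```python
import hashlib
import os

# SHA256 digest listed above for deepsparse-1.5.3.tar.gz
EXPECTED_SHA256 = "248af9e6952ef2d1df95f5b3042363ce9a94e62253120ed303f903fd1e5d095c"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's digest matches the expected value."""
    return sha256_of(path) == expected_hex.lower()


archive = "deepsparse-1.5.3.tar.gz"  # assumed download location
if os.path.exists(archive):
    print("match" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

Reading in chunks keeps memory use flat for an archive of this size. The same helpers work for the wheel files when given the digest listed for each one.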

File details

Details for the file deepsparse-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: deepsparse-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Aug 17, 2023
Size: 41.9 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for deepsparse-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`273253af406d26ec78c576ea41ac170911c66edcf1c3b22e9d14ada849d39423`
MD5	`bb9602e8eb46fcaf384a23bc9354dba3`
BLAKE2b-256	`987419b9b03b05a735563cf00285f6dcb76cb1838b8f5754869a40cf70fbdaf2`

File details

Details for the file deepsparse-1.5.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: deepsparse-1.5.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Aug 17, 2023
Size: 41.9 MB
Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for deepsparse-1.5.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`c9174e75fc5037592f93f93d03c89a2b9b9c321167cfc91989f64a03ee32ff32`
MD5	`b2b8d0dd0de543318d4b0aa56b3c974e`
BLAKE2b-256	`02ba3672d424a47486556c724fbd02509390cc30da0d9d8590289903ded3d6cf`

File details

Details for the file deepsparse-1.5.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: deepsparse-1.5.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Aug 17, 2023
Size: 41.9 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for deepsparse-1.5.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`7b7693904628e049985b68b90c85445371278c4225626b807216cf67f04d56e6`
MD5	`86c0954796a1d0303e2f2f7a501428ff`
BLAKE2b-256	`4b27fe23b0a1c76a2fbec1b7c3de938a17ee3f2ffb7ffd2126f96b6599c60252`

File details

Details for the file deepsparse-1.5.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: deepsparse-1.5.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Aug 23, 2023
Size: 41.9 MB
Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.24.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.10

File hashes

Hashes for deepsparse-1.5.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`33ba2049cc3720e2a5132c02d15affd0e468a6d80fe1a7a20d7ef87e721818e4`
MD5	`b14db5b831f017f0abb364129fefdc8d`
BLAKE2b-256	`464e3a3e36c990b03ae2ddc5140ba35da835505efcf454347f43bf79e5771a7c`
