
Experimental version of the inference package, which is expected to evolve into inference 1.0


Experimental version of inference

🚀 Introducing inference-exp - the evolution of inference

At Roboflow, we’re taking a bold step toward a new generation of inference — designed to be faster, more reliable, and more user-friendly. With this vision in mind, we’re building a new library called inference-exp.

This is an early-stage project, and we’re sharing initial versions to gather valuable community feedback. Your input will help us shape and steer this initiative in the right direction.

We’re excited to have you join us on this journey — let’s build something great together! 🤝

[!CAUTION] The inference-exp package is an experimental preview of upcoming inference capabilities.

  • Features may change, break, or be removed without notice.
  • We do not guarantee backward compatibility between releases.

We strongly advise against using inference-exp in production systems; for such purposes, please continue to use the stable inference package.

⚡ Installation

[!TIP] We recommend using uv to install inference-exp. To install the tool, follow the official guide or use the snippet below:

curl -LsSf https://astral.sh/uv/install.sh | sh

Use the following command to install inference-exp on a CPU machine 💻 (more advanced options are listed below):

uv pip install inference-exp
# or - if you use pip
pip install inference-exp
👉 GPU installation

As you may learn from 📜 Principles and Assumptions, inference-exp is designed so that a build is composed out of the different extras defined for the package. Some extras bring new models, while others add the ability to run models created for a specific backend. To get the most out of an installation on a GPU machine, we recommend including the TRT and ONNX extras, as well as selecting the torch-cu* extra that matches the CUDA version installed on the machine. The ONNX backend is particularly important when running models trained on the Roboflow platform.

uv pip install "inference-exp[torch-cu128,onnx-cu12,trt10]" "tensorrt==10.12.0.36"
# or - if you use pip
pip install "inference-exp[torch-cu128,onnx-cu12,trt10]" "tensorrt==10.12.0.36"

To avoid clashes with external packages, pyproject.toml defines quite loose version restrictions for dependencies. Some packages, like tensorrt, are better kept under stricter control (some TRT engines will only run in an environment that exactly matches the one that compiled them); that's why we recommend pinning tensorrt to the version we currently use to compile TRT artefacts.

Additionally, the library defines a set of torch-* extras which, thanks to uv, pull packages from extra indexes adjusted for a specific CUDA version: torch-cu118, torch-cu124, torch-cu126, torch-cu128, torch-jp6-cu126.
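
Note that plain pip does not read these extra indexes from pyproject.toml, so the matching wheel index must be supplied manually. A minimal sketch of both routes, assuming the standard public PyTorch wheel index for CUDA 12.6:

# uv resolves the extra index automatically from pyproject.toml
uv pip install "inference-exp[torch-cu126]"
# plain pip needs the matching index passed explicitly
pip install "inference-exp[torch-cu126]" --extra-index-url https://download.pytorch.org/whl/cu126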

👉 CPU installation - enabling models trained with Roboflow

For CPU installations, we recommend installing the ONNX backend, as the majority of models trained on the Roboflow platform are exported to ONNX and would otherwise not be available:

# to install with ONNX backend
uv pip install "inference-exp[onnx-cpu]"
# or - to install only base dependencies
uv pip install inference-exp
👉 Reproducibility of installation

When using uv pip install ... or pip install, it is possible to get non-reproducible builds (pyproject.toml intentionally defines quite loose restrictions on dependencies). If you need strict control of dependencies, follow the installation method based on uv.lock demonstrated in the official docker builds of the library.
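
As a minimal sketch of the lockfile-based approach, assuming a checkout of the repository containing pyproject.toml and uv.lock:

# install the exact dependency set pinned in uv.lock,
# failing instead of re-resolving if the lockfile is outdated
uv sync --frozen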

📖 Basic Usage

from inference_exp import AutoModel
import cv2
import supervision as sv

# loads model from Roboflow API (loading from local dir also available)
model = AutoModel.from_pretrained("rfdetr-base")  
image = cv2.imread("<path-to-your-image>")
predictions = model(image)

# integration with supervision
annotator = sv.BoxAnnotator()
annotated = annotator.annotate(image.copy(), predictions[0].to_supervision())

[!TIP] Did your model fail to load with an error prompting you to install additional dependencies?

Take a look at 📜 Principles and Assumptions to understand why this happens, and navigate to the extras section to find out which extra dependency you need to install. The most common issue is a missing ONNX backend, which is required to run models trained on the Roboflow platform.
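
On a CPU machine, for example, the usual fix is the ONNX extra shown earlier:

uv pip install "inference-exp[onnx-cpu]"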

📜 Principles and Assumptions

  • We define a model as weights trained on a dataset, which can be exported or compiled into multiple equivalent model packages, each optimized for specific environments (e.g., speed, flexibility).

  • The new inference library is multi-backend, able to run model packages in different formats depending on the installed dependencies, with the scope of supported models determined by the package extras chosen during installation.

  • We aim to keep the extra dependencies minimal while covering as broad a range of models as possible.

  • By default, we include PyTorch and Hugging Face Transformers; optional extras are available for TensorRT (TRT) and ONNX backends, with a runtime preference order: TRT → Torch → ONNX. We intend for new models to be primarily Torch-based.

  • Backend selection happens dynamically at runtime, based on model metadata and environment checks, but can be fully overridden by the user when needed (see the sketch below).
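
A minimal sketch of such an override; the backend keyword below is a hypothetical name used for illustration, not confirmed API. Check the library documentation for the actual parameter and accepted values:

from inference_exp import AutoModel

# force a specific backend instead of the automatic TRT -> Torch -> ONNX choice
# NOTE: the "backend" keyword is an assumption used for illustration only
model = AutoModel.from_pretrained("rfdetr-base", backend="onnx")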

🔌 Extra Dependencies

Extra dependencies are optional features of the package that can be installed with:

uv pip install "inference-exp[extras-name-1,extras-name-2]"
# or - if you use pip
pip install "inference-exp[extras-name-1,extras-name-2]"

In the case of inference-exp, extras bring either additional backends (dependencies needed to run AI models of a given type, like TensorRT engines) or additional models.

Backends

| Extras names | Backend | Description |
| --- | --- | --- |
| torch-cu118, torch-cu124, torch-cu126, torch-cu128, torch-jp6-cu126 | PyTorch | Provide specific variants of torch matching the installed CUDA version. Works only with uv, which is capable of reading extra indexes from pyproject.toml; when using pip, pass --extra-index-url. By default, the CPU version of torch is installed with the library. Torch is the library's default backend. The torch-cu* extras are meant for GPU servers with a certain CUDA version, whereas extras like torch-jp6-cu126 are meant for Jetson devices with specific Jetpack and CUDA versions. |
| onnx-cpu, onnx-cu118, onnx-cu12, onnx-jp6-cu126 | ONNX | Provide specific variants of onnxruntime. Works only with uv, which is capable of reading extra indexes from pyproject.toml; when using pip, pass --extra-index-url. These extras are not installed by default and are not required, but they enable a wide variety of models trained on the Roboflow Platform. The onnx-cu* extras are meant for GPU servers with a certain CUDA version, whereas extras like onnx-jp6-cu126 are meant for Jetson devices with specific Jetpack and CUDA versions. |
| trt10 | TRT | Provides a specific variant of tensorrt; works only on GPU servers. Jetson installations should fall back to the pre-compiled package shipped with Jetpack. |

Additional models / capabilities

| Extras | Description |
| --- | --- |
| mediapipe | Enables MediaPipe models, including Face Detector |
| grounding-dino | Enables the Grounding DINO model |
| flash-attn | EXPERIMENTAL: installs flash-attn for faster LLMs/VLMs; usually requires extensive compilation |
| test | Test dependencies |
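
For example, to add MediaPipe and Grounding DINO support on top of a base install:

uv pip install "inference-exp[mediapipe,grounding-dino]"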

Special Installation: SAM2 Real-Time

SAM2 real-time requires a Git-based dependency that cannot be distributed via PyPI. To use the SAM2 real-time capabilities, you need to install it manually after installing inference-exp:

# First, install inference-exp with your desired extras (e.g., torch-cu124)
pip install "inference-exp[torch-cu124]"

# Then, install SAM2 real-time from GitHub
pip install git+https://github.com/Gy920/segment-anything-2-real-time.git

For development environments:

# First sync the project
uv sync --dev

# Then manually install SAM 2 from the GitHub repository
# Note: The package installs as "SAM 2" (with a space)
uv pip install git+https://github.com/Gy920/segment-anything-2-real-time.git

[!NOTE] Due to PyPI restrictions on Git dependencies, the SAM2 real-time package must be installed separately from the GitHub repository. The package will be installed with the name "SAM 2" (with a space).

[!IMPORTANT]
Not all extras can be installed together in a single environment. We try to make the extras as composable as possible, but this is not always feasible, and sometimes you need to choose which extras to install.

🧠 Models

[!IMPORTANT] If you see a bug in a model implementation or in the loading mechanism, create a new issue and tag it with inference-exp-bug.

Additionally, we are working hard to extend the pool of supported models; suggestions for new models to add are appreciated 🤝

Below is a table showcasing the supported models, with hints about the extra dependencies required.

| Architecture | Task Type | Supported backends |
| --- | --- | --- |
| RFDetr | object-detection | trt, torch |
| YOLO v8 | object-detection | onnx, trt |
| YOLO v8 | instance-segmentation | onnx, trt |
| YOLO v9 | object-detection | onnx, trt |
| YOLO v10 | object-detection | onnx, trt |
| YOLO v11 | object-detection | onnx, trt |
| YOLO v11 | instance-segmentation | onnx, trt |
| Perception Encoder | embedding | torch |
| CLIP | embedding | torch, onnx |

Registered pre-trained weights

Below you can find a list of model IDs registered in the Roboflow weights provider (along with notes about access rights).

  • public-open - available without a Roboflow API key, but subject to the license of the specific model

  • public-api-key-gated - available to everyone with a Roboflow API key

Models:

👉 RFDetr

Access level: public-open

License: Apache 2.0

The following model IDs are registered:

  • rfdetr-base (trained on COCO dataset)

👉 YOLO v8

Access level: public-open

License: AGPL

The following model IDs are registered for object detection task:

  • yolov8n-640 (trained on COCO dataset)

  • yolov8n-1280 (trained on COCO dataset)

  • yolov8s-640 (trained on COCO dataset)

  • yolov8s-1280 (trained on COCO dataset)

  • yolov8m-640 (trained on COCO dataset)

  • yolov8m-1280 (trained on COCO dataset)

  • yolov8l-640 (trained on COCO dataset)

  • yolov8l-1280 (trained on COCO dataset)

  • yolov8x-640 (trained on COCO dataset)

  • yolov8x-1280 (trained on COCO dataset)

The following model IDs are registered for instance segmentation task:

  • yolov8n-seg-640 (trained on COCO dataset)

  • yolov8n-seg-1280 (trained on COCO dataset)

  • yolov8s-seg-640 (trained on COCO dataset)

  • yolov8s-seg-1280 (trained on COCO dataset)

  • yolov8m-seg-640 (trained on COCO dataset)

  • yolov8m-seg-1280 (trained on COCO dataset)

  • yolov8l-seg-640 (trained on COCO dataset)

  • yolov8l-seg-1280 (trained on COCO dataset)

  • yolov8x-seg-640 (trained on COCO dataset)

  • yolov8x-seg-1280 (trained on COCO dataset)

👉 YOLO v10

Access level: public-open

License: AGPL

The following model IDs are registered for object detection task:

  • yolov10n-640 (trained on COCO dataset)

  • yolov10s-640 (trained on COCO dataset)

  • yolov10m-640 (trained on COCO dataset)

  • yolov10b-640 (trained on COCO dataset)

  • yolov10l-640 (trained on COCO dataset)

  • yolov10x-640 (trained on COCO dataset)

👉 Perception Encoder

Access level: public-open

License: FAIR Noncommercial Research License

The following model IDs are registered:

  • perception-encoder/PE-Core-B16-224

  • perception-encoder/PE-Core-G14-448

  • perception-encoder/PE-Core-L14-336

👉 CLIP

Access level: public-open

License: MIT

The following model IDs are registered:

  • clip/RN50

  • clip/RN101

  • clip/RN50x16

  • clip/RN50x4

  • clip/RN50x64

  • clip/ViT-B-16

  • clip/ViT-B-32

  • clip/ViT-L-14-336px

  • clip/ViT-L-14
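
Any of these IDs can be passed directly to AutoModel.from_pretrained, for example:

from inference_exp import AutoModel

# loads the registered CLIP checkpoint by its model ID
model = AutoModel.from_pretrained("clip/ViT-B-32")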

📜 Citations

@article{bolya2025PerceptionEncoder,
  title={Perception Encoder: The best visual embeddings are not at the output of the network},
  author={Daniel Bolya and Po-Yao Huang and Peize Sun and Jang Hyun Cho and Andrea Madotto and Chen Wei and Tengyu Ma and Jiale Zhi and Jathushan Rajasegaran and Hanoona Rasheed and Junke Wang and Marco Monteiro and Hu Xu and Shiyu Dong and Nikhila Ravi and Daniel Li and Piotr Doll{\'a}r and Christoph Feichtenhofer},
  journal={arXiv:2504.13181},
  year={2025}
}
