# HuggingFace Plugin for Vision Agents

HuggingFace integration for Vision Agents. Supports cloud-based inference via HuggingFace's Inference Providers API and local on-device inference via Transformers.
## Installation

```bash
# Cloud inference (HuggingFace Inference API)
uv add "vision-agents[huggingface]"
# or directly
uv add vision-agents-plugins-huggingface

# Local inference (Transformers: LLM, VLM, object detection)
uv add "vision-agents-plugins-huggingface[transformers]"

# Local inference with quantization (4-bit / 8-bit)
uv add "vision-agents-plugins-huggingface[transformers-quantized]"
```
## Cloud Inference (API-based)

### Configuration

```bash
export HF_TOKEN=your_huggingface_token
```
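If you prefer configuring the token from Python (for example in a notebook), the same variable can be set in-process before constructing the LLM. A minimal sketch, assuming the plugin reads `HF_TOKEN` from the environment as the shell export implies; the token value here is a placeholder:

```python
import os

# Equivalent of the shell export above, done in-process. HF_TOKEN is the
# standard environment variable honored by the huggingface_hub client.
os.environ.setdefault("HF_TOKEN", "hf_xxx")  # placeholder, not a real token

print("token configured:", "HF_TOKEN" in os.environ)
```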
### Text-only LLM

```python
from vision_agents.plugins import huggingface

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="together",  # or "groq", "cerebras", etc.
)

response = await llm.simple_response("Hello, how are you?")
print(response.text)
```
### Vision Language Model (VLM)

```python
from vision_agents.plugins import huggingface

vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=1,
    frame_buffer_seconds=10,
)

response = await vlm.simple_response("What do you see?")
print(response.text)
```
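The `fps` and `frame_buffer_seconds` settings together determine how much video context the VLM has available per request. The relationship is plain arithmetic, independent of the plugin:

```python
fps = 1                    # frames sampled per second, as configured above
frame_buffer_seconds = 10  # rolling window of video kept for inference

# Approximate number of frames available to the model per request.
buffered_frames = fps * frame_buffer_seconds
print(buffered_frames)  # 10
```

Raising `fps` gives the model finer temporal detail at the cost of more frames to process; lengthening the buffer extends how far back it can "remember".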
## Local Inference (Transformers)

Runs models directly on your hardware (GPU, CPU, or Apple Silicon MPS). Requires the `[transformers]` extra.
### Local LLM

```python
from vision_agents.plugins import huggingface

llm = huggingface.TransformersLLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
)

@llm.register_function()
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

response = await llm.simple_response("What's the weather in Paris?")
```
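`register_function()` exposes the decorated coroutine as a tool the model can call by name. Conceptually, a tool registry maps function names to coroutines and dispatches the model's arguments to them; the sketch below is a simplified illustration of that pattern, not the plugin's actual implementation:

```python
import asyncio

# Hypothetical minimal tool registry: name -> coroutine function.
functions = {}

def register_function(fn):
    functions[fn.__name__] = fn
    return fn

@register_function
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

async def dispatch(name: str, arguments: dict) -> str:
    # Look up the registered tool and await it with the model's arguments.
    return await functions[name](**arguments)

result = asyncio.run(dispatch("get_weather", {"city": "Paris"}))
print(result)  # The weather in Paris is sunny.
```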
**With 4-bit quantization (~4× memory reduction):**

```python
llm = huggingface.TransformersLLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
    quantization="4bit",
)
```
**Parameters:**
- `model` (str): HuggingFace model ID
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `quantization`: `"none"`, `"4bit"`, or `"8bit"`
- `torch_dtype`: `"auto"`, `"float16"`, `"bfloat16"`, or `"float32"`
- `max_new_tokens` (int): Max tokens per response (default: 512)
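The "~4× memory reduction" follows directly from bytes per parameter: float16 stores 2 bytes per weight, 4-bit stores half a byte. A back-of-the-envelope estimate for the 3B-parameter model used above (weights only; real usage adds activations and KV cache on top):

```python
# Approximate weight memory for a 3B-parameter model at each precision.
params = 3_000_000_000
bytes_per_param = {"float32": 4.0, "float16": 2.0, "8bit": 1.0, "4bit": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype:>7}: ~{gib:.1f} GiB")
```

That works out to roughly 5.6 GiB in float16 versus about 1.4 GiB at 4-bit, which is where the ~4× figure comes from.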
### Local VLM
```python
from vision_agents.plugins import huggingface
vlm = huggingface.TransformersVLM(
    model="Qwen/Qwen2-VL-2B-Instruct",
)
```

**Parameters:**
- `model` (str): HuggingFace model ID
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `quantization`: `"none"`, `"4bit"`, or `"8bit"`
- `fps` (int): Frames per second to capture (default: 1)
- `frame_buffer_seconds` (int): Seconds of video to buffer (default: 10)
- `max_frames` (int): Max frames per inference (default: 4)
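Note that the defaults buffer about 10 frames (`fps=1` × `frame_buffer_seconds=10`) but send only `max_frames=4` per inference, so a subset must be chosen. A common strategy is to sample frames evenly across the buffer; the sketch below illustrates that idea and is not the plugin's actual selection code:

```python
# Hypothetical even-spacing frame selection: pick max_frames indices
# spread across a buffer of buffer_len frames.
def sample_frames(buffer_len: int, max_frames: int) -> list[int]:
    if buffer_len <= max_frames:
        return list(range(buffer_len))
    step = buffer_len / max_frames
    return [int(i * step) for i in range(max_frames)]

print(sample_frames(10, 4))  # [0, 2, 5, 7]
```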
### Local Object Detection

Runs detection models such as RT-DETRv2 on video frames and emits a `DetectionCompletedEvent` with bounding boxes.
```python
from vision_agents.core import Agent
from vision_agents.plugins import huggingface

processor = huggingface.TransformersDetectionProcessor(
    model="PekingU/rtdetr_v2_r101vd",
    conf_threshold=0.5,
    fps=5,
)

agent = Agent(processors=[processor], ...)

@agent.events.subscribe
async def on_detection(event: huggingface.DetectionCompletedEvent):
    for obj in event.objects:
        print(f"{obj['label']} ({obj['confidence']:.0%})")
```
**Parameters:**
- `model` (str): HuggingFace model ID (default: `"PekingU/rtdetr_v2_r101vd"`)
- `conf_threshold` (float): Confidence threshold 0-1 (default: 0.5)
- `fps` (int): Frame processing rate (default: 10)
- `classes` (list[str], optional): Filter to specific class names
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `annotate` (bool): Draw bounding boxes on output video (default: True)
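The combined effect of `conf_threshold` and `classes` can be illustrated with a small stand-alone filter. This is a sketch of the filtering semantics, not the processor's internal code; the detection dicts mirror the `label`/`confidence` keys used in the event handler above:

```python
# Hypothetical filter: keep detections above a confidence threshold,
# optionally restricted to a set of class names.
def filter_detections(objects, conf_threshold=0.5, classes=None):
    kept = []
    for obj in objects:
        if obj["confidence"] < conf_threshold:
            continue
        if classes is not None and obj["label"] not in classes:
            continue
        kept.append(obj)
    return kept

detections = [
    {"label": "person", "confidence": 0.91},
    {"label": "dog", "confidence": 0.42},
    {"label": "car", "confidence": 0.77},
]

# The low-confidence "dog" is dropped by the threshold; "person" and
# "car" survive the class filter.
print(filter_detections(detections, conf_threshold=0.5, classes=["person", "car"]))
```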