# HuggingFace Plugin for Vision Agents

HuggingFace integration for Vision Agents. Supports cloud-based inference via HuggingFace's Inference Providers API and local on-device inference via Transformers.
## Installation

```bash
# Cloud inference (HuggingFace Inference API)
uv add "vision-agents[huggingface]"
# or directly
uv add vision-agents-plugins-huggingface

# Local inference (Transformers: LLM, VLM, object detection)
uv add "vision-agents-plugins-huggingface[transformers]"

# Local inference with quantization (4-bit / 8-bit)
uv add "vision-agents-plugins-huggingface[transformers-quantized]"
```
## Cloud Inference (API-based)

### Configuration

```bash
export HF_TOKEN=your_huggingface_token
```
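If you prefer configuring the token from Python (for example in a notebook), the same variable can be set in-process before constructing the LLM. A minimal sketch, assuming the plugin reads `HF_TOKEN` from the environment as the shell export implies; the token value here is a placeholder:

```python
import os

# Equivalent of the shell export above, done in-process. HF_TOKEN is the
# standard environment variable honored by the huggingface_hub client.
os.environ.setdefault("HF_TOKEN", "hf_xxx")  # placeholder, not a real token

print("token configured:", "HF_TOKEN" in os.environ)
```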
### Text-only LLM

```python
from vision_agents.plugins import huggingface

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="together",  # or "groq", "cerebras", etc.
)

response = await llm.simple_response("Hello, how are you?")
print(response.text)
```
### Vision Language Model (VLM)

```python
from vision_agents.plugins import huggingface

vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=1,
    frame_buffer_seconds=10,
)

response = await vlm.simple_response("What do you see?")
print(response.text)
```
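The `fps` and `frame_buffer_seconds` settings together determine how much video context the VLM has available per request. The relationship is plain arithmetic, independent of the plugin:

```python
fps = 1                    # frames sampled per second, as configured above
frame_buffer_seconds = 10  # rolling window of video kept for inference

# Approximate number of frames available to the model per request.
buffered_frames = fps * frame_buffer_seconds
print(buffered_frames)  # 10
```

Raising `fps` gives the model finer temporal detail at the cost of more frames to process; lengthening the buffer extends how far back it can "remember".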
## Local Inference (Transformers)

Runs models directly on your hardware (GPU, CPU, or Apple Silicon MPS). Requires the `[transformers]` extra.
### Local LLM

```python
from vision_agents.plugins import huggingface

llm = huggingface.TransformersLLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
)

@llm.register_function()
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

response = await llm.simple_response("What's the weather in Paris?")
```
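`register_function()` exposes the decorated coroutine as a tool the model can call by name. Conceptually, a tool registry maps function names to coroutines and dispatches the model's arguments to them; the sketch below is a simplified illustration of that pattern, not the plugin's actual implementation:

```python
import asyncio

# Hypothetical minimal tool registry: name -> coroutine function.
functions = {}

def register_function(fn):
    functions[fn.__name__] = fn
    return fn

@register_function
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

async def dispatch(name: str, arguments: dict) -> str:
    # Look up the registered tool and await it with the model's arguments.
    return await functions[name](**arguments)

result = asyncio.run(dispatch("get_weather", {"city": "Paris"}))
print(result)  # The weather in Paris is sunny.
```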
**With 4-bit quantization (~4× memory reduction):**

```python
llm = huggingface.TransformersLLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
    quantization="4bit",
)
```
**Parameters:**
- `model` (str): HuggingFace model ID
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `quantization`: `"none"`, `"4bit"`, or `"8bit"`
- `torch_dtype`: `"auto"`, `"float16"`, `"bfloat16"`, or `"float32"`
- `max_new_tokens` (int): Max tokens per response (default: 512)
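The "~4× memory reduction" follows directly from bytes per parameter: float16 stores 2 bytes per weight, 4-bit stores half a byte. A back-of-the-envelope estimate for the 3B-parameter model used above (weights only; real usage adds activations and KV cache on top):

```python
# Approximate weight memory for a 3B-parameter model at each precision.
params = 3_000_000_000
bytes_per_param = {"float32": 4.0, "float16": 2.0, "8bit": 1.0, "4bit": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype:>7}: ~{gib:.1f} GiB")
```

That works out to roughly 5.6 GiB in float16 versus about 1.4 GiB at 4-bit, which is where the ~4× figure comes from.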
### Local VLM
```python
from vision_agents.plugins import huggingface
vlm = huggingface.TransformersVLM(
    model="Qwen/Qwen2-VL-2B-Instruct",
)
```

**Parameters:**
- `model` (str): HuggingFace model ID
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `quantization`: `"none"`, `"4bit"`, or `"8bit"`
- `fps` (int): Frames per second to capture (default: 1)
- `frame_buffer_seconds` (int): Seconds of video to buffer (default: 10)
- `max_frames` (int): Max frames per inference (default: 4)
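Note that the defaults buffer about 10 frames (`fps=1` × `frame_buffer_seconds=10`) but send only `max_frames=4` per inference, so a subset must be chosen. A common strategy is to sample frames evenly across the buffer; the sketch below illustrates that idea and is not the plugin's actual selection code:

```python
# Hypothetical even-spacing frame selection: pick max_frames indices
# spread across a buffer of buffer_len frames.
def sample_frames(buffer_len: int, max_frames: int) -> list[int]:
    if buffer_len <= max_frames:
        return list(range(buffer_len))
    step = buffer_len / max_frames
    return [int(i * step) for i in range(max_frames)]

print(sample_frames(10, 4))  # [0, 2, 5, 7]
```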
### Local Object Detection

Runs detection models such as RT-DETRv2 on video frames and emits a `DetectionCompletedEvent` with bounding boxes.
```python
from vision_agents.core import Agent
from vision_agents.plugins import huggingface

processor = huggingface.TransformersDetectionProcessor(
    model="PekingU/rtdetr_v2_r101vd",
    conf_threshold=0.5,
    fps=5,
)

agent = Agent(processors=[processor], ...)

@agent.events.subscribe
async def on_detection(event: huggingface.DetectionCompletedEvent):
    for obj in event.objects:
        print(f"{obj['label']} ({obj['confidence']:.0%})")
```
**Parameters:**
- `model` (str): HuggingFace model ID (default: `"PekingU/rtdetr_v2_r101vd"`)
- `conf_threshold` (float): Confidence threshold 0-1 (default: 0.5)
- `fps` (int): Frame processing rate (default: 10)
- `classes` (list[str], optional): Filter to specific class names
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `annotate` (bool): Draw bounding boxes on output video (default: True)
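The combined effect of `conf_threshold` and `classes` can be illustrated with a small stand-alone filter. This is a sketch of the filtering semantics, not the processor's internal code; the detection dicts mirror the `label`/`confidence` keys used in the event handler above:

```python
# Hypothetical filter: keep detections above a confidence threshold,
# optionally restricted to a set of class names.
def filter_detections(objects, conf_threshold=0.5, classes=None):
    kept = []
    for obj in objects:
        if obj["confidence"] < conf_threshold:
            continue
        if classes is not None and obj["label"] not in classes:
            continue
        kept.append(obj)
    return kept

detections = [
    {"label": "person", "confidence": 0.91},
    {"label": "dog", "confidence": 0.42},
    {"label": "car", "confidence": 0.77},
]

# The low-confidence "dog" is dropped by the threshold; "person" and
# "car" survive the class filter.
print(filter_detections(detections, conf_threshold=0.5, classes=["person", "car"]))
```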