Skip to main content

HuggingFace Inference integration for Vision Agents

Project description

HuggingFace Plugin for Vision Agents

HuggingFace integration for Vision Agents. Supports cloud-based inference via HuggingFace's Inference Providers API and local on-device inference via Transformers.

Installation

# Cloud inference (HuggingFace Inference API)
uv add "vision-agents[huggingface]"

# or directly
uv add vision-agents-plugins-huggingface

# Local inference (Transformers - LLM, VLM, object detection)
uv add "vision-agents-plugins-huggingface[transformers]"

# Local inference with quantization (4-bit / 8-bit)
uv add "vision-agents-plugins-huggingface[transformers-quantized]"

Cloud Inference (API-based)

Configuration

export HF_TOKEN=your_huggingface_token

Text-only LLM

from vision_agents.plugins import huggingface

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="together",  # or "groq", "cerebras", etc.
)

response = await llm.simple_response("Hello, how are you?")
print(response.text)

Vision Language Model (VLM)

from vision_agents.plugins import huggingface

vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=1,
    frame_buffer_seconds=10,
)

response = await vlm.simple_response("What do you see?")
print(response.text)

Local Inference (Transformers)

Runs models directly on your hardware (GPU/CPU/MPS). Requires the [transformers] extra.

Local LLM

from vision_agents.plugins import huggingface

llm = huggingface.TransformersLLM(
    model="meta-llama/Llama-3.2-3B-Instruct",
)


@llm.register_function()
async def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."


response = await llm.simple_response("What's the weather in Paris?")

Supported Providers

With 4-bit quantization (~4x memory reduction)

llm = huggingface.TransformersLLM( model="meta-llama/Llama-3.2-3B-Instruct", quantization="4bit", )


**Parameters:**

- `model` (str): HuggingFace model ID
- `device`: `"auto"`, `"cuda"`, `"mps"`, or `"cpu"`
- `quantization`: `"none"`, `"4bit"`, or `"8bit"`
- `torch_dtype`: `"auto"`, `"float16"`, `"bfloat16"`, or `"float32"`
- `max_new_tokens` (int): Max tokens per response (default: 512)

### Local VLM

```python
from vision_agents.plugins import huggingface

vlm = huggingface.TransformersVLM(
    model="Qwen/Qwen2-VL-2B-Instruct",
)

Parameters:

  • model (str): HuggingFace model ID
  • device: "auto", "cuda", "mps", or "cpu"
  • quantization: "none", "4bit", or "8bit"
  • fps (int): Frames per second to capture (default: 1)
  • frame_buffer_seconds (int): Seconds of video to buffer (default: 10)
  • max_frames (int): Max frames per inference (default: 4)

Local Object Detection

Runs detection models like RT-DETRv2 on video frames and emits DetectionCompletedEvent with bounding boxes.

from vision_agents.core import Agent
from vision_agents.plugins import huggingface

processor = huggingface.TransformersDetectionProcessor(
    model="PekingU/rtdetr_v2_r101vd",
    conf_threshold=0.5,
    fps=5,
)

agent = Agent(processors=[processor], ...)

@agent.events.subscribe
async def on_detection(event: huggingface.DetectionCompletedEvent):
    for obj in event.objects:
        print(f"{obj['label']} ({obj['confidence']:.0%})")

Parameters:

  • model (str): HuggingFace model ID (default: "PekingU/rtdetr_v2_r101vd")
  • conf_threshold (float): Confidence threshold 0-1 (default: 0.5)
  • fps (int): Frame processing rate (default: 10)
  • classes (list[str], optional): Filter to specific class names
  • device: "auto", "cuda", "mps", or "cpu"
  • annotate (bool): Draw bounding boxes on output video (default: True)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vision_agents_plugins_huggingface-0.5.6.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file vision_agents_plugins_huggingface-0.5.6.tar.gz.

File metadata

  • Download URL: vision_agents_plugins_huggingface-0.5.6.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vision_agents_plugins_huggingface-0.5.6.tar.gz
Algorithm Hash digest
SHA256 211227aaa7753969b34e66083ccf15007551c87b62de99f155f379d0aec89b1e
MD5 5945091f61ff28e222fceb5098521bd9
BLAKE2b-256 aacaaaf3895802f9142930b5bd78a6da09cd7f8662b330228a9f33b057ac1181

See more details on using hashes here.

File details

Details for the file vision_agents_plugins_huggingface-0.5.6-py3-none-any.whl.

File metadata

  • Download URL: vision_agents_plugins_huggingface-0.5.6-py3-none-any.whl
  • Upload date:
  • Size: 34.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.10 {"installer":{"name":"uv","version":"0.10.10","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for vision_agents_plugins_huggingface-0.5.6-py3-none-any.whl
Algorithm Hash digest
SHA256 aefba340c03527c7506b23b52ba42850a8e94a731218949e8134bd0382a89fe9
MD5 3cdeffae199565cb350a95ea3669e367
BLAKE2b-256 b3fc424258e072ed947cb29179f748a8a4c24843da58b73272ace4ef79e7e1e3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page