HuggingFace Plugin for Vision Agents

HuggingFace Inference integration for Vision Agents. It supports both text-only large language models (LLMs) and vision language models (VLMs) through HuggingFace's Inference Providers API.

Installation

uv add "vision-agents[huggingface]"

Configuration

Set your HuggingFace API token:

export HF_TOKEN=your_huggingface_token
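
Because the plugin falls back to the HF_TOKEN environment variable when no api_key is passed, it can help to fail fast when the variable is missing. The following is a minimal sketch using only the standard library; require_hf_token is a hypothetical helper for illustration and is not part of the plugin.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HuggingFace token from the environment or raise a clear error.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    # Allow a custom mapping to be injected so the check is easy to test.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before starting the agent")
    return token
```

The returned value could then be passed explicitly as api_key to huggingface.LLM or huggingface.VLM, or the check can simply run once at startup.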

Usage

Text-only LLM

import asyncio

from vision_agents.plugins import huggingface

async def main() -> None:
    llm = huggingface.LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        # Optional: pass "auto" or omit to let HuggingFace auto-select based on your settings.
        provider="together",
    )
    # simple_response must be awaited, so it is called from inside a coroutine.
    response = await llm.simple_response("Hello, how are you?")
    print(response.text)

asyncio.run(main())

Vision Language Model (VLM)

import asyncio

from vision_agents.plugins import huggingface

async def main() -> None:
    vlm = huggingface.VLM(
        model="Qwen/Qwen2-VL-7B-Instruct",
        fps=1,  # buffer one frame per second
        frame_buffer_seconds=10,  # keep the last 10 seconds of video
    )
    # VLM automatically buffers video frames when used with an Agent
    response = await vlm.simple_response("What do you see?")
    print(response.text)

asyncio.run(main())

With Function Calling

import asyncio

from vision_agents.plugins import huggingface

async def main() -> None:
    llm = huggingface.LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

    # Register a plain Python function so the model can call it as a tool.
    @llm.register_function()
    def get_weather(city: str) -> str:
        """Get the current weather for a city."""
        return f"The weather in {city} is sunny."

    response = await llm.simple_response("What's the weather in Paris?")
    print(response.text)

asyncio.run(main())
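
A decorator like register_function has to describe the Python function to the model in some form. The sketch below shows one common way to derive such a description from type hints and the docstring, using only the standard library. It illustrates the general technique; describe_function and the schema layout are assumptions, not the plugin's actual implementation.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON-Schema-style description of func (hypothetical helper)."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."


schema = describe_function(get_weather)
```

Whatever the plugin does internally, accurate type hints and a one-line docstring on each registered function give the model the clearest picture of when and how to call it.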

Supported Providers

HuggingFace's Inference Providers API supports multiple backends:

  • Together AI
  • Groq
  • Cerebras
  • Replicate
  • Fireworks
  • And more

Pass provider to pin a specific backend, or omit it (or pass "auto") to let HuggingFace auto-select one:

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="groq",
)

API Reference

huggingface.LLM

Text-only language model integration.

Parameters:

  • model (str): HuggingFace model ID
  • api_key (str, optional): HuggingFace API token (defaults to HF_TOKEN env var)
  • provider (str, optional): Inference provider name

huggingface.VLM

Vision language model integration with video frame buffering.

Parameters:

  • model (str): HuggingFace model ID
  • api_key (str, optional): HuggingFace API token (defaults to HF_TOKEN env var)
  • provider (str, optional): Inference provider name
  • fps (int): Frames per second to buffer (default: 1)
  • frame_buffer_seconds (int): Seconds of video to buffer (default: 10)
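
Read together, fps and frame_buffer_seconds suggest an upper bound of fps * frame_buffer_seconds buffered frames. The following is a minimal sketch of that idea with a fixed-size deque; FrameBuffer is a hypothetical class for illustration and makes no claim about how the plugin actually stores frames.

```python
from collections import deque
from typing import Deque, List


class FrameBuffer:
    """Keep only the most recent fps * frame_buffer_seconds frames (illustrative)."""

    def __init__(self, fps: int = 1, frame_buffer_seconds: int = 10) -> None:
        if fps <= 0 or frame_buffer_seconds <= 0:
            raise ValueError("fps and frame_buffer_seconds must be positive")
        # With maxlen set, appending to a full deque drops the oldest item.
        self._frames: Deque[bytes] = deque(maxlen=fps * frame_buffer_seconds)

    def add(self, frame: bytes) -> None:
        """Store one encoded frame, evicting the oldest if the buffer is full."""
        self._frames.append(frame)

    def snapshot(self) -> List[bytes]:
        """Return the buffered frames, oldest first."""
        return list(self._frames)


# Feed 25 frames into a buffer sized for 1 fps over 10 seconds.
buffer = FrameBuffer(fps=1, frame_buffer_seconds=10)
for i in range(25):
    buffer.add(f"frame-{i}".encode())
frames = buffer.snapshot()
```

Under this model the defaults (fps=1, frame_buffer_seconds=10) hold 10 frames, and raising either value grows the buffer proportionally.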
