HuggingFace Inference integration for Vision Agents

HuggingFace Plugin for Vision Agents

HuggingFace Inference integration for Vision Agents. Supports both text-only LLMs and vision language models (VLMs) through HuggingFace's Inference Providers API.

Installation

uv add "vision-agents[huggingface]"

Configuration

Set your HuggingFace API token:

export HF_TOKEN=your_huggingface_token
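A missing token usually only shows up when the first request fails, so it can help to check for it at startup. The sketch below is illustrative: `require_hf_token` is a hypothetical helper, not part of the plugin, and it assumes only that the token lives in the `HF_TOKEN` environment variable as described above.

```python
import os


def require_hf_token(env=None):
    """Return the HF_TOKEN value, or raise a clear error if it is missing.

    Hypothetical helper for illustration; the plugin does not ship it.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before creating an LLM or VLM"
        )
    return token
```

The returned value could also be passed explicitly through the `api_key` parameter instead of relying on the environment default.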

Usage

Text-only LLM

import asyncio

from vision_agents.plugins import huggingface

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="together",  # optional: omit or pass "auto" to let HuggingFace auto-select based on your settings
)

async def main() -> None:
    # simple_response must be awaited, so call it from a coroutine
    response = await llm.simple_response("Hello, how are you?")
    print(response.text)

asyncio.run(main())

Vision Language Model (VLM)

from vision_agents.plugins import huggingface

vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=1,
    frame_buffer_seconds=10,
)

# VLM automatically buffers video frames when used with an Agent
response = await vlm.simple_response("What do you see?")
print(response.text)
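To build intuition for how `fps` and `frame_buffer_seconds` interact, here is a minimal sketch of a rolling frame buffer. It assumes the buffer keeps at most `fps * frame_buffer_seconds` frames and drops the oldest first; the `FrameBuffer` class is hypothetical and the plugin's actual buffering may differ.

```python
from collections import deque


class FrameBuffer:
    """Keep only the most recent fps * frame_buffer_seconds frames.

    Illustrative only; not the plugin's implementation.
    """

    def __init__(self, fps: int = 1, frame_buffer_seconds: int = 10) -> None:
        self.capacity = fps * frame_buffer_seconds
        self._frames = deque(maxlen=self.capacity)

    def add(self, frame) -> None:
        # A full deque with maxlen discards its oldest item on append
        self._frames.append(frame)

    def frames(self) -> list:
        return list(self._frames)


buffer = FrameBuffer(fps=1, frame_buffer_seconds=10)
for i in range(25):
    buffer.add(f"frame-{i}")

print(len(buffer.frames()))  # 10
print(buffer.frames()[0])    # frame-15
```

Under this assumption, the defaults (`fps=1`, `frame_buffer_seconds=10`) would give the model about ten frames of context per request.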

With Function Calling

import asyncio

from vision_agents.plugins import huggingface

llm = huggingface.LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

@llm.register_function()
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

async def main() -> None:
    # The model can call get_weather to answer this question
    response = await llm.simple_response("What's the weather in Paris?")
    print(response.text)

asyncio.run(main())
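For readers curious how a decorator like `register_function()` could expose a Python function to a model, the sketch below shows one common pattern: record the function's name, docstring, and type hints in a registry, then dispatch calls by name. `FunctionRegistry` is a hypothetical stand-in and makes no claim about how the plugin implements registration.

```python
import inspect
from typing import get_type_hints


class FunctionRegistry:
    """Toy registry that mimics a register_function() decorator."""

    def __init__(self) -> None:
        self.functions = {}

    def register_function(self):
        def decorator(func):
            hints = get_type_hints(func)
            parameters = {}
            for name in inspect.signature(func).parameters:
                hint = hints.get(name)
                # Fall back to "any" when a parameter has no annotation
                parameters[name] = (
                    getattr(hint, "__name__", str(hint)) if hint is not None else "any"
                )
            self.functions[func.__name__] = {
                "callable": func,
                "description": inspect.getdoc(func) or "",
                "parameters": parameters,
            }
            return func  # the original function stays directly callable

        return decorator

    def call(self, name: str, **arguments):
        # Dispatch a call the way a model-issued tool call might be handled
        return self.functions[name]["callable"](**arguments)


registry = FunctionRegistry()


@registry.register_function()
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."


print(registry.call("get_weather", city="Paris"))
```

In this pattern the docstring and type hints double as the description the model sees, which is one reason to keep them accurate on any function you register.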

Supported Providers

HuggingFace's Inference Providers API supports multiple backends:

  • Together AI
  • Groq
  • Cerebras
  • Replicate
  • Fireworks
  • And more

Specify a provider explicitly with the provider parameter, or omit it (or pass "auto") to let HuggingFace auto-select:

llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="groq",
)
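If the provider should be configurable per deployment, one option is to derive the constructor arguments from the environment. This is a sketch only: `llm_kwargs` is a hypothetical helper and `HF_PROVIDER` is an invented variable name, neither of which the plugin defines. It relies solely on the behaviour described above, where omitting `provider` or passing "auto" lets HuggingFace choose.

```python
import os


def llm_kwargs(model, env=None):
    """Build keyword arguments for huggingface.LLM from the environment.

    HF_PROVIDER is a hypothetical variable used only in this example.
    """
    env = os.environ if env is None else env
    kwargs = {"model": model}
    provider = env.get("HF_PROVIDER", "").strip().lower()
    # Leave provider out entirely so auto-selection applies
    if provider and provider != "auto":
        kwargs["provider"] = provider
    return kwargs


print(llm_kwargs("meta-llama/Meta-Llama-3-8B-Instruct", {"HF_PROVIDER": "groq"}))
```

The result could then be unpacked into the constructor, for example `huggingface.LLM(**llm_kwargs("meta-llama/Meta-Llama-3-8B-Instruct"))`.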

API Reference

huggingface.LLM

Text-only language model integration.

Parameters:

  • model (str): HuggingFace model ID
  • api_key (str, optional): HuggingFace API token (defaults to HF_TOKEN env var)
  • provider (str, optional): Inference provider name, such as "together" or "groq"; omit or pass "auto" to let HuggingFace auto-select

huggingface.VLM

Vision language model integration with video frame buffering.

Parameters:

  • model (str): HuggingFace model ID
  • api_key (str, optional): HuggingFace API token (defaults to HF_TOKEN env var)
  • provider (str, optional): Inference provider name, such as "together" or "groq"; omit or pass "auto" to let HuggingFace auto-select
  • fps (int): Frames per second to buffer (default: 1)
  • frame_buffer_seconds (int): Seconds of video to buffer (default: 10)
