LLM plugin to access Hugging Face models

llm-huggingface

Access Hugging Face models via the Inference API

Installation

Install this plugin in the same environment as LLM.

llm install llm-huggingface-plugin

Configuration

Configure the plugin by setting your Hugging Face API token:

llm keys set huggingface

<paste key here>

You can also set the API key by assigning it to the environment variable HUGGINGFACE_TOKEN.
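As a minimal sketch of how a script might check for the token before invoking the plugin, assuming only the HUGGINGFACE_TOKEN variable named above. The helper name `find_hf_token` is hypothetical and is not part of the plugin or of LLM:

```python
import os


def find_hf_token(environ=None):
    """Return the token from HUGGINGFACE_TOKEN, or None if it is unset or blank.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if find_hf_token() is None:
        print("HUGGINGFACE_TOKEN is not set; use 'llm keys set huggingface' instead.")
    else:
        print("Found a token in HUGGINGFACE_TOKEN.")
```

This only inspects the environment. Keys stored with llm keys set are managed by LLM itself and are not visible to this helper.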

Usage

The plugin automatically discovers and registers all available text-generation models from Hugging Face. Run a model using the hf/ prefix:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Write a haiku about coding"

You can list the available Hugging Face models by filtering the output of llm models for the hf/ prefix:

llm models | grep "hf/"

Set a default model to avoid the -m option:

llm models default hf/mistralai/Mistral-7B-Instruct-v0.3
llm "Explain quantum computing in simple terms"

Features

Streaming Responses

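The plugin streams output as it is generated. Streaming is LLM's default behavior, so no extra flag is needed:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Tell me a story"

Pass --no-stream to wait for the complete response instead.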

JSON Schema Output

Force structured JSON output using schemas:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --schema '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}' \
  "Generate a person's profile"

Or use the simpler DSL format:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --schema 'name: str, age: int, hobbies: list[str]' \
  "Generate a person's profile with hobbies"

Note: JSON schema support varies by model. Some models may not fully support structured output or may produce inconsistent results. Models specifically fine-tuned for instruction following and structured output generation tend to perform better with schemas.
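Because results can be inconsistent, it may help to check the model's output before using it. The following is a rough sketch, using only the standard library, of validating a response against the name/age/hobbies fields from the DSL example above. The function name `check_profile` and the field-to-type mapping are illustrative assumptions, not part of the plugin:

```python
import json

# Hypothetical mapping from the DSL example's fields to Python types.
EXPECTED_FIELDS = {"name": str, "age": int, "hobbies": list}


def check_profile(raw_text, expected=EXPECTED_FIELDS):
    """Parse model output and return (data, problems).

    `problems` is an empty list when the output is a JSON object
    containing every expected field with the expected type.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]

    problems = []
    for field, expected_type in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return data, problems
```

A caller could retry the prompt, or fall back to a different model, whenever `problems` is non-empty.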

Function Calling / Tools

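The plugin supports function calling for models that have this capability. Define a Python function inline with LLM's --functions option:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --functions '
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression"""
    # Illustration only: eval with builtins disabled, not a hardened parser
    return float(eval(expression, {"__builtins__": {}}))
' \
  "What is 15% of 240?"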

Generation Parameters

Control generation with various parameters:

Temperature (0.0-2.0): Controls randomness

llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o temperature 0.7 "Write creatively"

Top-p (0.0-1.0): Nucleus sampling threshold

llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o top_p 0.9 "Generate text"

Max tokens: Limit response length

llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 100 "Explain AI"

Stop sequences: Halt generation at specific strings

llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o stop '[".", "!"]' "Generate until punctuation"

Interactive Chat

Start an interactive chat session:

llm chat -m hf/meta-llama/Llama-3.2-3B-Instruct

Conversation History

Continue the most recent conversation with -c:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "What is Python?"
llm -c "What are its main uses?"

Advanced Options

Token Limits

The plugin supports both max_tokens (chat-completions style) and max_new_tokens (text-generation style) parameters:

# Using max_tokens (preferred)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 500 "Tell a story"

# Using max_new_tokens (for compatibility)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_new_tokens 500 "Tell a story"

Note: You cannot set both max_tokens and max_new_tokens simultaneously.
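The mutual-exclusion rule can be sketched as a small validation step. This is an illustration of the constraint described above, not the plugin's actual implementation, and `resolve_token_limit` is a hypothetical name:

```python
def resolve_token_limit(max_tokens=None, max_new_tokens=None):
    """Return the single effective token limit, or None if neither is set.

    Raises ValueError when both options are supplied, mirroring the rule
    that max_tokens and max_new_tokens cannot be combined.
    """
    if max_tokens is not None and max_new_tokens is not None:
        raise ValueError("Set either max_tokens or max_new_tokens, not both")
    return max_tokens if max_tokens is not None else max_new_tokens
```

Checking for `None` rather than truthiness keeps an explicit value of 0 distinct from "not set".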

Response Metadata

Access detailed response metadata using the Python API. The metadata is populated once the response has been fully read:

import llm

model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")
response = model.prompt("Hello")
response.text()  # read the full response so the metadata is populated

# Access metadata
print(response.response_json)
# {'usage': {...}, 'model': '...', 'finish_reason': 'stop', ...}
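Since the exact contents of the metadata may differ between models, a defensive reader can be useful. The sketch below assumes only the top-level keys shown in the example output (usage, model, finish_reason). The helper `summarize_metadata` is hypothetical:

```python
def summarize_metadata(response_json):
    """Pick out the fields shown above, tolerating missing keys or None."""
    meta = response_json or {}
    return {
        "model": meta.get("model"),
        "finish_reason": meta.get("finish_reason"),
        # The inner structure of 'usage' is not specified here, so only
        # report whether a usage mapping is present at all.
        "has_usage": isinstance(meta.get("usage"), dict),
    }
```

It could be called as `summarize_metadata(response.response_json)` after the response has been read.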

Python API

Use the plugin programmatically:

import llm

# Get a model
model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")

# Simple prompt
response = model.prompt("Explain machine learning")
print(response.text())

# With options
response = model.prompt(
    "Write a poem",
    temperature=0.9,
    max_tokens=200
)

# Streaming
for chunk in model.prompt("Tell me a story", stream=True):
    print(chunk, end="", flush=True)

# With system prompt
response = model.prompt(
    "Translate to French: Hello",
    system="You are a helpful translation assistant."
)

Model Discovery

The plugin automatically discovers models that support the text-generation task from Hugging Face. The discovery is cached to improve performance. Models are registered with the hf/ prefix to distinguish them from other LLM providers.
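To illustrate the idea of cached discovery with prefixed names, here is a minimal sketch. It is not the plugin's code: the class name `CachedDiscovery`, the one-hour default lifetime, and the `fetch` callable (standing in for whatever lists text-generation models) are all assumptions made for the example:

```python
import time

PREFIX = "hf/"  # the prefix described in the text above


class CachedDiscovery:
    """Cache the result of a model-listing callable for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable returning model names
        self._ttl = ttl_seconds    # assumed cache lifetime
        self._clock = clock        # injectable clock, handy for testing
        self._cached = None
        self._fetched_at = 0.0

    def model_ids(self):
        """Return prefixed model IDs, refetching only when the cache is stale."""
        now = self._clock()
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = [PREFIX + name for name in self._fetch()]
            self._fetched_at = now
        return list(self._cached)  # copy so callers cannot mutate the cache
```

Repeated calls within the lifetime reuse the stored list, which is the performance benefit caching is meant to provide.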

Limitations

Attachments: Image, audio, and file attachments are not currently supported
Model availability: Not all Hugging Face models may be accessible via the Inference API
Rate limits: Subject to Hugging Face API rate limits based on your account type
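One common way to cope with rate limits is to retry with exponential backoff. The sketch below is a generic pattern rather than anything the plugin provides. `RateLimited` is a hypothetical stand-in for whatever error signals a rate limit in your setup, and the delays are arbitrary defaults:

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error raised when a rate limit is hit."""


def call_with_backoff(func, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimited with exponentially growing delays.

    The final failure is re-raised so the caller can decide what to do.
    """
    for attempt in range(attempts):
        try:
            return func()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In practice `func` would wrap a call such as a model prompt, with the real exception type substituted for `RateLimited`.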

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-huggingface
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

License

Apache 2.0

Release history

Version 0.1, released Aug 14, 2025

Download files

Source Distribution

llm_huggingface_plugin-0.1.tar.gz (16.1 kB)

Uploaded Aug 14, 2025 Source

Built Distribution

llm_huggingface_plugin-0.1-py3-none-any.whl (14.7 kB)

Uploaded Aug 14, 2025 Python 3

File details

Details for the file llm_huggingface_plugin-0.1.tar.gz.

File metadata

Download URL: llm_huggingface_plugin-0.1.tar.gz
Upload date: Aug 14, 2025
Size: 16.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.19

File hashes

Hashes for llm_huggingface_plugin-0.1.tar.gz
Algorithm	Hash digest
SHA256	`9fbb7829a900f90812cf8594be096bd3e41d4844b61b7ff173c9f9bde4b6732f`
MD5	`f846577f863f5ea733823f68ae2b67f6`
BLAKE2b-256	`a216312068714cf337303f86cf27e56e2311fe700c77c00398bdf7f460908198`

File details

Details for the file llm_huggingface_plugin-0.1-py3-none-any.whl.

File metadata

Download URL: llm_huggingface_plugin-0.1-py3-none-any.whl
Upload date: Aug 14, 2025
Size: 14.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.19

File hashes

Hashes for llm_huggingface_plugin-0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a6e12e8d16d908f457f8c2d7714b4da4cb34e4f6d8f8ee84f59c401f677624f4`
MD5	`b02ea64d5d64c181ad48eb885cc83a0b`
BLAKE2b-256	`e8c1f9dd9b6343fecb4b91b8ce6bd9021f9130305eeb3ed43ef9b93456dddb93`
