LLM plugin to access Hugging Face models
llm-huggingface
Access Hugging Face models via the Inference API
Installation
Install this plugin in the same environment as LLM.
llm install llm-huggingface
Configuration
Configure the plugin by setting your Hugging Face API token:
llm keys set huggingface
<paste key here>
You can also set the API key by assigning it to the environment variable HUGGINGFACE_TOKEN.
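The lookup described above can be sketched in a few lines of Python. This is an illustration only, not the plugin's actual code: the helper name resolve_hf_token is hypothetical, and the precedence shown (an explicitly supplied key wins over the environment variable) is an assumption.

```python
import os


def resolve_hf_token(explicit_key=None, environ=None):
    """Return an API token, preferring an explicit key over the environment.

    Hypothetical helper: the precedence order is an assumption, not taken
    from the plugin's source.
    """
    env = os.environ if environ is None else environ
    token = explicit_key or env.get("HUGGINGFACE_TOKEN")
    if not token:
        raise RuntimeError(
            "No Hugging Face token found: run 'llm keys set huggingface' "
            "or export HUGGINGFACE_TOKEN"
        )
    return token
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.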
Usage
The plugin automatically discovers and registers all available text-generation models from Hugging Face. Run a model using the hf/ prefix:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Write a haiku about coding"
List the available Hugging Face models by filtering on the hf/ prefix:
llm models | grep "hf/"
Set a default model to avoid the -m option:
llm models default hf/mistralai/Mistral-7B-Instruct-v0.3
llm "Explain quantum computing in simple terms"
Features
Streaming Responses
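The plugin supports streaming for real-time output. LLM streams responses by default, so no extra flag is needed; pass --no-stream to wait for the complete response instead:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Tell me a story"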
JSON Schema Output
Force structured JSON output using schemas:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
--schema '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}' \
"Generate a person's profile"
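Or use LLM's concise schema syntax: a comma-separated list of field names, each optionally followed by a type (str, int, float, or bool) and, after a colon, a description:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
--schema 'name, age int, hobbies: comma-separated list of hobbies' \
"Generate a person's profile with hobbies"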
Note: JSON schema support varies by model. Some models may not fully support structured output or may produce inconsistent results. Models specifically fine-tuned for instruction following and structured output generation tend to perform better with schemas.
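Because structured output is not guaranteed, it can help to check a response before relying on it. The following is a minimal sketch using only the standard library. The check_output helper and its type table are hypothetical; it treats every listed property as required and covers only flat objects, so it is not a full JSON Schema validator.

```python
import json

# Hypothetical mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_output(raw_text, schema):
    """Return a list of problems found in raw_text; an empty list means it matches."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        expected = _JSON_TYPES.get(spec.get("type"))
        value = data[name]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and expected is not bool)
        ):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems
```

A caller could retry the prompt, or fall back to a different model, whenever the returned list is non-empty.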
Function Calling / Tools
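The plugin supports function calling for models that have this capability. With LLM 0.26 or later, pass Python function definitions to the --functions option:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
--functions '
def percent_of(percent: float, value: float) -> float:
    """Return the given percentage of a value."""
    return value * percent / 100
' "What is 15% of 240?"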
Generation Parameters
Control generation with various parameters:
- Temperature (0.0-2.0): Controls randomness
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o temperature 0.7 "Write creatively"
- Top-p (0.0-1.0): Nucleus sampling threshold
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o top_p 0.9 "Generate text"
- Max tokens: Limit response length
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 100 "Explain AI"
- Stop sequences: Halt generation at specific strings
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o stop '[".", "!"]' "Generate until punctuation"
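The documented ranges can be checked before a request is sent. The sketch below is illustrative: validate_options is a hypothetical helper, not part of the plugin, and the requirement that max_tokens be positive is an assumption.

```python
def validate_options(temperature=None, top_p=None, max_tokens=None):
    """Return a list of error messages for out-of-range options (hypothetical helper)."""
    errors = []
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        errors.append("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        errors.append("top_p must be between 0.0 and 1.0")
    # Assumption: a token limit only makes sense as a positive integer.
    if max_tokens is not None and max_tokens < 1:
        errors.append("max_tokens must be a positive integer")
    return errors
```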
Interactive Chat
Start an interactive chat session:
llm chat -m hf/meta-llama/Llama-3.2-3B-Instruct
Conversation History
Continue the most recent conversation by passing -c on follow-up prompts:
llm -m hf/meta-llama/Llama-3.2-3B-Instruct "What is Python?"
llm -c "What are its main uses?"
Advanced Options
Token Limits
The plugin supports both max_tokens (chat-completions style) and max_new_tokens (text-generation style) parameters:
# Using max_tokens (preferred)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 500 "Tell a story"
# Using max_new_tokens (for compatibility)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_new_tokens 500 "Tell a story"
Note: You cannot set both max_tokens and max_new_tokens simultaneously.
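One way to picture this rule is a small resolver that accepts either name and rejects both. This is a sketch under assumptions: resolve_token_limit is a hypothetical helper, and the plugin may enforce the constraint differently.

```python
def resolve_token_limit(max_tokens=None, max_new_tokens=None):
    """Return the single token limit to send, or None if neither option is set.

    Hypothetical helper illustrating the mutual-exclusion rule.
    """
    if max_tokens is not None and max_new_tokens is not None:
        raise ValueError("Set either max_tokens or max_new_tokens, not both")
    return max_tokens if max_tokens is not None else max_new_tokens
```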
Response Metadata
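Access detailed response metadata using the Python API. LLM runs prompts lazily, so read the response text first; response_json is only populated once the response has completed:
import llm
model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")
response = model.prompt("Hello")
# Force the request to run before reading metadata
response.text()
# Access metadata
print(response.response_json)
# {'usage': {...}, 'model': '...', 'finish_reason': 'stop', ...}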
Python API
Use the plugin programmatically:
import llm
# Get a model
model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")
# Simple prompt
response = model.prompt("Explain machine learning")
print(response.text())
# With options
response = model.prompt(
    "Write a poem",
    temperature=0.9,
    max_tokens=200
)
# Streaming
for chunk in model.prompt("Tell me a story", stream=True):
    print(chunk, end="", flush=True)
# With system prompt
response = model.prompt(
    "Translate to French: Hello",
    system="You are a helpful translation assistant."
)
Model Discovery
The plugin automatically discovers Hugging Face models that support the text-generation task and caches the result to improve performance. Models are registered with the hf/ prefix to distinguish them from models provided by other LLM plugins.
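The caching idea can be illustrated with a small time-based cache. Everything here is an assumption made for illustration: the class name ModelDiscoveryCache, the one-hour lifetime, and the in-memory storage are hypothetical and say nothing about how the plugin actually stores its cache.

```python
import time


class ModelDiscoveryCache:
    """Hypothetical time-based cache for discovered model names."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable returning raw model names
        self._ttl = ttl_seconds      # assumed lifetime of a cached result
        self._clock = clock          # injectable clock, useful in tests
        self._models = None
        self._fetched_at = None

    def model_ids(self):
        """Return hf/-prefixed model IDs, refetching only after the TTL expires."""
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._models = [f"hf/{name}" for name in self._fetch()]
            self._fetched_at = now
        return list(self._models)
```

The fetch callable stands in for whatever request lists text-generation models; injecting it keeps the sketch independent of any particular HTTP client.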
Limitations
- Attachments: Image, audio, and file attachments are not currently supported
- Model availability: Not all Hugging Face models may be accessible via the Inference API
- Rate limits: Subject to Hugging Face API rate limits based on your account type
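When rate limits are a concern, a common workaround is to retry with exponential backoff. The sketch below is generic and not part of the plugin: RateLimitError is a hypothetical stand-in for whatever exception your code raises on a rate-limit response, and the delays are arbitrary defaults.

```python
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for a rate-limit failure."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            # Give up after the final attempt and let the caller handle it.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Wrap the call that sends the prompt in a function that raises RateLimitError on a rate-limit response, then pass that function to call_with_backoff.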
Development
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-huggingface
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
pip install -e '.[test]'
To run the tests:
pytest
License
Apache 2.0