LLM plugin to access Hugging Face models

llm-huggingface

Access Hugging Face models via the Inference API

Installation

Install this plugin in the same environment as LLM.

llm install llm-huggingface

Configuration

Configure the plugin by setting your Hugging Face API token:

llm keys set huggingface
<paste key here>

You can also set the API key by assigning it to the environment variable HUGGINGFACE_TOKEN.
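
When both a stored key and the environment variable are present, one of them has to win. The sketch below illustrates one plausible lookup order. `resolve_token` and its arguments are hypothetical names, not part of the plugin, and the order shown (explicit key, then stored key, then HUGGINGFACE_TOKEN) is an assumption to check against the LLM documentation.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str] = None,
    stored: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: return the first token that is set.

    Assumed order: explicit argument, stored key, HUGGINGFACE_TOKEN.
    """
    env = os.environ if env is None else env
    for candidate in (explicit, stored, env.get("HUGGINGFACE_TOKEN")):
        if candidate:
            return candidate
    return None


# Only the environment variable is set, so it is the one returned.
print(resolve_token(env={"HUGGINGFACE_TOKEN": "hf_example"}))
```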

Usage

The plugin automatically discovers and registers all available text-generation models from Hugging Face. Run a model using the hf/ prefix:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Write a haiku about coding"

You can list all available Hugging Face models:

llm models | grep "^hf/"

Set a default model to avoid the -m option:

llm models default hf/mistralai/Mistral-7B-Instruct-v0.3
llm "Explain quantum computing in simple terms"

Features

Streaming Responses

LLM streams responses by default, so output appears as it is generated:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "Tell me a story"

Pass --no-stream to wait for the complete response instead.

JSON Schema Output

Force structured JSON output using schemas:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --schema '{"type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}' \
  "Generate a person's profile"

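Or use LLM's concise schema syntax. Each field is a name, an optional type (str, int, float, or bool; str is the default), and an optional description after a colon:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --schema 'name, age int, hobbies: comma-separated list of hobbies' \
  "Generate a person's profile with hobbies"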

Note: JSON schema support varies by model. Some models may not fully support structured output or may produce inconsistent results. Models specifically fine-tuned for instruction following and structured output generation tend to perform better with schemas.
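
To see how a concise schema relates to full JSON Schema, here is a simplified sketch of the expansion. It assumes LLM's concise syntax, in which a field is written as `name type: description`. `expand_schema` is an illustrative re-implementation, not the function LLM itself uses, and it handles only the single-line, comma-separated form.

```python
import json
from typing import Any, Dict

# Short type names mapped to JSON Schema types (assumed set).
TYPE_MAP = {"str": "string", "int": "integer", "float": "number", "bool": "boolean"}


def expand_schema(concise: str) -> Dict[str, Any]:
    """Illustrative expansion of 'name, age int' into a JSON Schema dict."""
    schema: Dict[str, Any] = {"type": "object", "properties": {}, "required": []}
    for field in concise.split(","):
        # An optional description follows a colon: "age int: age in years".
        spec, _, description = field.partition(":")
        parts = spec.split()
        if not parts:
            continue  # skip empty fields such as a trailing comma
        name = parts[0]
        json_type = TYPE_MAP.get(parts[1], "string") if len(parts) > 1 else "string"
        prop: Dict[str, Any] = {"type": json_type}
        if description.strip():
            prop["description"] = description.strip()
        schema["properties"][name] = prop
        schema["required"].append(name)
    return schema


print(json.dumps(expand_schema("name, age int"), indent=2))
```

Writing the schema out in full JSON is still the better choice for nested objects or arrays, which this short form does not express.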

Function Calling / Tools

The plugin supports function calling for models that have this capability. Define a tool inline as a Python function with --functions:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct \
  --functions 'def percent_of(percent: float, value: float) -> float:
    """Return the given percentage of a value."""
    return percent * value / 100' \
  "What is 15% of 240?"

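A tool is an ordinary Python function whose type hints and docstring tell the model how to call it. The sketch below shows what a `calculate` tool body could look like. It is an illustration under stated assumptions, not the plugin's code: it accepts only numbers and the four basic arithmetic operators, and it walks the parsed expression instead of calling `eval`, because the expression comes from the model.

```python
import ast
import operator

# Arithmetic operators the sketch allows; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression: str) -> float:
    """Evaluate a mathematical expression"""

    def _eval(node: ast.AST) -> float:
        # Plain int/float literals only (bool is excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return float(_eval(ast.parse(expression, mode="eval").body))


print(calculate("15 * 240 / 100"))  # 36.0
```

Assuming an LLM release with tool support, a function like this could be saved to a file and passed with --functions, or handed to the Python API as a tool.
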
Generation Parameters

Control generation with various parameters:

  • Temperature (0.0-2.0): Controls randomness

    llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o temperature 0.7 "Write creatively"
    
  • Top-p (0.0-1.0): Nucleus sampling threshold

    llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o top_p 0.9 "Generate text"
    
  • Max tokens: Limit response length

    llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 100 "Explain AI"
    
  • Stop sequences: Halt generation at specific strings

    llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o stop '[".", "!"]' "Generate until punctuation"
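
The ranges listed above can be checked before a request is sent. The following is a minimal sketch of such a check. `GenerationOptions` is a hypothetical class written for illustration; the plugin's real option handling may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GenerationOptions:
    """Hypothetical container mirroring the options listed above."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    max_tokens: Optional[int] = None
    stop: Optional[List[str]] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges.
        if self.temperature is not None and not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if self.top_p is not None and not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        if self.max_tokens is not None and self.max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")

    def as_request_fields(self) -> Dict[str, Any]:
        """Return only the options that were actually set."""
        return {k: v for k, v in vars(self).items() if v is not None}


opts = GenerationOptions(temperature=0.7, stop=[".", "!"])
print(opts.as_request_fields())
```

Leaving unset options out of the request lets the server apply its own defaults.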
    

Interactive Chat

Start an interactive chat session:

llm chat -m hf/meta-llama/Llama-3.2-3B-Instruct

Conversation History

Continue the most recent conversation with -c:

llm -m hf/meta-llama/Llama-3.2-3B-Instruct "What is Python?"
llm -c "What are its main uses?"

Advanced Options

Token Limits

The plugin supports both max_tokens (chat-completions style) and max_new_tokens (text-generation style) parameters:

# Using max_tokens (preferred)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_tokens 500 "Tell a story"

# Using max_new_tokens (for compatibility)
llm -m hf/meta-llama/Llama-3.2-3B-Instruct -o max_new_tokens 500 "Tell a story"

Note: You cannot set both max_tokens and max_new_tokens simultaneously.
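
The mutual exclusion can be expressed in a few lines. This is a sketch only: `resolve_token_limit` is a hypothetical helper, and the error message is invented for illustration.

```python
from typing import Optional


def resolve_token_limit(
    max_tokens: Optional[int] = None,
    max_new_tokens: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: accept either spelling, reject both together."""
    if max_tokens is not None and max_new_tokens is not None:
        raise ValueError("Set max_tokens or max_new_tokens, not both")
    return max_tokens if max_tokens is not None else max_new_tokens


print(resolve_token_limit(max_new_tokens=500))
```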

Response Metadata

Access detailed response metadata using the Python API:

import llm

model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")
response = model.prompt("Hello")

# Responses are lazy: read the text first so the request actually runs
response.text()

# Access metadata
print(response.response_json)
# {'usage': {...}, 'model': '...', 'finish_reason': 'stop', ...}
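
Because the metadata is a plain dictionary, it can be summarized without assuming every field is present. The helper below is a hypothetical sketch. The sample dictionary is invented for illustration, and the field names inside "usage" depend on what the API returns.

```python
from typing import Any, Dict, Optional


def summarize_metadata(meta: Optional[Dict[str, Any]]) -> str:
    """Format the top-level fields shown above, tolerating missing ones."""
    meta = meta or {}
    model = meta.get("model", "unknown")
    finish = meta.get("finish_reason", "unknown")
    usage = meta.get("usage") or {}
    usage_text = ", ".join(f"{k}={v}" for k, v in sorted(usage.items())) or "n/a"
    return f"model={model} finish_reason={finish} usage: {usage_text}"


# Hypothetical sample, not real API output.
sample = {"model": "example-model", "finish_reason": "stop", "usage": {"total_tokens": 12}}
print(summarize_metadata(sample))
```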

Python API

Use the plugin programmatically:

import llm

# Get a model
model = llm.get_model("hf/meta-llama/Llama-3.2-3B-Instruct")

# Simple prompt
response = model.prompt("Explain machine learning")
print(response.text())

# With options
response = model.prompt(
    "Write a poem",
    temperature=0.9,
    max_tokens=200
)

# Streaming
for chunk in model.prompt("Tell me a story", stream=True):
    print(chunk, end="", flush=True)

# With system prompt
response = model.prompt(
    "Translate to French: Hello",
    system="You are a helpful translation assistant."
)

Model Discovery

The plugin automatically discovers models that support the text-generation task from Hugging Face. The discovery is cached to improve performance. Models are registered with the hf/ prefix to distinguish them from other LLM providers.
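
A cache of this kind can be sketched as a JSON file with a time-to-live. Everything here is an assumption made for illustration: the function name, the file format, and the one-hour default are not taken from the plugin, and `fetch` stands in for whatever call lists the models.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def cached_model_ids(
    fetch: Callable[[], List[str]],
    cache_path: Path,
    ttl_seconds: float = 3600.0,
) -> List[str]:
    """Return 'hf/'-prefixed model IDs, refetching only when the cache is stale."""
    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < ttl_seconds:
            return json.loads(cache_path.read_text(encoding="utf-8"))
    # Cache is missing or too old: fetch, add the prefix, and store the result.
    ids = [f"hf/{name}" for name in fetch()]
    cache_path.write_text(json.dumps(ids), encoding="utf-8")
    return ids


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "models.json"
    # The lambda is a stand-in for a real discovery call.
    print(cached_model_ids(lambda: ["example-org/example-model"], path))
```

A second call within the time-to-live reads the file and never invokes `fetch`, which is where the performance gain comes from.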

Limitations

  • Attachments: Image, audio, and file attachments are not currently supported
  • Model availability: Some Hugging Face models are not accessible via the Inference API
  • Rate limits: Subject to Hugging Face API rate limits based on your account type
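
One common way to cope with rate limits is to retry with exponential backoff. The sketch below shows the idea under stated assumptions: `RateLimited` is a hypothetical exception standing in for an HTTP 429 response, and `with_backoff` is not something the plugin provides.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


print(with_backoff(lambda: "ok"))
```

The `sleep` parameter exists so the waits can be replaced in tests; in normal use the default `time.sleep` applies.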

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-huggingface
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest

License

Apache 2.0

This version

0.1
