Domain-agnostic client for Hugging Face Inference API
Project description
hf-inference-gateway
Foundational, domain-agnostic client for the Hugging Face Inference API.
Overview
hf-inference-gateway provides a lightweight, framework-agnostic interface for interacting with Hugging Face's Router endpoint (OpenAI-compatible) and standard inference APIs. It abstracts transport, retry logic, timeout handling, and strict JSON validation while remaining completely independent of any specific business domain.
Designed for rapid prototyping and scalable architecture, this module can be integrated into any application requiring structured LLM responses without coupling to external business logic.
Features
- Domain-agnostic design: Zero hardcoded business rules or vertical-specific terminology
- Configurable model routing: Support any Hugging Face model via dynamic
model_idinjection - OpenAI-compatible format: Native support for
/chat/completionsworkflows - Automatic retry logic: Exponential backoff with configurable limits and smart error filtering
- Strict response validation: Optional Pydantic schema enforcement on model outputs
- Timeout management: Configurable request deadlines with predictable failure modes
- Connection pooling: Persistent HTTP client for efficient request handling
Installation
Stable releases are published to PyPI. Development versions can be installed directly from the repository.
# Install from PyPI (once published)
pip install hf-inference-gateway
# Install from source for development
pip install git+https://github.com/fm-byteshift-software-core/hf-inference-gateway.git
Requires Python 3.10 or higher.
Quick Start
import os
from hf_inference_gateway import HuggingFaceGateway, GatewayConfig
# Initialize configuration
config = GatewayConfig(
api_token=os.getenv("HF_API_TOKEN"),
model_id="meta-llama/Llama-3.1-8B-Instruct",
base_url="https://router.huggingface.co/v1",
timeout=30.0,
max_retries=2
)
# Instantiate gateway
gateway = HuggingFaceGateway(config)
# Execute inference with arbitrary context
result = gateway.execute_inference(
message="Where is my order?",
context={
"status": "in_transit",
"eta_minutes": 12,
"attempt_count": 1
},
system_prompt="You are a support assistant. Return a JSON object with intent, sentiment, and response fields.",
response_schema=None # Optional Pydantic model for validation
)
print(result.payload)
print(result.latency_ms)
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
api_token |
str |
Required | Hugging Face API authentication token |
model_id |
str |
Required | Hugging Face model identifier |
timeout |
float |
30.0 |
Maximum request duration in seconds |
max_retries |
int |
3 |
Number of retry attempts on transient failures |
base_url |
str |
https://router.huggingface.co/v1 |
Inference API endpoint (OpenAI-compatible) |
Error Handling
The module raises specific exceptions to enable predictable fallback strategies:
InferenceGatewayError: Base exception for all gateway-related failuresConfigurationError: Raised when initialization parameters are invalidAPIError: Raised on HTTP failures (status codes, network errors, rate limits)ParsingError: Raised when the model response cannot be parsed as valid JSON
Development
# Clone repository
git clone https://github.com/fm-byteshift-software-core/hf-inference-gateway.git
cd hf-inference-gateway
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Lint and format
ruff check .
ruff format .
License
MIT License. See LICENSE for details.
Maintained By
This project is developed and maintained by FM ByteShift Software
Fernando Magalhães
CEO – FM ByteShift Software
contact@fmbyteshiftsoftware.com
fmbyteshiftsoftware.com
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hf_inference_gateway-0.1.0.tar.gz.
File metadata
- Download URL: hf_inference_gateway-0.1.0.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fc4e703f1787c9fbec8e76ddd6392e3cad2af1d773500cce08c31d948a6c3d53
|
|
| MD5 |
57ec9c47f8c18b39e5e8b27324e5d34b
|
|
| BLAKE2b-256 |
63519fe3da3001daac9282b2724b91c99b2d90d60f81acb6bbb522d098b78ea4
|
File details
Details for the file hf_inference_gateway-0.1.0-py3-none-any.whl.
File metadata
- Download URL: hf_inference_gateway-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d9bedd82318db5ff1e4c6a86794285760e240ae9fa237407ee3a5a34bed0ea14
|
|
| MD5 |
f9eea1545811a61bb1c019baafa20e20
|
|
| BLAKE2b-256 |
11fa7cc1f5d54d037a748cfadf1bd29944495458bfc9357dfcc5cd42fe05cd49
|