Skip to main content

Domain-agnostic client for Hugging Face Inference API

Project description

hf-inference-gateway

Foundational, domain-agnostic client for the Hugging Face Inference API.

Overview

hf-inference-gateway provides a lightweight, framework-agnostic interface for interacting with Hugging Face's Router endpoint (OpenAI-compatible) and standard inference APIs. It abstracts transport, retry logic, timeout handling, and strict JSON validation while remaining completely independent of any specific business domain.

Designed for rapid prototyping and scalable architecture, this module can be integrated into any application requiring structured LLM responses without coupling to external business logic.

Features

  • Domain-agnostic design: Zero hardcoded business rules or vertical-specific terminology
  • Configurable model routing: Support any Hugging Face model via dynamic model_id injection
  • OpenAI-compatible format: Native support for /chat/completions workflows
  • Automatic retry logic: Exponential backoff with configurable limits and smart error filtering
  • Strict response validation: Optional Pydantic schema enforcement on model outputs
  • Timeout management: Configurable request deadlines with predictable failure modes
  • Connection pooling: Persistent HTTP client for efficient request handling

Installation

Stable releases are published to PyPI. Development versions can be installed directly from the repository.

# Install from PyPI (once published)
pip install hf-inference-gateway

# Install from source for development
pip install git+https://github.com/fm-byteshift-software-core/hf-inference-gateway.git

Requires Python 3.10 or higher.

Quick Start

import os
from hf_inference_gateway import HuggingFaceGateway, GatewayConfig

# Initialize configuration
config = GatewayConfig(
    api_token=os.getenv("HF_API_TOKEN"),
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    base_url="https://router.huggingface.co/v1",
    timeout=30.0,
    max_retries=2
)

# Instantiate gateway
gateway = HuggingFaceGateway(config)

# Execute inference with arbitrary context
result = gateway.execute_inference(
    message="Where is my order?",
    context={
        "status": "in_transit",
        "eta_minutes": 12,
        "attempt_count": 1
    },
    system_prompt="You are a support assistant. Return a JSON object with intent, sentiment, and response fields.",
    response_schema=None  # Optional Pydantic model for validation
)

print(result.payload)
print(result.latency_ms)

Configuration

Parameter Type Default Description
api_token str Required Hugging Face API authentication token
model_id str Required Hugging Face model identifier
timeout float 30.0 Maximum request duration in seconds
max_retries int 3 Number of retry attempts on transient failures
base_url str https://router.huggingface.co/v1 Inference API endpoint (OpenAI-compatible)

Error Handling

The module raises specific exceptions to enable predictable fallback strategies:

  • InferenceGatewayError: Base exception for all gateway-related failures
  • ConfigurationError: Raised when initialization parameters are invalid
  • APIError: Raised on HTTP failures (status codes, network errors, rate limits)
  • ParsingError: Raised when the model response cannot be parsed as valid JSON

Development

# Clone repository
git clone https://github.com/fm-byteshift-software-core/hf-inference-gateway.git
cd hf-inference-gateway

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint and format
ruff check .
ruff format .

License

MIT License. See LICENSE for details.


Maintained By

This project is developed and maintained by FM ByteShift Software

Fernando Magalhães
CEO – FM ByteShift Software
contact@fmbyteshiftsoftware.com
fmbyteshiftsoftware.com

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hf_inference_gateway-0.1.0.tar.gz (7.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hf_inference_gateway-0.1.0-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file hf_inference_gateway-0.1.0.tar.gz.

File metadata

  • Download URL: hf_inference_gateway-0.1.0.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for hf_inference_gateway-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fc4e703f1787c9fbec8e76ddd6392e3cad2af1d773500cce08c31d948a6c3d53
MD5 57ec9c47f8c18b39e5e8b27324e5d34b
BLAKE2b-256 63519fe3da3001daac9282b2724b91c99b2d90d60f81acb6bbb522d098b78ea4

See more details on using hashes here.

File details

Details for the file hf_inference_gateway-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for hf_inference_gateway-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d9bedd82318db5ff1e4c6a86794285760e240ae9fa237407ee3a5a34bed0ea14
MD5 f9eea1545811a61bb1c019baafa20e20
BLAKE2b-256 11fa7cc1f5d54d037a748cfadf1bd29944495458bfc9357dfcc5cd42fe05cd49

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page