A Python library for inference-time scaling of LLMs

its-hub: A Python library for inference-time scaling

its_hub is a Python library for inference-time scaling of LLMs, focusing on mathematical reasoning tasks.

ITS Hub algorithms: Self-Consistency, Best-of-N, and Particle Filtering

📚 Documentation

For comprehensive documentation, including installation guides, tutorials, and API reference, visit:

https://ai-innovation.team/its_hub

Installation

its_hub provides a minimal core focused on algorithms, with optional language model implementations.

Core Installation (Algorithms Only)

For gateway integration (algorithms and interfaces only, minimal dependencies):

pip install its_hub

This includes:

  • ✓ Self-Consistency and Best-of-N algorithms
  • ✓ Abstract base classes (AbstractLanguageModel, AbstractOutcomeRewardModel)
  • ✓ Only 2 dependencies: numpy, typing-extensions

With Language Model Support

For standalone use, with an OpenAI-compatible language model implementation included:

pip install "its_hub[lm]"

Adds: OpenAICompatibleLanguageModel, LLMJudge, StepGeneration (requires openai, aiohttp, backoff)
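To confirm that the extra's dependencies are actually present in the current environment, a small check like the one below can help. It is only a sketch: the helper name missing_modules is hypothetical, and the module list is copied from the line above rather than read from the package metadata.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the [lm] extra is stated to require (taken from the README text)
LM_EXTRA_MODULES = ["openai", "aiohttp", "backoff"]

if __name__ == "__main__":
    missing = missing_modules(LM_EXTRA_MODULES)
    if missing:
        print("Missing modules for the lm extra:", ", ".join(missing))
    else:
        print("All modules for the lm extra are importable.")
```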

With Experimental Algorithms

For experimental features, including beam search and particle filtering:

pip install "its_hub[experimental]"

Adds: Process reward models, beam search, particle filtering algorithms

Development Installation

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
cd its_hub
pip install -e ".[dev]"
# or using uv:
uv sync --extra dev

Quick Start

Example 1: Gateway Integration (Core Installation)

Installation required: pip install its_hub (core only, minimal dependencies)

Gateway integration requires implementing two interfaces: AbstractLanguageModel for LM calls and AbstractOrchestrator for managing parallel execution with concurrency control and rate limiting.

import asyncio

from its_hub import AbstractLanguageModel, AbstractOrchestrator, SelfConsistency

# Step 1: Implement AbstractLanguageModel with your gateway's LM client
class MyGatewayLM(AbstractLanguageModel):
    def __init__(self, gateway_client):
        self.client = gateway_client

    async def agenerate_single(self, messages, stop=None, **kwargs):
        response = await self.client.generate(messages, stop=stop, **kwargs)
        return {"role": "assistant", "content": response}

# Step 2: Implement AbstractOrchestrator for concurrency control
# (or use the built-in LMOrchestrator from its_hub[lm])
class MyGatewayOrchestrator(AbstractOrchestrator):
    async def agenerate(self, lm, messages_lst, **kwargs):
        # Manage parallel calls with your gateway's rate limits
        ...

async def main():
    # your_gateway_client is a placeholder for your gateway's async client object
    lm = MyGatewayLM(your_gateway_client)
    orchestrator = MyGatewayOrchestrator()
    algorithm = SelfConsistency(orchestrator=orchestrator)
    result = await algorithm.ainfer(lm, "What is 2+2?", budget=5)
    print(result)  # {"role": "assistant", "content": "4", ...}

asyncio.run(main())

The AbstractOrchestrator is the central coordination point — it controls how algorithms fan out parallel LM calls, enforces rate limits, and provides structured error handling. See Orchestration for details.
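One possible way to fill in the agenerate body from Example 1 is to cap in-flight calls with asyncio.Semaphore. The sketch below is an assumption, not the library's implementation: the class is standalone so it runs without its_hub installed (in real use it would subclass AbstractOrchestrator), max_concurrency is a hypothetical parameter, and EchoLM is a stand-in for a gateway-backed model.

```python
import asyncio


class SemaphoreOrchestrator:
    """Illustrative orchestrator that caps the number of in-flight LM calls."""

    def __init__(self, max_concurrency=4):  # hypothetical parameter
        self.max_concurrency = max_concurrency

    async def agenerate(self, lm, messages_lst, **kwargs):
        # Create the semaphore inside the running event loop
        semaphore = asyncio.Semaphore(self.max_concurrency)

        async def call_one(messages):
            async with semaphore:
                return await lm.agenerate_single(messages, **kwargs)

        # gather preserves input order; the first exception raised propagates
        return await asyncio.gather(*(call_one(m) for m in messages_lst))


class EchoLM:
    """Stand-in LM that echoes the last message and records peak concurrency."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def agenerate_single(self, messages, stop=None, **kwargs):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate network latency
        self.in_flight -= 1
        return {"role": "assistant", "content": messages[-1]["content"]}


def demo():
    lm = EchoLM()
    orchestrator = SemaphoreOrchestrator(max_concurrency=2)
    batch = [[{"role": "user", "content": f"q{i}"}] for i in range(6)]
    replies = asyncio.run(orchestrator.agenerate(lm, batch))
    return [reply["content"] for reply in replies], lm.peak
```

A semaphore bounds concurrency but does not limit requests per second; a gateway with a strict rate limit would likely need a token bucket or similar on top of this.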

Example 2: Standalone Use with OpenAI-Compatible LM

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import OpenAICompatibleLanguageModel, SelfConsistency

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

algorithm = SelfConsistency()
result = algorithm.infer(lm, "What is the capital of France?", budget=3)
print(result)  # Most common answer from 3 generations

# Close lm for resource cleanup
asyncio.run(lm.close())
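The "most common answer" step in self-consistency can be pictured as a majority vote over the sampled responses. The following is a minimal sketch under stated assumptions: majority_vote is a hypothetical helper, the normalization (strip and lowercase) is chosen for illustration, and the library's SelfConsistency may extract and compare answers differently.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize so trivial formatting differences do not split the vote
    normalized = [answer.strip().lower() for answer in answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    # Return the first original answer whose normalized form won
    return answers[normalized.index(winner)]


if __name__ == "__main__":
    print(majority_vote(["Paris", "paris ", "Lyon"]))
```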

Example 3: Best-of-N with LLM Judge

Installation required: pip install "its_hub[lm]"

import asyncio

from its_hub import BestOfN, LLMJudge, OpenAICompatibleLanguageModel

lm = OpenAICompatibleLanguageModel(
    endpoint="https://api.openai.com/v1",
    api_key="your-api-key",
    model_name="gpt-4o-mini",
)

judge = LLMJudge(lm=lm, fallback_score=5.0)
algorithm = BestOfN(orm=judge)
result = algorithm.infer(lm, "Write a sorting function", budget=5)
print(result)  # Best response as judged by LLM

# Close lm for resource cleanup
asyncio.run(lm.close())
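Conceptually, Best-of-N scores each candidate and keeps the highest-scoring one. The sketch below illustrates that selection step only. The function best_of_n and the toy judge are hypothetical, and the assumption that fallback_score stands in for a judgement that failed is a guess at the parameter's role, not something the source confirms.

```python
def best_of_n(candidates, score_fn, fallback_score=5.0):
    """Return (best_candidate, scores), using fallback_score when scoring fails."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = []
    for candidate in candidates:
        try:
            scores.append(float(score_fn(candidate)))
        except Exception:
            # Assumption: a judgement that fails gets the fallback score
            scores.append(fallback_score)
    # max returns the first index among equal top scores
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


def toy_judge(text):
    """Toy scorer that raises KeyError for candidates it does not recognize."""
    return {"bubble sort": 4, "merge sort": 9}[text]


if __name__ == "__main__":
    best, scores = best_of_n(["bubble sort", "merge sort", "???"], toy_judge)
    print(best, scores)
```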

Key Features

  • 🔬 Multiple Algorithms: Self-Consistency, Best-of-N, Beam Search (experimental), Particle Filtering (experimental)
  • 🚀 Gateway Integration: Clean abstractions (AbstractLanguageModel, AbstractOrchestrator) for easy integration with AI gateways
  • 🔄 Orchestration: AbstractOrchestrator provides structured concurrency, rate limiting, and error propagation for parallel LM calls — essential for production gateway deployments
  • 🧮 Math-Optimized: Built for mathematical reasoning tasks
  • Async-First: ainfer() is the primary method and infer() is a synchronous wrapper around it; generations run concurrently, with concurrency limits and error handling
  • 🎯 Minimal Core: Only 2 dependencies (numpy, typing-extensions) for core install
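The async-first pattern named in the list above (an async primary method plus a sync wrapper) can be sketched roughly as follows. ToyAlgorithm and fake_lm are hypothetical stand-ins, and the source does not show how its_hub implements infer(), so treat this as one common way to do it rather than the library's code.

```python
import asyncio


class ToyAlgorithm:
    async def ainfer(self, lm, prompt, budget):
        # Fan out `budget` concurrent calls and return the first reply
        replies = await asyncio.gather(*(lm(prompt) for _ in range(budget)))
        return replies[0]

    def infer(self, lm, prompt, budget):
        # Sync wrapper: asyncio.run starts a fresh event loop for this call
        return asyncio.run(self.ainfer(lm, prompt, budget))


async def fake_lm(prompt):
    """Stand-in for an async LM call."""
    await asyncio.sleep(0)
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(ToyAlgorithm().infer(fake_lm, "hi", budget=3))
```

Note that asyncio.run raises RuntimeError when called from a thread that already has a running event loop (a notebook cell, for example), so in that setting the async method should be awaited directly.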

Coding Agent Plugin

its-hub is available as a plugin for two coding agents, bringing inference-time scaling directly into your coding workflow.

Claude Code

Via org marketplace (recommended — includes all Red Hat AI plugins):

/plugin marketplace add Red-Hat-AI-Innovation-Team/plugins
/plugin install its-hub@Red-Hat-AI-Innovation-Team/plugins

Via this repo directly:

/plugin marketplace add Red-Hat-AI-Innovation-Team/its_hub
/plugin install its-hub@Red-Hat-AI-Innovation-Team/its_hub

From a local clone:

git clone https://github.com/Red-Hat-AI-Innovation-Team/its_hub.git
/plugin marketplace add /path/to/its_hub

Codex CLI

codex plugin marketplace add Red-Hat-AI-Innovation-Team/plugins

Then install the plugin from the marketplace. See .codex-plugin/INSTALL.md for manual installation.

After Installing

Invoke the setup-guide skill to configure your model endpoint and algorithm.

Skill              Description
setup-guide        Guided first-time configuration
inference-scaling  Run inference-time scaling on a single prompt
batch-scaling      Batch scaling from a JSONL/CSV/TXT file

For detailed documentation, visit: https://ai-innovation.team/its_hub
