High performance LLM client

🌍 BHUMI v0.4.82 - The Fastest AI Inference Client ⚡

Introduction

Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.

Why Bhumi?

🚀 Fastest AI inference client – Outperforms alternatives with 2-3x higher throughput
⚡ Built with Rust for Python – Achieves high efficiency with low overhead
🌐 Supports 9+ AI providers – OpenAI, Anthropic, Google Gemini, Groq, Cerebras, SambaNova, Mistral, Cohere, and more
👁️ Vision capabilities – Image analysis across 5 providers (OpenAI, Anthropic, Gemini, Mistral, Cerebras)
🔄 Streaming and async capabilities – Real-time responses with Rust-powered concurrency
🔁 Automatic connection pooling and retries – Ensures reliability and efficiency
💡 Minimal memory footprint – Uses up to 60% less memory than other clients
🏗 Production-ready – Optimized for high-throughput applications with OpenAI Responses API support

Bhumi (भूमि) is Sanskrit for Earth, symbolizing stability, grounding, and speed—just like our inference engine, which ensures rapid and stable performance. 🚀

🆕 What's New in v0.4.82

✨ Major New Features

🔷 Cohere Provider Support: Added Cohere AI with OpenAI-compatible /v1/chat/completions endpoint
📡 Free-Threaded Python 3.13+ Support: True parallel execution without GIL for maximum performance
🗑️ Removed orjson Dependency: Simplified dependencies using stdlib JSON for better compatibility
⬆️ PyO3 0.26 Upgrade: Updated to latest PyO3 with modern Bound API and better performance
🔧 Tokio 1.47: Latest async runtime for improved concurrency

🛠 Technical Improvements

Enhanced OCR Integration: client.ocr() and client.upload_file() methods
Unified API: Single method handles both file upload and OCR processing
Better Error Handling: Improved timeout and validation for OCR operations
Production Ready: Optimized for high-volume document processing workflows

📊 OCR Capabilities

Document Types: PDF, JPEG, PNG, and more formats
Text Extraction: High-accuracy OCR with layout preservation
Structured Data: Extract tables, forms, and key-value pairs
Bounding Boxes: Precise text positioning and element detection
Multi-format Output: Markdown text + structured JSON data

🆕 What's New in v0.4.8

✨ Major New Features

🌐 8+ AI Providers: Added Mistral AI support with vision capabilities (Pixtral models)
👁️ Vision Support: Image analysis across 5 providers (OpenAI, Anthropic, Gemini, Mistral, Cerebras)
📡 OpenAI Responses API: Intelligent routing for new API patterns with better performance
🔧 Satya v0.3.7: Upgraded with nested model support and enhanced validation
🚀 Production Ready: Improved wheel building, Docker compatibility, and CI/CD

🛠 Technical Improvements

Cross-platform Wheels: Enhanced building for Linux, macOS (Intel + Apple Silicon), Windows
OpenSSL Integration: Proper SSL library linking for all platforms
Workflow Optimization: Disabled integration tests for faster releases
Bug Fixes: Resolved MAP-Elites buffer issues and Satya validation problems
Performance Optimizations: Improved MAP-Elites archive loading with orjson + Satya validation
Production Ready: Enhanced error handling and timeout protection

📊 Provider Support Matrix

Provider	Chat	Streaming	Tools	Vision	Structured
OpenAI	✅	✅	✅	✅	✅
Anthropic	✅	✅	✅	✅	⚠️
Gemini	✅	✅	✅	✅	⚠️
Groq	✅	✅	✅	❌	⚠️
Cerebras	✅	✅	✅*	✅	⚠️
SambaNova	✅	✅	✅	❌	⚠️
OpenRouter	✅	✅	✅	❌	⚠️
Cohere	✅	✅	✅	❌	⚠️

*Cerebras tools require specific models

Installation

No Rust compiler required! 🎊 Pre-compiled wheels are available for all major platforms:

pip install bhumi

Supported Platforms:

🐧 Linux (x86_64)
🍎 macOS (Intel & Apple Silicon)
🪟 Windows (x86_64)
🐍 Python 3.8, 3.9, 3.10, 3.11, 3.12

Latest v0.4.8 release includes improved wheel building and cross-platform compatibility!

Quick Start

OpenAI Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("OPENAI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

⚡ Performance Optimizations

Bhumi includes cutting-edge performance optimizations that make it 2-3x faster than alternatives:

🧠 MAP-Elites Buffer Strategy (v0.4.8 Enhanced)

Fast archive loading with Satya v0.3.7 validation and stdlib JSON parsing
Trained buffer configurations optimized through evolutionary algorithms
Automatic buffer adjustment based on response patterns and historical data
Type-safe validation with comprehensive error checking
Secure loading without unsafe eval() operations
Nested model support for complex data structures

📊 Performance Status Check

Check if you have optimal performance with the built-in diagnostics:

from bhumi.utils import print_performance_status

# Check optimization status
print_performance_status()
# 🚀 Bhumi Performance Status
# ✅ Optimized MAP-Elites archive loaded  
# ⚡ Optimization Details:
#    • Entries: 15,644 total, 15,644 optimized
#    • Coverage: 100.0% of search space
#    • Loading: Satya validation + stdlib JSON parsing (2-3x faster)

🏆 Archive Distribution (v0.4.8 Enhanced)

When you install Bhumi, you automatically get:

Pre-trained MAP-Elites archive for optimal buffer sizing
Stdlib JSON parsing with no extra JSON dependency
Satya v0.3.7-powered type validation for bulletproof data loading
Performance metrics and diagnostics
Nested model support for complex configurations

Gemini Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Cerebras Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("CEREBRAS_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="cerebras/llama3.1-8b",  # gateway-style model parsing is supported
        debug=True,
    )

    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Summarize the benefits of Bhumi in one sentence."}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Mistral AI Example (with Vision)

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("MISTRAL_API_KEY")

async def main():
    # Text-only model
    config = LLMConfig(
        api_key=api_key,
        model="mistral/mistral-small-latest",
        debug=True
    )
    
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Bonjour! Parlez-moi de Paris."}  # French language support
    ])
    print(f"Mistral Response: {response['text']}")

    # Vision model for image analysis
    vision_config = LLMConfig(
        api_key=api_key,
        model="mistral/pixtral-12b-2409"  # Pixtral vision model
    )
    
    vision_client = BaseLLMClient(vision_config)
    response = await vision_client.completion([
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="}}
            ]
        }
    ])
    print(f"Vision Analysis: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
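
The vision example above passes the image inline as a base64 data URL. A minimal sketch of building such a message from raw image bytes follows; `to_data_url` and `image_message` are hypothetical helper names used only for illustration and are not part of Bhumi.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper, not a Bhumi API)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message in the text + image_url shape used in the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, mime_type)}},
        ],
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").read()
message = image_message("What's in this image?", b"hi")
print(message["content"][1]["image_url"]["url"])
```

The resulting dictionary can be passed in the message list given to `client.completion`, in the same position as the hand-written message in the example.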

Provider API: Multi-Provider Model Format

Bhumi unifies providers using a simple provider/model format in LLMConfig.model. Base URLs are auto-set for known providers; you can override with base_url.

Supported providers: openai, anthropic, gemini, groq, sambanova, openrouter, cerebras, mistral, cohere
Foundation providers use provider/model. Gateways like Groq/OpenRouter/SambaNova may use nested paths after the provider (e.g., openrouter/meta-llama/llama-3.1-8b-instruct).
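
To make the format concrete, here is a minimal sketch of how a provider/model string could be split so that nested gateway paths stay intact. `split_model` and `KNOWN_PROVIDERS` are hypothetical names for illustration; this is not Bhumi's internal parser.

```python
from typing import Tuple

# Provider names taken from the supported-providers list above
KNOWN_PROVIDERS = {
    "openai", "anthropic", "gemini", "groq", "sambanova",
    "openrouter", "cerebras", "mistral", "cohere",
}


def split_model(model: str) -> Tuple[str, str]:
    """Split 'provider/model' on the first slash only (illustrative helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider {provider!r}")
    # Everything after the first slash is kept as-is, including further slashes
    return provider, name


print(split_model("openai/gpt-4o"))
print(split_model("openrouter/meta-llama/llama-3.1-8b-instruct"))
```

Splitting on the first slash only is what lets a gateway path such as meta-llama/llama-3.1-8b-instruct pass through unchanged.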

import os
from bhumi.base_client import BaseLLMClient, LLMConfig

# OpenAI
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))

# Anthropic
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-5-sonnet-latest"))

# Gemini (OpenAI-compatible endpoint)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-2.0-flash"))

# Groq (gateway) – nested path after provider is kept intact
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-8b-instant"))

# Cerebras (gateway)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("CEREBRAS_API_KEY"), model="cerebras/llama3.1-8b", base_url="https://api.cerebras.ai/v1"))

# SambaNova (gateway)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/Meta-Llama-3.1-405B-Instruct"))

# OpenRouter (gateway)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/meta-llama/llama-3.1-8b-instruct"))

# Mistral AI
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mistral-small-latest"))

# Cohere (OpenAI-compatible)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-a-03-2025"))

🎯 Provider-Specific Model Access

Bhumi supports accessing specialized models beyond the basic ones. Here's how to access different model variants and specialized capabilities:

OpenAI Models

# GPT-4 Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))              # Latest GPT-4 Optimized
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o-mini"))         # Fast GPT-4 variant
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4-turbo"))          # Turbo variant
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4"))                # Original GPT-4

# GPT-3.5 Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-3.5-turbo"))       # Latest Turbo

# Specialized Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4-vision-preview")) # Vision-capable GPT-4
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4-0125-preview"))   # Specific version

# Responses API (New) - Automatically uses Responses API
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))
response = await client.parse(input="Analyze this data", text_format=MyModel)  # Uses /responses endpoint

Anthropic Models

# Claude 3.5 Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-5-sonnet-latest"))  # Latest Sonnet
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-5-sonnet-20241022")) # Specific version

# Claude 3 Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-opus-latest"))       # Most capable
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-sonnet-20240229"))  # Balanced
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-haiku-20240307"))   # Fastest

# Claude 2 & Earlier
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-2.1"))                # Claude 2.1
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-instant-1.2"))        # Fast variant

Google Gemini Models

# Gemini 1.5 Family (Latest)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-1.5-pro-latest"))     # Most capable
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-1.5-flash-latest"))   # Fast variant
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-1.5-pro-001"))        # Specific version

# Gemini 1.0 Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-pro"))                # Text-only
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-pro-vision"))         # Vision-capable

# Experimental Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-exp-1114"))           # Experimental

Mistral AI Models

# Large Language Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mistral-large-latest"))     # Most capable
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mistral-medium-latest"))   # Balanced
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mistral-small-latest"))    # Fast

# Code-Specific Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/codestral-latest"))         # Code generation
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/codestral-2405"))           # Specific version

# Vision Models (Pixtral)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/pixtral-12b-2409"))         # Vision analysis
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/pixtral-large-latest"))     # Large vision model

Cohere Models

# Command A Family (Latest)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-a-03-2025"))       # Most capable
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-r-plus-08-2024"))   # Large model
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-r-08-2024"))        # Medium model

# Command R+ Family
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-r-plus"))             # Plus variant
client = BaseLLMClient(LLMConfig(api_key=os.getenv("COHERE_API_KEY"), model="cohere/command-r"))                  # Base variant

🔍 Mistral OCR & Document Analysis

Mistral's Pixtral models excel at OCR (Optical Character Recognition) and document analysis, making them perfect for:

Text extraction from images and documents
Document processing (invoices, receipts, forms)
Handwriting recognition
Multilingual text extraction (200+ languages)
Table and layout analysis

# OCR with Pixtral
vision_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("MISTRAL_API_KEY"), 
    model="mistral/pixtral-12b-2409"  # OCR specialist
))

# Extract text from receipt
receipt_response = await vision_client.completion([
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this receipt:"},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
        ]
    }
])
print(f"OCR Result: {receipt_response['text']}")

# Analyze document layout
doc_response = await vision_client.completion([
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this document and extract key information:"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
        ]
    }
])

OCR Capabilities:

High accuracy text extraction
Multi-language support including handwriting
Table recognition and structured data extraction
Form processing with field detection
Mathematical notation recognition
Document classification by type

🔍 Dedicated OCR API (Mistral)

Bhumi now supports Mistral's dedicated OCR API for high-performance document processing with structured data extraction:

Two OCR Workflows

# Workflow 1: Direct file upload + OCR (Recommended)
result = await client.ocr(
    file_path="/path/to/document.pdf",
    pages=[0, 1],  # Process specific pages
    model="mistral-ocr-latest"
)

# Workflow 2: Pre-uploaded file
upload_result = await client.upload_file("/path/to/document.pdf")
result = await client.ocr(
    document={"type": "file", "file_id": upload_result["id"]},
    pages=[0, 1]
)

OCR with Structured Output

# Define extraction schema
faq_schema = {
    "type": "text",
    "json_schema": {
        "name": "document_analysis",
        "description": "Extract key information from document",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "topics": {"type": "array", "items": {"type": "string"}},
                "key_points": {"type": "array", "items": {"type": "string"}}
            }
        }
    }
}

# OCR with structured extraction
result = await client.ocr(
    file_path="/path/to/faq.pdf",
    pages=[0, 1],
    document_annotation_format=faq_schema,
    bbox_annotation_format=bbox_schema  # Optional: schema for bounding-box annotations (bbox_schema is not defined in this snippet)
)

# Access results
extracted_text = result["pages"][0]["markdown"]
structured_data = result["document_annotation"]
pages_processed = result["usage_info"]["pages_processed"]

OCR Features

📄 Multi-format Support: PDF, JPEG, PNG, and more
📑 Multi-page Processing: Process specific pages or entire documents
🏗️ Structured Extraction: Extract structured data with JSON schemas
📊 Bounding Box Analysis: Get precise text positioning
🌐 Multi-language: Support for 200+ languages
📈 High Performance: Dedicated OCR models optimized for accuracy
🔄 Dual Workflows: Direct upload or pre-uploaded file processing

OCR Response Format

{
    "pages": [
        {
            "index": 0,
            "markdown": "Extracted text content...",
            "images": [],
            "dimensions": {"dpi": 200, "height": 2200, "width": 1700}
        }
    ],
    "model": "mistral-ocr-2505-completion",
    "document_annotation": "Structured analysis...",
    "usage_info": {
        "pages_processed": 2,
        "doc_size_bytes": 1084515
    }
}
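
Given a response of this shape, the per-page Markdown can be combined into one document. The sketch below assumes only the fields shown above; `pages_to_markdown` is a hypothetical helper, and the two-page `sample` is made-up data for illustration.

```python
def pages_to_markdown(result: dict) -> str:
    """Join the markdown of every page, ordered by page index (illustrative helper)."""
    pages = sorted(result.get("pages", []), key=lambda page: page["index"])
    return "\n\n".join(page["markdown"] for page in pages)


# Made-up sample in the documented response shape, pages deliberately out of order
sample = {
    "pages": [
        {"index": 1, "markdown": "Second page text", "images": []},
        {"index": 0, "markdown": "First page text", "images": []},
    ],
    "usage_info": {"pages_processed": 2},
}

full_text = pages_to_markdown(sample)
print(full_text)
```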

Specialized Models

client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mathstral-7b-v0.1"))  # Math specialist
client = BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/mistral-embed"))      # Embeddings

Groq Models (Gateway)

# Meta Llama Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-405b-instruct"))      # Largest Llama
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-70b-instruct"))       # 70B variant
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-8b-instruct"))        # 8B variant

# Meta Llama 3 Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama3-70b-8192"))               # Llama 3 70B
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama3-8b-8192"))                # Llama 3 8B

# Other Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/mixtral-8x7b-32768"))            # Mixtral 8x7B
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/gemma-7b-it"))                   # Google Gemma

Cerebras Models (Gateway)

# Llama Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("CEREBRAS_API_KEY"), model="cerebras/llama3.1-70b"))           # 70B model
client = BaseLLMClient(LLMConfig(api_key=os.getenv("CEREBRAS_API_KEY"), model="cerebras/llama3.1-8b"))            # 8B model

# Specialized Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("CEREBRAS_API_KEY"), model="cerebras/llama3.1-8b-instruct"))   # Instruction-tuned

SambaNova Models (Gateway)

# Llama Models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/Meta-Llama-3.1-405B-Instruct"))  # Largest
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/Meta-Llama-3.1-70B-Instruct"))   # 70B
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/Meta-Llama-3.1-8B-Instruct"))    # 8B

# E5 Embeddings
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/e5-mistral-7b-instruct"))        # Embedding model

OpenRouter Models (Gateway)

# Access any model via OpenRouter
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/meta-llama/llama-3.1-405b-instruct"))
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/anthropic/claude-3.5-sonnet"))
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/google/gemini-pro-1.5"))
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/mistralai/mistral-large"))

# Specialized models
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/anthropic/claude-3-haiku:beta")) # Beta models

🔍 Provider Walkthroughs

OpenAI Walkthrough

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig

async def openai_walkthrough():
    client = BaseLLMClient(LLMConfig(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="openai/gpt-4o",
        debug=True
    ))

    # 1. Basic Chat
    response = await client.completion([
        {"role": "user", "content": "Hello!"}
    ])
    print(f"Chat: {response['text']}")

    # 2. Vision Analysis
    vision_response = await client.completion([
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
            ]
        }
    ])
    print(f"Vision: {vision_response['text']}")

    # 3. Structured Output (Responses API)
    from satya import Model, Field
    class Person(Model):
        name: str
        age: int

    parsed = await client.parse(
        input="Create a person named Alice, age 30",
        text_format=Person
    )
    print(f"Parsed: {parsed.parsed.name}, {parsed.parsed.age}")

    # 4. Streaming
    async for chunk in await client.completion([
        {"role": "user", "content": "Write a short story"}
    ], stream=True):
        print(chunk, end="", flush=True)

asyncio.run(openai_walkthrough())

Mistral AI Walkthrough

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig

async def mistral_walkthrough():
    # Text Model
    text_client = BaseLLMClient(LLMConfig(
        api_key=os.getenv("MISTRAL_API_KEY"),
        model="mistral/mistral-small-latest"
    ))

    # French language support
    response = await text_client.completion([
        {"role": "user", "content": "Bonjour! Comment allez-vous?"}
    ])
    print(f"French: {response['text']}")

    # Vision Model
    vision_client = BaseLLMClient(LLMConfig(
        api_key=os.getenv("MISTRAL_API_KEY"),
        model="mistral/pixtral-12b-2409"
    ))

    vision_response = await vision_client.completion([
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this image:"},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
            ]
        }
    ])
    print(f"Vision Analysis: {vision_response['text']}")

asyncio.run(mistral_walkthrough())

Anthropic Walkthrough

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig

async def anthropic_walkthrough():
    client = BaseLLMClient(LLMConfig(
        api_key=os.getenv("ANTHROPIC_API_KEY"),
        model="anthropic/claude-3-5-sonnet-latest"
    ))

    # Long context and reasoning
    response = await client.completion([
        {
            "role": "user",
            "content": "Analyze this complex problem and provide a detailed solution..."
        }
    ], max_tokens=4000)
    print(f"Analysis: {response['text']}")

    # Tool use
    def calculate(x: int, y: int) -> int:
        return x + y

    client.register_tool("calculate", calculate, "Add two numbers", {
        "type": "object",
        "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
        "required": ["x", "y"]
    })

    tool_response = await client.completion([
        {"role": "user", "content": "What is 15 + 27?"}
    ])
    print(f"Tool result: {tool_response['text']}")

asyncio.run(anthropic_walkthrough())

🛠 Advanced Model Selection

Choosing the Right Model

# For Speed (Fast inference, lower cost)
fast_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="openai/gpt-4o-mini"  # Fast variant
))

# For Quality (Best capabilities, higher cost)
quality_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="openai/gpt-4o"  # Most capable
))

# For Vision Tasks
vision_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("MISTRAL_API_KEY"),
    model="mistral/pixtral-12b-2409"  # Specialized vision model
))

# For Code Generation
code_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("MISTRAL_API_KEY"),
    model="mistral/codestral-latest"  # Code specialist
))

# For Math/Reasoning
math_client = BaseLLMClient(LLMConfig(
    api_key=os.getenv("MISTRAL_API_KEY"),
    model="mistral/mathstral-7b-v0.1"  # Math specialist
))

Model Switching at Runtime

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig

class MultiModelClient:
    """Routes each prompt to a client chosen by task type."""

    def __init__(self):
        self.clients = {
            'fast': BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-8b-instruct")),
            'quality': BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o")),
            'vision': BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/pixtral-12b-2409")),
            'code': BaseLLMClient(LLMConfig(api_key=os.getenv("MISTRAL_API_KEY"), model="mistral/codestral-latest"))
        }

    async def query(self, task_type: str, prompt: str):
        # Unknown task types fall back to the 'fast' client
        client = self.clients.get(task_type, self.clients['fast'])
        response = await client.completion([{"role": "user", "content": prompt}])
        return response['text']

# Usage
async def main():
    multi_client = MultiModelClient()

    # Fast response
    fast_answer = await multi_client.query('fast', 'Quick question?')

    # High-quality response
    quality_answer = await multi_client.query('quality', 'Complex analysis needed')

    # Vision task
    vision_answer = await multi_client.query('vision', 'Analyze this image...')

    # Code generation
    code_answer = await multi_client.query('code', 'Write a Python function...')

asyncio.run(main())

Tool Use (Function Calling)

Bhumi supports OpenAI-style function calling and Gemini function declarations. Register Python callables with JSON schemas; Bhumi will add them to requests and execute tool calls automatically.

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig

# 1) Define a tool
def get_weather(location: str, unit: str = "celsius"):
    return {"location": location, "unit": unit, "forecast": "sunny", "temp": 27}

tool_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string", "description": "City and country"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["location"]
}

async def main():
    client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o", debug=True))
    client.register_tool("get_weather", get_weather, "Get the current weather", tool_schema)

    # 2) Ask a question that should trigger a tool call
    resp = await client.completion([
        {"role": "user", "content": "What's the weather in Tokyo in celsius?"}
    ])

    print(resp["text"])  # Tool is executed and response incorporates tool output

asyncio.run(main())

Notes:

OpenAI-compatible providers use tools with tool_calls in responses; Gemini uses function_declarations and tool_config under the hood.
Bhumi parses tool calls, executes your Python function, appends a tool message, and continues the conversation automatically.
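
The loop described in these notes can be sketched in plain Python. This is an illustration of the general OpenAI-style pattern, not Bhumi's implementation: `TOOLS`, `register_tool`, and `run_tool_calls` are hypothetical names, and the assistant message is a hand-written sample rather than a real API response.

```python
import json

# Hypothetical in-memory registry mapping tool names to Python callables
TOOLS = {}


def register_tool(name, func):
    TOOLS[name] = func


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool call and return the resulting tool messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls", []):
        func = TOOLS[call["function"]["name"]]
        # In the OpenAI-style format the arguments arrive as a JSON string
        args = json.loads(call["function"]["arguments"])
        result = func(**args)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return tool_messages


def get_weather(location, unit="celsius"):
    return {"location": location, "unit": unit, "forecast": "sunny", "temp": 27}


register_tool("get_weather", get_weather)

# Hand-written sample of an assistant turn that requests one tool call
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Tokyo", "unit": "celsius"}',
        },
    }],
}

tool_messages = run_tool_calls(assistant_message)
print(tool_messages[0]["content"])
```

In a full loop, the tool messages would be appended to the conversation after the assistant message and the request sent again so the model can write its final answer.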

🚀 Structured Outputs with Satya v0.3.7 High-Performance Validation

Bhumi uses Satya v0.3.7 for structured outputs, providing 2-7x faster validation than alternatives with OpenAI Responses API compatibility.

Satya v0.3.7 Integration

import asyncio
import os
from bhumi.base_client import BaseLLMClient, LLMConfig
from satya import Model, Field

class UserProfile(Model):
    """High-performance user profile with Satya validation"""
    name: str = Field(description="User's full name")
    age: int = Field(description="User's age", ge=13, le=120)
    email: str = Field(description="Email address", email=True)  # RFC 5322 validation

async def main():
    client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))

    # Use parse() method similar to OpenAI's client.chat.completions.parse()
    completion = await client.parse(
        messages=[{"role": "user", "content": "Create user Alice, age 25"}],
        response_format=UserProfile,  # Satya model for high performance
        timeout=15.0  # Built-in timeout protection
    )

    user = completion.parsed  # Already validated with 2-7x performance boost!
    print(f"User: {user.name}, Age: {user.age}, Email: {user.email}")

asyncio.run(main())

OpenAI Responses API Support

# New Responses API patterns with intelligent routing
# OpenAI automatically uses Responses API when input= or instructions= provided

# Pattern 1: Simple input
completion = await client.parse(
    input="Create a user profile for Bob, age 30",
    text_format=UserProfile
)

# Pattern 2: Separated instructions
completion = await client.parse(
    instructions="Create a detailed user profile",
    input="Name: Sarah, Age: 28, Email: sarah@example.com",
    text_format=UserProfile
)

# Pattern 3: Streaming with Responses API
async for chunk in await client.parse(
    input="Write a story about AI",
    text_format=StoryModel,  # StoryModel is a Satya Model assumed to be defined elsewhere
    stream=True
):
    print(chunk.delta, end="", flush=True)

Key Features

Satya v0.3.7: Built-in OpenAI-compatible schema generation with nested model support
2-7x Performance: Faster than alternative validation libraries
RFC 5322 Email Validation: Proper email format checking
Decimal Precision: Financial-grade number handling
Timeout Protection: Built-in timeout with helpful error messages
Batch Processing: validator.set_batch_size(1000) for high throughput
OpenAI Responses API: Support for new API patterns with intelligent routing
Cross-Provider Compatibility: Works with all supported providers
Built-in Tools: Function calling with automatic tool execution

Advanced Features

from typing import List, Literal
from satya import Model, Field

# Nested models with complex validation
class CompanyProfile(Model):
    name: str = Field(description="Company name")
    employees: List[UserProfile] = Field(description="Employee profiles")
    founded_year: int = Field(description="Founding year", ge=1800, le=2025)

# Tool integration with structured outputs
class WeatherQuery(Model):
    location: str = Field(description="City name")
    unit: Literal["celsius", "fahrenheit"] = Field(description="Temperature unit")

def get_weather(query: WeatherQuery) -> dict:
    # Function automatically receives validated WeatherQuery object
    return {
        "location": query.location,
        "unit": query.unit,
        "temperature": 22,
        "forecast": "sunny"
    }

# Register tool and use with structured inputs
client.register_tool("get_weather", get_weather, "Get weather information", WeatherQuery)
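
To make the "function automatically receives a validated object" idea concrete, here is a minimal stdlib-only sketch of a tool registry. Everything in it is an assumption made for illustration: `WeatherQueryStub` is a plain dataclass standing in for the Satya model, and `TOOLS`, `register_tool`, and `dispatch` are hypothetical names that do not describe Bhumi's internals.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class WeatherQueryStub:
    """Stand-in for the Satya WeatherQuery model (plain dataclass, stdlib only)."""
    location: str
    unit: str

    def __post_init__(self) -> None:
        # Mimic the Literal["celsius", "fahrenheit"] constraint by hand.
        if self.unit not in ("celsius", "fahrenheit"):
            raise ValueError(f"unsupported unit: {self.unit!r}")


def get_weather(query: WeatherQueryStub) -> dict:
    return {"location": query.location, "unit": query.unit, "temperature": 22}


# Hypothetical registry: tool name -> (callable, argument schema).
TOOLS: Dict[str, Tuple[Callable[[Any], dict], type]] = {}


def register_tool(name: str, func: Callable[[Any], dict], schema: type) -> None:
    TOOLS[name] = (func, schema)


def dispatch(name: str, raw_arguments: str) -> dict:
    """Parse the model's JSON arguments, validate them, then call the tool."""
    func, schema = TOOLS[name]
    validated = schema(**json.loads(raw_arguments))
    return func(validated)


register_tool("get_weather", get_weather, WeatherQueryStub)
result = dispatch("get_weather", '{"location": "Paris", "unit": "celsius"}')
print(result)
```

The point of the design is that the tool body never sees raw JSON: invalid arguments fail at the schema step, before the function runs.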

Performance Benefits

Satya v0.3.7: 2-7x faster validation, RFC 5322 email validation, Decimal support, nested models
Production Optimized: Built for high-throughput workloads requiring maximum performance
Memory Efficient: Lower memory usage compared to alternatives
Type Safety: Complete validation coverage with comprehensive error handling

Provider Support for Structured Outputs

Provider	Satya Support	Responses API
OpenAI	✅	✅
Anthropic	⚠️	❌
Gemini	⚠️	❌
Groq	⚠️	❌
Cerebras	⚠️	❌
SambaNova	⚠️	❌
Mistral	⚠️	❌

OpenAI has full support for all structured output patterns. The other providers (marked ⚠️) rely on prompt engineering, with Satya validating the returned output.
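
The prompt-engineering approach can be sketched roughly as follows, using only the standard library. This is an assumption-laden illustration: `UserProfileStub`, `build_prompt`, and `parse_reply` are hypothetical names, the hand-written checks stand in for Satya validation, and the reply string is simulated instead of coming from a provider.

```python
import json
from dataclasses import dataclass


@dataclass
class UserProfileStub:
    """Stand-in for a Satya model; the validation below is hand-written."""
    name: str
    age: int


def build_prompt(task: str) -> str:
    # Ask the model to answer with JSON only, naming the expected keys.
    return (
        f"{task}\n"
        'Reply with a JSON object only, using the keys "name" (string) and "age" (integer).'
    )


def parse_reply(reply: str) -> UserProfileStub:
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return UserProfileStub(name=data["name"], age=age)


prompt = build_prompt("Create user Alice, age 25")
simulated_reply = '{"name": "Alice", "age": 25}'  # stands in for a provider response
user = parse_reply(simulated_reply)
print(user)
```

Because the provider is not constrained to a schema in this mode, the validation step is what guarantees the shape of the result.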

Learn more in our Structured Outputs Documentation.

Streaming Support

All providers support streaming responses:

async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)
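
The `async for ... in await ...` shape above means the call is awaited first and then yields an async iterator of chunks. The sketch below reproduces that shape with a stub so it can be run without network access; `StubClient` and its canned chunks are hypothetical and only mimic the calling convention shown in this section.

```python
import asyncio
from typing import AsyncIterator, List


class StubClient:
    """Hypothetical stand-in that mimics the call shape used above."""

    async def completion(self, messages: List[dict], stream: bool = False) -> AsyncIterator[str]:
        async def chunks() -> AsyncIterator[str]:
            for piece in ["Once ", "upon ", "a ", "time."]:
                await asyncio.sleep(0)  # yield control, as a network read would
                yield piece

        return chunks()


async def collect(client: StubClient) -> str:
    parts = []
    # completion() is awaited first; its result is an async iterator of chunks.
    async for chunk in await client.completion(
        [{"role": "user", "content": "Write a story"}], stream=True
    ):
        parts.append(chunk)
    return "".join(parts)


story = asyncio.run(collect(StubClient()))
print(story)
```

Collecting chunks into a list and joining once at the end avoids repeated string concatenation when the full text is needed after streaming.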

📊 Benchmark Results

Our latest benchmarks show significant performance advantages across different metrics:

⚡ Response Time

LiteLLM: 13.79s
Native: 5.55s
Bhumi: 4.26s
Google GenAI: 6.76s

🚀 Throughput (Requests/Second)

LiteLLM: 3.48
Native: 8.65
Bhumi: 11.27
Google GenAI: 7.10

💾 Peak Memory Usage (MB)

LiteLLM: 275.9MB
Native: 279.6MB
Bhumi: 284.3MB
Google GenAI: 284.8MB

In these benchmarks Bhumi has the lowest response time and the highest throughput, outperforming LiteLLM by about 3.2x on both (for example, 11.27 vs. 3.48 requests/second). Peak memory usage is roughly the same for all four clients.
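
The ratios behind the summary can be recomputed directly from the figures listed above. The snippet below is plain arithmetic on those numbers; the variable names are illustrative.

```python
# Figures copied from the benchmark lists above.
response_time_s = {"LiteLLM": 13.79, "Native": 5.55, "Bhumi": 4.26, "Google GenAI": 6.76}
throughput_rps = {"LiteLLM": 3.48, "Native": 8.65, "Bhumi": 11.27, "Google GenAI": 7.10}
peak_memory_mb = {"LiteLLM": 275.9, "Native": 279.6, "Bhumi": 284.3, "Google GenAI": 284.8}

others = [name for name in throughput_rps if name != "Bhumi"]

# Throughput: higher is better, so divide Bhumi's figure by the other client's.
throughput_gain = {n: throughput_rps["Bhumi"] / throughput_rps[n] for n in others}
# Response time: lower is better, so divide the other client's time by Bhumi's.
latency_gain = {n: response_time_s[n] / response_time_s["Bhumi"] for n in others}
# Memory: ratio of Bhumi's peak to the other client's peak (1.0 means equal).
memory_ratio = {n: peak_memory_mb["Bhumi"] / peak_memory_mb[n] for n in others}

for n in others:
    print(
        f"{n}: throughput x{throughput_gain[n]:.2f}, "
        f"latency x{latency_gain[n]:.2f}, memory x{memory_ratio[n]:.2f}"
    )
```

On these figures the 3.2x gain is relative to LiteLLM; against the native client and Google GenAI the gain is roughly 1.3x and 1.6x, and peak memory differs by about 3% at most.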

Configuration Options

The LLMConfig class supports various options:

api_key: API key for the provider
model: Model name in format "provider/model_name"
base_url: Optional custom base URL
max_retries: Number of retries (default: 3)
timeout: Request timeout in seconds (default: 30)
max_tokens: Maximum tokens in response
debug: Enable debug logging
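
As a quick reference, the options above can be mirrored in a small stand-in dataclass. This is a sketch and not Bhumi's `LLMConfig`: the class name `ConfigSketch`, the `split_model` helper, the placeholder key and model name, and the defaults for `max_tokens` and `debug` are all assumptions. Only the option names, the "provider/model_name" format, and the defaults for `max_retries` and `timeout` come from the list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ConfigSketch:
    """Illustrative stand-in mirroring the options listed above."""
    api_key: str
    model: str                        # "provider/model_name"
    base_url: Optional[str] = None    # optional custom base URL
    max_retries: int = 3              # default stated in the list
    timeout: float = 30.0             # seconds, default stated in the list
    max_tokens: Optional[int] = None  # assumed default
    debug: bool = False               # assumed default

    def split_model(self) -> Tuple[str, str]:
        # Split on the first "/" only, so model names may themselves contain slashes.
        provider, sep, name = self.model.partition("/")
        if not sep or not provider or not name:
            raise ValueError('model must look like "provider/model_name"')
        return provider, name


cfg = ConfigSketch(api_key="sk-placeholder", model="openai/gpt-4o-mini", timeout=15.0)
print(cfg.split_model(), cfg.max_retries, cfg.timeout)
```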

🎯 Why Use Bhumi?

✔ Open Source: Apache 2.0 licensed, free for commercial use
✔ Community Driven: Welcomes contributions from individuals and companies
✔ Blazing Fast: Up to 3.2x the throughput of alternative clients in the benchmarks above
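✔ Resource Efficient: Claimed to use up to 60% less memory than comparable clients, although the peak-memory benchmark above shows roughly equal usage (about 276-285 MB for all clients)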
✔ Multi-Model Support: Easily switch between providers
✔ Parallel Requests: Handles multiple concurrent requests effortlessly
✔ Flexibility: Debugging and customization options available
✔ Production Ready: Battle-tested in high-throughput environments

🤝 Contributing

We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:

Submit pull requests
Report issues
Suggest improvements
Share benchmarks
Integrate our optimizations into your libraries (with attribution)

📜 License

Apache 2.0

🌟 Join our community and help make AI inference faster for everyone! 🌟

Release history

0.4.82: Oct 7, 2025 (this version)
0.4.81: Sep 24, 2025
0.4.21: Aug 25, 2025
0.4.8: Sep 24, 2025
0.4.6: Sep 23, 2025
0.4.5: Sep 23, 2025
0.4.2: Aug 25, 2025
0.4.1: Aug 24, 2025
0.4.0: Aug 8, 2025
0.3.9: Aug 8, 2025
0.3.2: Aug 8, 2025
0.3.1: Jul 20, 2025
0.2.1: Jun 14, 2025
0.2.0: Jun 14, 2025
0.1.8: Apr 3, 2025
0.1.7: Apr 3, 2025
0.1.6: Mar 1, 2025
0.1.5: Mar 1, 2025
0.1.4: Feb 26, 2025
0.1.3: Feb 22, 2025
0.1.2: Feb 9, 2025
0.1.1: Jan 31, 2025
0.1.0: Jan 30, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bhumi-0.4.82.tar.gz (110.3 kB)

Uploaded Oct 7, 2025, Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

bhumi-0.4.82-cp311-cp311-win_amd64.whl (1.5 MB)

Uploaded Oct 7, 2025, CPython 3.11, Windows x86-64

bhumi-0.4.82-cp311-cp311-macosx_11_0_arm64.whl (1.4 MB)

Uploaded Oct 7, 2025, CPython 3.11, macOS 11.0+ ARM64

bhumi-0.4.82-cp311-cp311-macosx_10_12_x86_64.whl (1.5 MB)

Uploaded Oct 7, 2025, CPython 3.11, macOS 10.12+ x86-64

bhumi-0.4.82-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)

Uploaded Oct 7, 2025, CPython 3.8, manylinux: glibc 2.17+ x86-64

File details

Details for the file bhumi-0.4.82.tar.gz.

File metadata

Download URL: bhumi-0.4.82.tar.gz
Upload date: Oct 7, 2025
Size: 110.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bhumi-0.4.82.tar.gz
Algorithm	Hash digest
SHA256	`194678c8959229b6a6a7a99ee605ee5b461bea1e374391addb9e31f8db0ef4d9`
MD5	`0b20f2b8d955a8154c4e4f2df09694ca`
BLAKE2b-256	`2e2b32421cdc235152f0f8069d7cbb5760231112fddd33ee660012812a202f1e`

See more details on using hashes here.

File details

Details for the file bhumi-0.4.82-cp311-cp311-win_amd64.whl.

File metadata

Download URL: bhumi-0.4.82-cp311-cp311-win_amd64.whl
Upload date: Oct 7, 2025
Size: 1.5 MB
Tags: CPython 3.11, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bhumi-0.4.82-cp311-cp311-win_amd64.whl
Algorithm	Hash digest
SHA256	`04cdb0ba582da20cea846fc0d1e099e3654fb335a5e5c3907e27f3365ec1af3d`
MD5	`27173d422f56249a359ae08fda0a5b19`
BLAKE2b-256	`84353dfbfe5c690ef3a0857189a3f2640451037699679cbb8df95604c10b3590`

File details

Details for the file bhumi-0.4.82-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

Download URL: bhumi-0.4.82-cp311-cp311-macosx_11_0_arm64.whl
Upload date: Oct 7, 2025
Size: 1.4 MB
Tags: CPython 3.11, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bhumi-0.4.82-cp311-cp311-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`dec68b10461bfc7728c19c520df2a992f9cf9544935ac5a9e08aea0a5c24a430`
MD5	`895e302a47634a592cbf3f1494ea28d4`
BLAKE2b-256	`1a4b1853647ff74f8ca0cd88f91539c64f2cb2a7f1b45eb1797fbed9dad2b5de`

File details

Details for the file bhumi-0.4.82-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

Download URL: bhumi-0.4.82-cp311-cp311-macosx_10_12_x86_64.whl
Upload date: Oct 7, 2025
Size: 1.5 MB
Tags: CPython 3.11, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bhumi-0.4.82-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`21e947033410fe8b1415f6e8553b5603a8e8dbef8e3fa0f3390f9cd6d3ed06dc`
MD5	`7bd2a80b24c694cc5af8c6eeec010749`
BLAKE2b-256	`2f6ba0d8342ef414ed404ee429d4746b2e865d1cd43ef002cb24d134363a782e`

File details

Details for the file bhumi-0.4.82-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: bhumi-0.4.82-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Oct 7, 2025
Size: 3.8 MB
Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for bhumi-0.4.82-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`51099e236e7eec4a05fd634a729b7ec9d7e80c3eaa5ab1cf3876283b1af7d7fb`
MD5	`3c4663a359190ba389738ad6c78c26d7`
BLAKE2b-256	`24dd0f032b2df4697575a607eb01bebdb80ca4745691e36bf6b8cfc46852f40d`
