High performance LLM client
🌍 BHUMI - The Fastest AI Inference Client ⚡
Introduction
Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.
Why Bhumi?
- 🚀 Fastest AI inference client – Up to 3.2x the throughput of alternatives in the benchmarks below
- ⚡ Built with Rust for Python – Achieves high efficiency with low overhead
- 🌐 Supports multiple AI providers – OpenAI, Anthropic, Google Gemini, Groq, SambaNova, and more
- 🔄 Streaming and async capabilities – Real-time responses with Rust-powered concurrency
- 🔁 Automatic connection pooling and retries – Ensures reliability and efficiency
- 💡 Modest memory footprint – Peak memory usage is on par with other clients in the benchmarks below
- 🏗 Production-ready – Optimized for high-throughput applications
Bhumi (भूमि) is Sanskrit for Earth, symbolizing stability, grounding, and speed—just like our inference engine, which ensures rapid and stable performance. 🚀
Installation
pip install bhumi
Quick Start
OpenAI Example
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("OPENAI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
⚡ Performance Optimizations
Bhumi includes cutting-edge performance optimizations that make it 2-3x faster than alternatives:
🧠 MAP-Elites Buffer Strategy
- Ultra-fast archive loading with Satya validation + orjson parsing (3x faster than standard JSON)
- Trained buffer configurations optimized through evolutionary algorithms
- Automatic buffer adjustment based on response patterns and historical data
- Type-safe validation with comprehensive error checking
- Secure loading without unsafe eval() operations
📊 Performance Status Check
Check if you have optimal performance with the built-in diagnostics:
from bhumi.utils import print_performance_status
# Check optimization status
print_performance_status()
# 🚀 Bhumi Performance Status
# ✅ Optimized MAP-Elites archive loaded
# ⚡ Optimization Details:
# • Entries: 15,644 total, 15,644 optimized
# • Coverage: 100.0% of search space
# • Loading: Satya validation + orjson parsing (3x faster)
🏆 Archive Distribution
When you install Bhumi, you automatically get:
- Pre-trained MAP-Elites archive for optimal buffer sizing
- Fast orjson-based JSON parsing (2-3x faster than standard json)
- Satya-powered type validation for bulletproof data loading
- Performance metrics and diagnostics
Gemini Example
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())
Provider API: Multi-Provider Model Format
Bhumi unifies providers using a simple provider/model format in LLMConfig.model. Base URLs are auto-set for known providers; you can override with base_url.
- Supported providers: openai, anthropic, gemini, groq, sambanova, openrouter
- Foundation providers use provider/model. Gateways like Groq/OpenRouter/SambaNova may use nested paths after the provider (e.g., openrouter/meta-llama/llama-3.1-8b-instruct).
import os

from bhumi.base_client import BaseLLMClient, LLMConfig
# OpenAI
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o"))
# Anthropic
client = BaseLLMClient(LLMConfig(api_key=os.getenv("ANTHROPIC_API_KEY"), model="anthropic/claude-3-5-sonnet-latest"))
# Gemini (OpenAI-compatible endpoint)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GEMINI_API_KEY"), model="gemini/gemini-2.0-flash"))
# Groq (gateway) – nested path after provider is kept intact
client = BaseLLMClient(LLMConfig(api_key=os.getenv("GROQ_API_KEY"), model="groq/llama-3.1-8b-instant"))
# SambaNova (gateway)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("SAMBANOVA_API_KEY"), model="sambanova/Meta-Llama-3.1-405B-Instruct"))
# OpenRouter (gateway)
client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENROUTER_API_KEY"), model="openrouter/meta-llama/llama-3.1-8b-instruct"))
# Optional: override base URL
client = BaseLLMClient(LLMConfig(api_key="...", model="openai/gpt-4o", base_url="https://api.openai.com/v1"))
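The provider/model convention above can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not Bhumi's actual implementation: split_provider_model, resolve_base_url, and the DEFAULT_BASE_URLS table are hypothetical names, and the table holds only the OpenAI URL that appears in this README. Splitting on the first slash only is what keeps nested gateway paths intact.

```python
from typing import Optional, Tuple

# Illustrative table only (hypothetical); the single entry is the URL shown in this README.
DEFAULT_BASE_URLS = {"openai": "https://api.openai.com/v1"}


def split_provider_model(model: str) -> Tuple[str, str]:
    """Split "provider/model" on the first slash, keeping nested paths intact."""
    provider, sep, model_name = model.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, model_name


def resolve_base_url(model: str, base_url: Optional[str] = None) -> str:
    """Return the explicit override if given, otherwise the provider default."""
    if base_url is not None:
        return base_url
    provider, _ = split_provider_model(model)
    try:
        return DEFAULT_BASE_URLS[provider]
    except KeyError:
        raise ValueError(f"no default base URL known for {provider!r}") from None


print(split_provider_model("openai/gpt-4o"))
print(split_provider_model("openrouter/meta-llama/llama-3.1-8b-instruct"))
print(resolve_base_url("openai/gpt-4o"))
```

Because only the first slash is treated as the separator, a gateway model such as openrouter/meta-llama/llama-3.1-8b-instruct yields the provider openrouter and the model path meta-llama/llama-3.1-8b-instruct.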
Tool Use (Function Calling)
Bhumi supports OpenAI-style function calling and Gemini function declarations. Register Python callables with JSON schemas; Bhumi will add them to requests and execute tool calls automatically.
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

# 1) Define a tool
def get_weather(location: str, unit: str = "celsius"):
    return {"location": location, "unit": unit, "forecast": "sunny", "temp": 27}

tool_schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string", "description": "City and country"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["location"]
}

async def main():
    client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o", debug=True))
    client.register_tool("get_weather", get_weather, "Get the current weather", tool_schema)
    # 2) Ask a question that should trigger a tool call
    resp = await client.completion([
        {"role": "user", "content": "What's the weather in Tokyo in celsius?"}
    ])
    print(resp["text"])  # Tool is executed and response incorporates tool output

asyncio.run(main())
Notes:
- OpenAI-compatible providers use tools with tool_calls in responses; Gemini uses function_declarations and tool_config under the hood.
- Bhumi parses tool calls, executes your Python function, appends a tool message, and continues the conversation automatically.
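The parse, execute, and append cycle described in the notes can be sketched without any network access. The snippet below is an illustration only, not Bhumi's internal code: run_tool_calls and the TOOLS registry are hypothetical names, and the assistant message is a hand-written stand-in shaped like an OpenAI-style tool_calls response.

```python
import json


def get_weather(location, unit="celsius"):
    return {"location": location, "unit": unit, "forecast": "sunny", "temp": 27}


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(messages, assistant_message, tools):
    """Run every tool call in assistant_message and return the extended history."""
    updated = messages + [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        name = call["function"]["name"]
        # OpenAI-style responses carry the arguments as a JSON-encoded string.
        args = json.loads(call["function"]["arguments"])
        result = tools[name](**args)
        updated.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return updated


# Simulated assistant reply requesting a tool call (no API request is made).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"location": "Tokyo, Japan", "unit": "celsius"}),
        },
    }],
}

history = [{"role": "user", "content": "What's the weather in Tokyo in celsius?"}]
updated = run_tool_calls(history, assistant_message, TOOLS)
print(updated[-1])
```

In a real exchange the extended history would be sent back to the provider so the model can write its final answer using the tool output.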
Structured Output via Pydantic
Generate schema-conformant JSON using a Pydantic model. Bhumi registers a hidden tool generate_structured_output for the model; the LLM will call it to return strictly-typed data.
import asyncio
import os

from pydantic import BaseModel
from bhumi.base_client import BaseLLMClient, LLMConfig

class UserInfo(BaseModel):
    """Return the user's full_name and age"""
    full_name: str
    age: int

async def main():
    client = BaseLLMClient(LLMConfig(api_key=os.getenv("OPENAI_API_KEY"), model="openai/gpt-4o", debug=True))
    client.set_structured_output(UserInfo)
    resp = await client.completion([
        {"role": "user", "content": "Extract name and age from: Alice Johnson, age 29"}
    ])
    # The model uses the registered tool so the final message contains strict JSON
    print(resp["text"])  # e.g., {"full_name": "Alice Johnson", "age": 29}

asyncio.run(main())
Dependencies: structured output requires Pydantic v2 (pydantic>=2), which is installed with Bhumi as a dependency.
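To make the role of the Pydantic model concrete, the sketch below shows the two Pydantic v2 features that schema-driven output relies on: deriving a JSON schema from the model and validating a JSON string against it. It runs without Bhumi; parse_user is a hypothetical helper, and the JSON strings stand in for what resp["text"] might contain.

```python
from pydantic import BaseModel, ValidationError


class UserInfo(BaseModel):
    """Return the user's full_name and age"""
    full_name: str
    age: int


# The JSON schema a client could hand to the LLM as tool parameters.
schema = UserInfo.model_json_schema()
print(schema["required"])


def parse_user(text):
    """Return a validated UserInfo, or None if text does not match the schema."""
    try:
        return UserInfo.model_validate_json(text)
    except ValidationError:
        return None


user = parse_user('{"full_name": "Alice Johnson", "age": 29}')
print(user)
print(parse_user('{"full_name": "Alice Johnson"}'))  # missing the required age field
```

Validating the returned text on the caller's side is a cheap safeguard, since it turns a malformed reply into an explicit failure instead of a later attribute error.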
Streaming Support
All providers support streaming responses:
# Run inside an async function, with client configured as in the examples above
async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)
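The async for ... in await ... form used above implies that completion is a coroutine which, when stream=True, resolves to an async iterator of chunks. The following self-contained sketch mimics that shape with stand-ins: fake_completion and fake_stream are hypothetical and make no network calls, so the consumption pattern can be tried offline.

```python
import asyncio


async def fake_stream(chunks):
    """Async generator yielding text chunks one at a time."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield chunk


async def fake_completion(messages, stream=False):
    """Hypothetical stand-in for client.completion; returns canned text."""
    chunks = ["Once ", "upon ", "a ", "time"]
    if stream:
        return fake_stream(chunks)
    return {"text": "".join(chunks)}


async def collect_story():
    parts = []
    # Await the coroutine first, then iterate over the async iterator it returns.
    async for chunk in await fake_completion(
        [{"role": "user", "content": "Write a story"}], stream=True
    ):
        parts.append(chunk)
    return "".join(parts)


story = asyncio.run(collect_story())
print(story)
```

Collecting chunks into a list and joining once at the end avoids repeated string concatenation when the full text is also needed after streaming.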
📊 Benchmark Results
Our latest benchmarks show significant performance advantages across different metrics:
⚡ Response Time
- LiteLLM: 13.79s
- Native: 5.55s
- Bhumi: 4.26s
- Google GenAI: 6.76s
🚀 Throughput (Requests/Second)
- LiteLLM: 3.48
- Native: 8.65
- Bhumi: 11.27
- Google GenAI: 7.10
💾 Peak Memory Usage (MB)
- LiteLLM: 275.9MB
- Native: 279.6MB
- Bhumi: 284.3MB
- Google GenAI: 284.8MB
In these benchmarks Bhumi has the lowest response time and the highest throughput, up to 3.2x that of LiteLLM (11.27 vs. 3.48 requests/second). Peak memory usage is roughly the same across all four clients.
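The relative figures can be recomputed directly from the numbers listed above. The short script below does only that arithmetic; ratios_vs_bhumi is a hypothetical helper, and a ratio above 1 means Bhumi did better on that metric.

```python
# Values copied from the benchmark lists above.
response_time_s = {"LiteLLM": 13.79, "Native": 5.55, "Bhumi": 4.26, "Google GenAI": 6.76}
throughput_rps = {"LiteLLM": 3.48, "Native": 8.65, "Bhumi": 11.27, "Google GenAI": 7.10}
peak_memory_mb = {"LiteLLM": 275.9, "Native": 279.6, "Bhumi": 284.3, "Google GenAI": 284.8}


def ratios_vs_bhumi(metric, higher_is_better):
    """Return each client's ratio to Bhumi, oriented so that >1 favours Bhumi."""
    bhumi = metric["Bhumi"]
    return {
        name: (bhumi / value if higher_is_better else value / bhumi)
        for name, value in metric.items()
        if name != "Bhumi"
    }


throughput_ratio = ratios_vs_bhumi(throughput_rps, higher_is_better=True)
latency_ratio = ratios_vs_bhumi(response_time_s, higher_is_better=False)
memory_ratio = ratios_vs_bhumi(peak_memory_mb, higher_is_better=False)

for label, ratios in [
    ("throughput", throughput_ratio),
    ("response time", latency_ratio),
    ("peak memory", memory_ratio),
]:
    print(label, {name: round(r, 2) for name, r in ratios.items()})
```

The throughput and response-time ratios range from roughly 1.3x (against Native) to roughly 3.2x (against LiteLLM), while the memory ratios all sit within a few percent of 1.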
Configuration Options
The LLMConfig class supports various options:
- api_key: API key for the provider
- model: Model name in format "provider/model_name"
- base_url: Optional custom base URL
- max_retries: Number of retries (default: 3)
- timeout: Request timeout in seconds (default: 30)
- max_tokens: Maximum tokens in response
- debug: Enable debug logging
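How max_retries and timeout typically interact can be shown with plain asyncio. This is a generic sketch, not Bhumi's retry logic: call_with_retries and flaky_request are hypothetical, and it assumes max_retries counts additional attempts after the first one and that timeout applies to each attempt separately.

```python
import asyncio


async def call_with_retries(make_request, max_retries=3, timeout=30.0, backoff=0.0):
    """Await make_request() with a per-attempt timeout, retrying transient failures."""
    last_error = None
    for attempt in range(max_retries + 1):  # first attempt plus max_retries retries
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < max_retries:
                # Exponential backoff; zero by default so the demo runs instantly.
                await asyncio.sleep(backoff * (2 ** attempt))
    raise last_error


attempts = 0


async def flaky_request():
    """Simulated request that fails twice before succeeding."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("simulated transient failure")
    return {"text": "ok"}


result = asyncio.run(call_with_retries(flaky_request, max_retries=3, timeout=1.0))
print(result, "after", attempts, "attempts")
```

Under these assumptions a request can take up to (max_retries + 1) × timeout seconds plus backoff before failing, which is worth keeping in mind when choosing both values.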
🎯 Why Use Bhumi?
✔ Open Source: Apache 2.0 licensed, free for commercial use
✔ Community Driven: Welcomes contributions from individuals and companies
✔ Blazing Fast: 2-3x faster than alternative solutions
✔ Resource Efficient: Peak memory usage on par with comparable clients
✔ Multi-Model Support: Easily switch between providers
✔ Parallel Requests: Handles multiple concurrent requests effortlessly
✔ Flexibility: Debugging and customization options available
✔ Production Ready: Battle-tested in high-throughput environments
🤝 Contributing
We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:
- Submit pull requests
- Report issues
- Suggest improvements
- Share benchmarks
- Integrate our optimizations into your libraries (with attribution)
📜 License
Apache 2.0
🌟 Join our community and help make AI inference faster for everyone! 🌟