High-performance LLM client
🌍 BHUMI - The Fastest AI Inference Client ⚡
Introduction
Bhumi aims to be the fastest AI inference client for Python. Its core is written in Rust, and it is designed to maximize performance, efficiency, and scalability for LLM API calls.
Why Bhumi?
- 🚀 Fastest AI inference client – Up to 3.2x higher throughput than alternatives in our benchmarks
- ⚡ Built with Rust for Python – Achieves high efficiency with low overhead
- 🌐 Supports multiple AI providers – OpenAI, Anthropic, Google Gemini, Groq, SambaNova, and more
- 🔄 Streaming and async capabilities – Real-time responses with Rust-powered concurrency
- 🔁 Automatic connection pooling and retries – Ensures reliability and efficiency
- 💡 Lean memory footprint – Peak memory usage is on par with other clients in our benchmarks (about 276–285 MB)
- 🏗 Production-ready – Optimized for high-throughput applications
Bhumi (भूमि) is Sanskrit for Earth, symbolizing stability, grounding, and speed—just like our inference engine, which ensures rapid and stable performance. 🚀
Installation
```bash
pip install bhumi
```
Quick Start
OpenAI Example
```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("OPENAI_API_KEY")


async def main():
    # The "provider/model_name" string selects both the provider and the model
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True,  # enable debug logging
    )
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```
Gemini Example
```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("GEMINI_API_KEY")


async def main():
    # Same client API as the OpenAI example; only the key and model string change
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True,  # enable debug logging
    )
    client = BaseLLMClient(config)
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```
Streaming Support
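All providers support streaming. Pass stream=True and iterate over the response with async for:
```python
# Inside an async function, reusing a client created as in the examples above
async for chunk in await client.completion(
    [{"role": "user", "content": "Write a story"}],
    stream=True,
):
    print(chunk, end="", flush=True)
```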
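To keep the full text after streaming it to the terminal, the chunks can be accumulated as they arrive. The sketch below is a minimal illustration, assuming each chunk is a string (as the print call above suggests). fake_stream is a hypothetical stand-in for the stream returned by client.completion(..., stream=True), so the example runs without an API key.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(text: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for the stream returned by
    # client.completion(..., stream=True); yields one word at a time.
    for word in text.split(" "):
        await asyncio.sleep(0)  # simulate waiting for the next chunk
        yield word + " "


async def collect_stream(stream: AsyncIterator[str]) -> str:
    # Print each chunk as soon as it arrives and keep it for the final result.
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


if __name__ == "__main__":
    asyncio.run(collect_stream(fake_stream("Once upon a time")))
```

With a real client, the argument to collect_stream would be the object iterated in the streaming snippet instead of fake_stream(...).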
📊 Benchmark Results
Our latest benchmarks compare four clients (LiteLLM, Native, Bhumi, and Google GenAI) on response time, throughput, and peak memory:
| Client | ⚡ Response Time (s) | 🚀 Throughput (requests/s) | 💾 Peak Memory (MB) |
|---|---|---|---|
| LiteLLM | 13.79 | 3.48 | 275.9 |
| Native | 5.55 | 8.65 | 279.6 |
| Bhumi | 4.26 | 11.27 | 284.3 |
| Google GenAI | 6.76 | 7.10 | 284.8 |
In these runs Bhumi has the lowest response time and the highest throughput: 11.27 requests/s, up to 3.2x LiteLLM's 3.48 requests/s and about 1.3x the Native client's 8.65. Peak memory is similar for all four clients.
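The speedup figures can be recomputed directly from the benchmark numbers. For throughput, higher is better, so the ratio is Bhumi divided by the other client; for response time, lower is better, so the ratio is the other client divided by Bhumi. The script below only uses the values published above.

```python
# Benchmark figures copied from the results above.
throughput = {"LiteLLM": 3.48, "Native": 8.65, "Bhumi": 11.27, "Google GenAI": 7.10}  # requests/s
response_time = {"LiteLLM": 13.79, "Native": 5.55, "Bhumi": 4.26, "Google GenAI": 6.76}  # seconds


def speedups(reference: str = "Bhumi") -> dict:
    """Return the reference client's advantage over every other client."""
    result = {}
    for name in throughput:
        if name == reference:
            continue
        result[name] = {
            # Higher throughput is better: reference / other
            "throughput_x": round(throughput[reference] / throughput[name], 2),
            # Lower response time is better: other / reference
            "latency_x": round(response_time[name] / response_time[reference], 2),
        }
    return result


for name, r in speedups().items():
    print(f"{name}: {r['throughput_x']}x throughput, {r['latency_x']}x faster responses")
```

The largest ratio (about 3.24x, against LiteLLM) is the source of the "up to 3.2x" figure; against the Native and Google GenAI clients the advantage is roughly 1.3x and 1.6x.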
Configuration Options
The LLMConfig class supports various options:
- api_key: API key for the provider
- model: Model name in the format "provider/model_name"
- base_url: Optional custom base URL
- max_retries: Number of retries (default: 3)
- timeout: Request timeout in seconds (default: 30)
- max_tokens: Maximum number of tokens in the response
- debug: Enable debug logging
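The model option packs the provider and the model name into one string, as in the Quick Start examples ("openai/gpt-4o", "gemini/gemini-2.0-flash"). The sketch below shows one way such a string can be split and validated, assuming the provider is everything before the first "/". split_model is a hypothetical helper for illustration, not Bhumi's own parser.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split a "provider/model_name" string into (provider, model_name).

    Hypothetical helper, not part of Bhumi's API. Only the first "/"
    separates the provider, so model names containing "/" stay intact.
    """
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f'expected "provider/model_name", got {model!r}')
    return provider, name


print(split_model("openai/gpt-4o"))            # ('openai', 'gpt-4o')
print(split_model("gemini/gemini-2.0-flash"))  # ('gemini', 'gemini-2.0-flash')
```

Splitting on the first "/" only is a deliberate choice: some hosted model identifiers contain their own slashes, and those should remain part of the model name.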
🎯 Why Use Bhumi?
✔ Open Source: Apache 2.0 licensed, free for commercial use
✔ Community Driven: Welcomes contributions from individuals and companies
✔ Blazing Fast: Up to 3.2x the throughput of alternative solutions in our benchmarks
✔ Resource Efficient: Peak memory usage on par with comparable clients (about 276–285 MB for all four in our benchmarks)
✔ Multi-Model Support: Easily switch between providers
✔ Parallel Requests: Handles multiple concurrent requests effortlessly
✔ Flexibility: Debugging and customization options available
✔ Production Ready: Battle-tested in high-throughput environments
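Because client.completion() is awaited in the Quick Start examples, several requests can be issued concurrently with asyncio.gather. The sketch below caps the number of in-flight requests with an asyncio.Semaphore. fake_completion is a hypothetical stand-in for client.completion so the example runs without an API key; sharing one BaseLLMClient across concurrent calls is an assumption worth checking against Bhumi's documentation.

```python
import asyncio


async def fake_completion(prompt: str) -> dict:
    # Hypothetical stand-in for client.completion(); returns a dict with a
    # "text" key like the Quick Start examples, after a simulated delay.
    await asyncio.sleep(0.01)
    return {"text": f"reply to {prompt!r}"}


async def run_all(prompts, max_concurrency: int = 5):
    # Limit in-flight requests so a large batch does not hit rate limits
    # or open too many connections at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with semaphore:
            response = await fake_completion(prompt)
            return response["text"]

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(one(p) for p in prompts))


if __name__ == "__main__":
    answers = asyncio.run(run_all([f"question {i}" for i in range(10)]))
    print(answers[0])  # reply to 'question 0'
```

The semaphore is optional for small batches; it mainly matters when sending hundreds of requests to a provider with per-minute limits.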
🤝 Contributing
We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:
- Submit pull requests
- Report issues
- Suggest improvements
- Share benchmarks
- Integrate our optimizations into your libraries (with attribution)
📜 License
Apache 2.0
🌟 Join our community and help make AI inference faster for everyone! 🌟
File details
Details for the file bhumi-0.1.7-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: bhumi-0.1.7-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fffef5c7f9bae41f8731dc661b216e443f66ec933eca1a22f5906246eda69b4d |
| MD5 | 0472e9e2473a43f0d819f7fc472f26f0 |
| BLAKE2b-256 | 607956c64c390392d6c2926f9affe56055dfadbf63011f0aada6a917eeac37e0 |
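A downloaded wheel can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below hashes the file in chunks and compares the result with the digest listed for bhumi-0.1.7-cp38-abi3-macosx_11_0_arm64.whl; it assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    # Hash the file in 1 MiB chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for bhumi-0.1.7-cp38-abi3-macosx_11_0_arm64.whl
EXPECTED = "fffef5c7f9bae41f8731dc661b216e443f66ec933eca1a22f5906246eda69b4d"

if __name__ == "__main__":
    wheel = Path("bhumi-0.1.7-cp38-abi3-macosx_11_0_arm64.whl")
    if wheel.exists():
        print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```

This digest applies only to the macOS 11.0+ ARM64 wheel; other platforms download a different file with a different hash. pip can also enforce digests itself through its hash-checking mode (--require-hashes).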