Skip to main content

High performance LLM client

Project description

Bhumi Logo

Bhumi (भूमि)

🌍 BHUMI - AI Client Setup and Usage Guide

Introduction

Bhumi (भूमि) is the Sanskrit word for Earth, symbolizing stability, grounding, and speed. Just as the Earth moves with unwavering momentum, Bhumi AI ensures that your inference speed is as fast as nature itself! 🚀

A fast, async Python client for LLM APIs with Rust under the hood.

Features

  • Async support with Rust-powered concurrency
  • Connection pooling and retry logic
  • Streaming support
  • Support for multiple providers:
    • OpenAI
    • Anthropic
    • Google Gemini
    • Groq
    • SambaNova

Installation

pip install bhumi

Quick Start

OpenAI Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("OPENAI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Gemini Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Groq Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("GROQ_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="groq/llama-3.1-8b-it",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

SambaNova Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("SAMBANOVA_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="sambanova/Meta-Llama-3.3-70B-Instruct",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Streaming Support

All providers support streaming responses:

async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)

📊 Benchmark Results

Our latest benchmarks show significant performance advantages across different metrics: alt text

⚡ Response Time

  • LiteLLM: 13.79s
  • Native: 5.55s
  • Bhumi: 4.26s
  • Google GenAI: 6.76s

🚀 Throughput (Requests/Second)

  • LiteLLM: 3.48
  • Native: 8.65
  • Bhumi: 11.27
  • Google GenAI: 7.10

💾 Peak Memory Usage (MB)

  • LiteLLM: 275.9MB
  • Native: 279.6MB
  • Bhumi: 284.3MB
  • Google GenAI: 284.8MB

These benchmarks demonstrate Bhumi's superior performance, particularly in throughput where it outperforms other solutions by up to 3.2x.

Configuration Options

The LLMConfig class supports various options:

  • api_key: API key for the provider
  • model: Model name in format "provider/model_name"
  • base_url: Optional custom base URL
  • max_retries: Number of retries (default: 3)
  • timeout: Request timeout in seconds (default: 30)
  • max_tokens: Maximum tokens in response
  • debug: Enable debug logging

🎯 Why Use Bhumi?

Open Source: Apache 2.0 licensed, free for commercial use
Community Driven: Welcomes contributions from individuals and companies
Blazing Fast: 2-3x faster than alternative solutions
Resource Efficient: Uses 60% less memory than comparable clients
Multi-Model Support: Easily switch between providers
Parallel Requests: Handles multiple concurrent requests effortlessly
Flexibility: Debugging and customization options available
Production Ready: Battle-tested in high-throughput environments

🤝 Contributing

We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:

  • Submit pull requests
  • Report issues
  • Suggest improvements
  • Share benchmarks
  • Integrate our optimizations into your libraries (with attribution)

📜 License

Apache 2.0

🌟 Join our community and help make AI inference faster for everyone! 🌟

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bhumi-0.1.6.tar.gz (43.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bhumi-0.1.6-cp38-abi3-macosx_11_0_arm64.whl (1.5 MB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

File details

Details for the file bhumi-0.1.6.tar.gz.

File metadata

  • Download URL: bhumi-0.1.6.tar.gz
  • Upload date:
  • Size: 43.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for bhumi-0.1.6.tar.gz
Algorithm Hash digest
SHA256 01021335f1180624f95122e68d5de6855896f74ea48a85e1c3425014948603c5
MD5 d08fe9209acd985c0a86f0496544d855
BLAKE2b-256 19cfa320471570fea34e7c938a528a58db15efbd0e31e8ab88b891a57104d72d

See more details on using hashes here.

File details

Details for the file bhumi-0.1.6-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for bhumi-0.1.6-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 c983cec487c96233570e09d3491c7c48527dbe5118ac810ee562eb3280b19659
MD5 02ceec5c4ac44ebd81467c0be8fe8f61
BLAKE2b-256 750b8e10a7a8ab20eb8ea16bbe31ee34a4a5af63df48882765d22e1c5cc582b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page