
AISBF - AI Service Broker Framework || AI Should Be Free

A modular proxy server for managing multiple AI provider integrations behind a unified API. AISBF provides intelligent routing, load balancing, and AI-assisted model selection to optimize AI service usage across multiple providers.

Key Features

  • Multi-Provider Support: Unified interface for Google, OpenAI, Anthropic, and Ollama
  • Rotation Models: Weighted load balancing across multiple providers with automatic failover
  • Autoselect Models: AI-powered model selection based on content analysis and request characteristics
  • Streaming Support: Full support for streaming responses from all providers
  • Error Tracking: Automatic provider disabling after consecutive failures with cooldown periods
  • Rate Limiting: Built-in rate limiting and graceful error handling
  • Request Splitting: Automatic splitting of large requests when exceeding max_request_tokens limit
  • Token Rate Limiting: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
  • Automatic Provider Disabling: Providers automatically disabled when token rate limits are exceeded
  • Context Management: Automatic context condensation when approaching model limits with multiple condensation methods
  • Effective Context Tracking: Reports total tokens used (effective_context) for every request

Author

Stefy Lanza stefy@nexlab.net

Repository

Official repository: https://git.nexlab.net/nexlab/aisbf.git

Quick Start

Installation

From PyPI (Recommended)

pip install aisbf

From Source

git clone https://git.nexlab.net/nexlab/aisbf.git
cd aisbf
pip install .

Usage

aisbf

The server starts at http://127.0.0.1:17765
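Once the server is running, a chat request is a plain HTTP POST. The sketch below is a minimal client, assuming the broker accepts OpenAI-style request bodies; "example-model" is a placeholder, so substitute a model from your own configuration.

```python
import json
import urllib.request

# Endpoint path from the API Endpoints section; host/port are the defaults.
AISBF_URL = "http://127.0.0.1:17765/api/rotations/chat/completions"

def build_payload(model, messages, stream=False):
    """Assemble an OpenAI-style chat-completions body (assumed format)."""
    return {"model": model, "messages": messages, "stream": stream}

def send_chat(payload):
    """POST the payload to a running AISBF server and decode the reply."""
    req = urllib.request.Request(
        AISBF_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# "example-model" is a placeholder model name for illustration only.
payload = build_payload("example-model", [{"role": "user", "content": "Hello"}])
```

Setting "stream": True requests a streaming response instead of a single JSON body.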

Development

Building the Package

To build the package for PyPI distribution:

./build.sh

This creates distribution files in the dist/ directory.

Cleaning Build Artifacts

To remove all build artifacts and temporary files:

./clean.sh

PyPI Publishing

See PYPI.md for detailed instructions on publishing to PyPI.

Supported Providers

  • Google (google-genai)
  • OpenAI and openai-compatible endpoints (openai)
  • Anthropic (anthropic)
  • Ollama (direct HTTP)

Configuration

Model Configuration

Models can be configured with the following optional fields:

  • max_request_tokens: Maximum tokens allowed per request. Requests exceeding this limit are automatically split into multiple smaller requests.
  • rate_limit_TPM: Maximum tokens allowed per minute (Tokens Per Minute)
  • rate_limit_TPH: Maximum tokens allowed per hour (Tokens Per Hour)
  • rate_limit_TPD: Maximum tokens allowed per day (Tokens Per Day)
  • context_size: Maximum context size in tokens for the model. Used to determine when to trigger context condensation.
  • condense_context: Percentage (0-100) at which to trigger context condensation. 0 means disabled, any other value triggers condensation when context reaches this percentage of context_size.
  • condense_method: String or list of strings specifying condensation method(s). Supported values: "hierarchical", "conversational", "semantic", "algoritmic" (spelled as shown). Multiple methods can be chained together.
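A hypothetical model entry illustrating the optional fields above; the field names come from this list, but the surrounding structure is an assumption, so check config/providers.json for the actual schema.

```json
{
  "model": "example-model",
  "max_request_tokens": 8000,
  "rate_limit_TPM": 100000,
  "rate_limit_TPH": 2000000,
  "rate_limit_TPD": 10000000,
  "context_size": 128000,
  "condense_context": 80,
  "condense_method": ["conversational", "semantic"]
}
```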

When token rate limits are exceeded, providers are automatically disabled:

  • TPM limit exceeded: Provider disabled for 1 minute
  • TPH limit exceeded: Provider disabled for 1 hour
  • TPD limit exceeded: Provider disabled for 1 day
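The windowed limits above can be sketched as a sliding-window token counter. This is an illustration of the idea, not AISBF's actual implementation; class and method names are made up.

```python
import time
from collections import deque

class TokenWindow:
    """Sliding-window token counter, a simplified sketch of per-model
    TPM/TPH/TPD tracking."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, tokens, now=None):
        """Log token usage for one request."""
        now = time.time() if now is None else now
        self.events.append((now, tokens))

    def used(self, now=None):
        """Drop events older than the window, return tokens still counted."""
        now = time.time() if now is None else now
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(t for _, t in self.events)

    def exceeded(self, now=None):
        """True when usage inside the window is over the limit."""
        return self.used(now) > self.limit

# A 1000-token-per-minute limit with two requests 30 seconds apart:
tpm = TokenWindow(limit=1000, window_seconds=60)
tpm.record(600, now=0)
tpm.record(500, now=30)
print(tpm.exceeded(now=30))  # True: 1100 tokens in the last minute
print(tpm.exceeded(now=70))  # False: the first request aged out
```

With window_seconds of 3600 or 86400 the same structure models the TPH and TPD limits.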

Context Condensation Methods

When context exceeds the configured percentage of context_size, the system automatically condenses the prompt using one or more methods:

  1. Hierarchical: Separates context into persistent (long-term facts) and transient (immediate task) layers
  2. Conversational: Summarizes old messages using a smaller model to maintain conversation continuity
  3. Semantic: Prunes irrelevant context based on current query using a smaller "janitor" model
  4. Algoritmic: Uses mathematical compression for technical data and logs (similar to LLMLingua)

See config/providers.json and config/rotations.json for configuration examples.
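The trigger described above reduces to a simple threshold check. This is a sketch of the rule as documented, not the project's actual code:

```python
def should_condense(current_tokens, context_size, condense_context):
    """Return True when the context reaches the configured percentage of
    the model's context window. condense_context == 0 disables it."""
    if condense_context == 0:
        return False
    return current_tokens >= context_size * condense_context / 100

print(should_condense(7000, 8192, 80))  # True: 7000 >= 6553.6
print(should_condense(5000, 8192, 80))  # False: below the threshold
print(should_condense(9000, 8192, 0))   # False: condensation disabled
```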

API Endpoints

General Endpoints

  • GET / - Server status and provider list (includes providers, rotations, and autoselect)

Provider Endpoints

  • POST /api/{provider_id}/chat/completions - Chat completions for a specific provider
  • GET /api/{provider_id}/models - List available models for a specific provider

Rotation Endpoints

  • GET /api/rotations - List all available rotation configurations
  • POST /api/rotations/chat/completions - Chat completions using rotation (load balancing across providers)
    • Rotation Models: Weighted random selection of models across multiple providers
    • Automatic failover between providers on errors
    • Configurable weights for each model to prioritize preferred options
    • Supports both streaming and non-streaming responses
  • GET /api/rotations/models - List all models across all rotation configurations
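The weighted selection at the heart of rotation can be sketched as below. This is a simplification (the real broker layers failover and error tracking on top), and the model names are made up:

```python
import random
from collections import Counter

def pick_model(candidates, rng):
    """Weighted random pick over (model, weight) pairs."""
    models = [m for m, _ in candidates]
    weights = [w for _, w in candidates]
    return rng.choices(models, weights=weights, k=1)[0]

# A weight of 3 vs 1 sends roughly 75% of traffic to the first model.
candidates = [("provider-a/model-x", 3), ("provider-b/model-y", 1)]
rng = random.Random(42)
counts = Counter(pick_model(candidates, rng) for _ in range(10_000))
```

On an error, a failover loop would simply retry pick_model over the remaining healthy candidates.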

Autoselect Endpoints

  • GET /api/autoselect - List all available autoselect configurations
  • POST /api/autoselect/chat/completions - Chat completions using AI-assisted selection based on content analysis
    • Autoselect Models: AI analyzes request content to select the most appropriate model
    • Automatic routing to specialized models based on task type (coding, analysis, creative writing, etc.)
    • Fallback to default model if selection fails
    • Supports both streaming and non-streaming responses
  • GET /api/autoselect/models - List all models across all autoselect configurations
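As a toy illustration of content-based routing: AISBF uses an AI model to analyze the request, whereas this sketch uses keyword matching, and the returned model names are placeholders.

```python
def route_request(prompt: str) -> str:
    """Pick a model family from the request text (toy heuristic)."""
    text = prompt.lower()
    if any(k in text for k in ("def ", "class ", "traceback", "compile")):
        return "coding-model"
    if any(k in text for k in ("poem", "story", "lyrics")):
        return "creative-model"
    return "default-model"  # fallback, mirroring autoselect's default

print(route_request("Fix this Traceback in my script"))  # coding-model
print(route_request("Write a short poem about rain"))    # creative-model
```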

Error Handling

  • Rate limiting for failed requests
  • Automatic retry with provider rotation
  • Proper error tracking and logging
  • Streaming response serialization for OpenAI-compatible providers
  • Autoselect model selection guided by explicit output requirements

Donations

The project accepts donations to support its development:

Ethereum Donation

ETH to 0xdA6dAb526515b5cb556d20269207D43fcc760E51

PayPal Donation

https://paypal.me/nexlab

Bitcoin Donation

Address: bc1qcpt2uutqkz4456j5r78rjm3gwq03h5fpwmcc5u

Documentation

See DOCUMENTATION.md for complete API documentation, configuration details, and development guides.

License

GNU General Public License v3.0
