AISBF - AI Service Broker Framework || AI Should Be Free - A modular proxy server for managing multiple AI provider integrations

These details have not been verified by PyPI

Project links

Project description

AISBF - AI Service Broker Framework || AI Should Be Free

A modular proxy server for managing multiple AI provider integrations with unified API interface. AISBF provides intelligent routing, load balancing, and AI-assisted model selection to optimize AI service usage across multiple providers.

AISBF Dashboard

Web Dashboard

AISBF includes a comprehensive web-based dashboard for easy configuration and management:

Provider Management: Configure API keys, endpoints, and model settings with automatic metadata extraction
Rotation Configuration: Set up weighted load balancing across providers
Autoselect Configuration: Configure AI-powered model selection
Server Settings: Manage SSL/TLS, authentication, and TOR hidden service
User Management: Create/manage users with role-based access control (admin users only)
Multi-User Support: Isolated configurations per user with API token management
Real-time Monitoring: View provider status and configuration
Token Usage Analytics: Track token usage, costs, and performance with charts, filtering by provider/model/rotation, and export functionality
Rate Limits Dashboard: Monitor adaptive rate limiting with real-time statistics, 429 counts, success rates, and recovery progress
Model Metadata Display: View detailed model information including pricing, rate limits, and supported parameters
Cache Management: View cache statistics and clear cache via dashboard endpoints

Access the dashboard at http://localhost:17765/dashboard (default credentials: admin/admin)

Key Features

Multi-Provider Support: Unified interface for Google, OpenAI, Anthropic, Ollama, Kiro (Amazon Q Developer), Kiro-cli, Claude Code (OAuth2), Kilocode (OAuth2), Codex (OAuth2), and Qwen (OAuth2)
Claude OAuth2 Authentication: Full OAuth2 PKCE flow for Claude Code with automatic token refresh and Chrome extension for remote servers
Kilocode OAuth2 Authentication: OAuth2 Device Authorization Grant for Kilo Code with automatic token refresh
Codex OAuth2 Authentication: OAuth2 Device Authorization Grant for OpenAI Codex with automatic token refresh and API key exchange
Rotation Models: Weighted load balancing across multiple providers with automatic failover
Autoselect Models: AI-powered model selection based on content analysis and request characteristics
Semantic Classification: Fast hybrid BM25 + semantic model selection using sentence transformers (optional)
Content Classification: NSFW/privacy content filtering with configurable classification windows
Streaming Support: Full support for streaming responses from all providers
Error Tracking: Automatic provider disabling after consecutive failures with configurable cooldown periods
Adaptive Rate Limiting: Intelligent rate limit management that learns from 429 responses with exponential backoff, gradual recovery, and dashboard monitoring
Rate Limiting: Built-in rate limiting and graceful error handling
Request Splitting: Automatic splitting of large requests when exceeding max_request_tokens limit
Token Rate Limiting: Per-model token usage tracking with TPM (tokens per minute), TPH (tokens per hour), and TPD (tokens per day) limits
Automatic Provider Disabling: Providers automatically disabled when token rate limits are exceeded
Context Management: Automatic context condensation when approaching model limits with multiple condensation methods
Provider-Level Defaults: Set default condensation settings at provider level with cascading fallback logic
Effective Context Tracking: Reports total tokens used (effective_context) for every request
Enhanced Context Condensation: 8 condensation methods including hierarchical, conversational, semantic, algorithmic, sliding window, importance-based, entity-aware, and code-aware condensation
Provider-Native Caching: 50-70% cost reduction using Anthropic cache_control, Google Context Caching, and OpenAI-compatible APIs (including prompt_cache_key for OpenAI load balancer routing)
Response Caching (Semantic Deduplication): 20-30% cache hit rate with intelligent request deduplication
- Multiple backends: In-memory LRU cache, Redis, SQLite, MySQL, file-based
- SHA256-based cache key generation for request deduplication
- TTL-based expiration with configurable timeouts
- Granular cache control at model, provider, rotation, and autoselect levels
- Cache statistics tracking (hits, misses, hit rate, evictions)
- Dashboard endpoints for cache management
Smart Request Batching: 15-25% latency reduction by batching similar requests within 100ms window with provider-specific configurations
Streaming Response Optimization: 10-20% memory reduction with chunk pooling, backpressure handling, and provider-specific streaming optimizations for Google and Kiro providers
Token Usage Analytics: Comprehensive analytics dashboard with charts, cost estimation, performance tracking, filtering by provider/model/rotation, and export functionality
Model Metadata Extraction: Automatic extraction of pricing, rate limits, and model information from provider responses with dashboard display
SSL/TLS Support: Built-in HTTPS support with Let's Encrypt integration and automatic certificate renewal
Self-Signed Certificates: Automatic generation of self-signed certificates for development/testing
TOR Hidden Service: Full support for exposing AISBF over TOR network as a hidden service (ephemeral and persistent)
MCP Server: Model Context Protocol server for remote agent configuration and model access (SSE and HTTP streaming)
Persistent Database: SQLite/MySQL-based tracking of token usage, context dimensions, and model embeddings with automatic cleanup
Multi-User Support: User management with isolated configurations, role-based access control, and API token management
User-Specific API Endpoints: Dedicated API endpoints for authenticated users to access their own configurations with Bearer token authentication
Database Integration: SQLite/MySQL-based persistent storage for user configurations, token usage tracking, and context management
User-Specific Configurations: Each user can have their own providers, rotations, and autoselect configurations stored in the database
Flexible Caching System: Multi-backend caching for model embeddings and performance optimization
- Redis: High-performance distributed caching for production
- SQLite/MySQL: Persistent database-backed caching
- File-based: Legacy local file storage
- Memory: In-memory caching for development
- Automatic fallback between backends
- Configurable TTL per data type
Proxy-Awareness: Full support for reverse proxy deployments with automatic URL generation and subpath support

Author

Stefy Lanza stefy@nexlab.net

Repository

Official repository: https://git.nexlab.net/nexlab/aisbf.git

Quick Start

Installation

From PyPI (Recommended)

pip install aisbf

From Source

python setup.py install

Usage

aisbf

Server starts on http://127.0.0.1:17765

Development

Building the Package

To build the package for PyPI distribution:

./build.sh

This creates distribution files in the dist/ directory.

Cleaning Build Artifacts

To remove all build artifacts and temporary files:

./clean.sh

PyPI Publishing

See PYPI.md for detailed instructions on publishing to PyPI.

Supported Providers

Google (google-genai)
OpenAI and openai-compatible endpoints (openai)
Anthropic (anthropic)
Claude Code (OAuth2 authentication via claude.ai)
Ollama (direct HTTP)
Kiro (Amazon Q Developer / AWS CodeWhisperer)
Kiro-cli (Amazon Q Developer CLI authentication)
Kilocode (OAuth2 Device Authorization Grant)
Codex (OAuth2 Device Authorization Grant - OpenAI protocol)
Qwen (OAuth2 Device Authorization Grant with PKCE - DashScope OpenAI-compatible)

Kiro-cli Provider Support

AISBF supports Kiro (Amazon Q Developer) via kiro-cli authentication using Device Authorization Grant flow:

Features

Full OAuth2 Device Authorization Grant flow matching official kiro-cli
Automatic token refresh
SQLite database-based credential storage
Dashboard integration with authentication UI
Credentials stored in user-configured SQLite database path

Setup

Add kiro-cli provider to configuration (via dashboard or ~/.aisbf/providers.json)
Configure sqlite_db path pointing to kiro-cli credentials database
Use kiro-cli models via API: kiro-cli/anthropic.claude-3-5-sonnet-20241022-v2:0

Configuration Example

{
  "providers": {
    "kiro-cli": {
      "id": "kiro-cli",
      "name": "Kiro-cli (Amazon Q Developer)",
      "endpoint": "https://q.us-east-1.amazonaws.com",
      "type": "kiro",
      "api_key_required": false,
      "kiro_config": {
        "region": "us-east-1",
        "sqlite_db": "~/.config/kiro/kiro.db"
      },
      "models": [
        {
          "name": "anthropic.claude-3-5-sonnet-20241022-v2:0",
          "context_size": 200000
        }
      ]
    }
  }
}

Kilocode OAuth2 Authentication

AISBF supports Kilo Code as a provider using OAuth2 Device Authorization Grant:

Features

Full OAuth2 Device Authorization Grant flow
Automatic token refresh with refresh token rotation
Dashboard integration with authentication UI
Proxy-aware callback handling
Credentials stored in ~/.aisbf/kilo_credentials.json

Setup

Add kilocode provider to configuration (via dashboard or ~/.aisbf/providers.json)
Click "Authenticate with Kilo" in dashboard
Complete device authorization flow
Use kilocode models via API: kilocode/<model>

Configuration Example

{
  "providers": {
    "kilocode": {
      "id": "kilocode",
      "name": "Kilo Code (OAuth2)",
      "endpoint": "https://api.kilo.dev/v1",
      "type": "kilo",
      "api_key_required": false,
      "kilo_config": {
        "credentials_file": "~/.aisbf/kilo_credentials.json"
      },
      "models": [
        {
          "name": "kilo/free",
          "context_size": 100000
        }
      ]
    }
  }
}

See KILO_OAUTH2.md for detailed setup instructions.

Codex OAuth2 Authentication

AISBF supports OpenAI Codex as a provider using OAuth2 Device Authorization Grant:

Features

Full OAuth2 Device Authorization Grant flow (same protocol as OpenAI)
Automatic token refresh with refresh token rotation
API key exchange from ID token for direct API access
Dashboard integration with authentication UI
No localhost callback port needed (device code flow)
Credentials stored in ~/.aisbf/codex_credentials.json

Setup

Add codex provider to configuration (via dashboard or ~/.aisbf/providers.json)
Click "Authenticate with Codex (Device Code)" in dashboard
Complete device authorization flow at https://auth.openai.com/codex/device
Use codex models via API: codex/<model>

Configuration Example

{
  "providers": {
    "codex": {
      "id": "codex",
      "name": "Codex (OpenAI OAuth2)",
      "endpoint": "https://api.openai.com/v1",
      "type": "codex",
      "api_key_required": false,
      "codex_config": {
        "credentials_file": "~/.aisbf/codex_credentials.json",
        "issuer": "https://auth.openai.com"
      },
      "models": [
        {
          "name": "gpt-4o",
          "context_size": 128000
        }
      ]
    }
  }
}

Qwen OAuth2 Authentication

AISBF supports Qwen (Alibaba Cloud) as a provider using OAuth2 Device Authorization Grant with PKCE:

Features

Full OAuth2 Device Authorization Grant flow with PKCE (S256)
Automatic token refresh with cross-process synchronization
OpenAI-compatible DashScope API endpoint
Dashboard integration with authentication UI
No localhost callback port needed (device code flow)
Credentials stored in ~/.aisbf/qwen_credentials.json
Optional API key mode (bypass OAuth2)
File-based locking for multi-process token management

Setup

Add qwen provider to configuration (via dashboard or ~/.aisbf/providers.json)
Click "Authenticate with Qwen (Device Code)" in dashboard
Complete device authorization flow at https://chat.qwen.ai
Use qwen models via API: qwen/<model>

Configuration Example

{
  "providers": {
    "qwen": {
      "id": "qwen",
      "name": "Qwen (OAuth2)",
      "endpoint": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "type": "qwen",
      "api_key_required": false,
      "qwen_config": {
        "credentials_file": "~/.aisbf/qwen_credentials.json",
        "api_key": ""
      },
      "models": [
        {
          "name": "qwen-plus",
          "context_size": 32000
        },
        {
          "name": "coder-model",
          "context_size": 32000
        }
      ]
    }
  }
}

API Key Mode (Optional)

If you prefer to use an API key instead of OAuth2:

Obtain an API key from Alibaba Cloud DashScope
Set api_key in qwen_config
The provider will use the API key directly instead of OAuth2

Available Models

qwen-plus - Enhanced model with 32K context
qwen-turbo - Fast model for quick responses
qwen-max - Top-tier model with advanced capabilities
coder-model - Specialized coding model (maps to qwen3.6-plus)

Configuration

SSL/TLS Configuration

AISBF supports HTTPS with automatic certificate management:

Self-Signed Certificates (Default)

Automatically generated on first run when HTTPS is enabled
Stored in ~/.aisbf/cert.pem and ~/.aisbf/key.pem
Valid for 365 days
Suitable for development and internal use

Let's Encrypt Integration

Configure a public domain in the dashboard settings to enable Let's Encrypt:

Automatic certificate generation using certbot
Automatic renewal when certificates expire within 30 days
Valid certificates trusted by all browsers
Requires certbot to be installed on the system

Installation:

# Ubuntu/Debian
sudo apt-get install certbot

# CentOS/RHEL
sudo yum install certbot

# macOS
brew install certbot

Configuration:

Navigate to Dashboard → Settings
Set Protocol to "HTTPS"
Enter your public domain (e.g., api.example.com)
Optionally specify custom certificate/key paths
Save settings

The system will automatically:

Generate certificates using Let's Encrypt if a public domain is configured
Fall back to self-signed certificates if Let's Encrypt fails
Check certificate expiry on startup
Renew certificates when they expire within 30 days

TOR Hidden Service Configuration

AISBF can be exposed over the TOR network as a hidden service, providing anonymous access to your AI proxy server.

Prerequisites

TOR must be installed and running on your system
TOR control port must be enabled (default: 9051)

Installation:

# Ubuntu/Debian
sudo apt-get install tor

# CentOS/RHEL
sudo yum install tor

# macOS
brew install tor

Enable TOR Control Port: Edit /etc/tor/torrc (or ~/.torrc on macOS) and add:

ControlPort 9051
CookieAuthentication 1

Then restart TOR:

sudo systemctl restart tor  # Linux
brew services restart tor   # macOS

Configuration

Via Dashboard:

Navigate to Dashboard → Settings
Scroll to "TOR Hidden Service" section
Enable "Enable TOR Hidden Service"
Configure settings:
- Control Host: TOR control port host (default: 127.0.0.1)
- Control Port: TOR control port (default: 9051)
- Control Password: Optional password for TOR control authentication
- Hidden Service Directory: Leave blank for ephemeral service, or specify path for persistent service
- Hidden Service Port: Port exposed on the hidden service (default: 80)
- SOCKS Proxy Host: TOR SOCKS proxy host (default: 127.0.0.1)
- SOCKS Proxy Port: TOR SOCKS proxy port (default: 9050)
Save settings and restart server

Via Configuration File: Edit ~/.aisbf/aisbf.json:

{
  "tor": {
    "enabled": true,
    "control_port": 9051,
    "control_host": "127.0.0.1",
    "control_password": null,
    "hidden_service_dir": null,
    "hidden_service_port": 80,
    "socks_port": 9050,
    "socks_host": "127.0.0.1"
  }
}

Ephemeral vs Persistent Hidden Services

Ephemeral (Default):

Temporary hidden service created on startup
New onion address generated each time
No files stored on disk
Ideal for temporary or testing purposes
Set hidden_service_dir to null or leave blank

Persistent:

Permanent hidden service with fixed onion address
Address persists across restarts
Keys stored in specified directory
Ideal for production use
Set hidden_service_dir to a path (e.g., ~/.aisbf/tor_hidden_service)

Accessing Your Hidden Service

Once enabled, the onion address will be displayed:

In the server logs on startup
In the Dashboard → Settings → TOR Hidden Service status section
Via MCP get_tor_status tool (requires fullconfig access)

Access your service via TOR Browser or any TOR-enabled client:

http://your-onion-address.onion/

Security Considerations

TOR hidden services provide anonymity but not authentication
Enable API authentication in AISBF settings for additional security
Use strong dashboard passwords
Consider using persistent hidden services for production
Monitor access logs for suspicious activity
Keep TOR and AISBF updated

Claude OAuth2 Authentication

AISBF supports Claude Code (claude.ai) as a provider using OAuth2 authentication with automatic token refresh:

Features

Full OAuth2 PKCE flow matching official claude-cli
Automatic token refresh with refresh token rotation
Chrome extension for remote server OAuth2 callback interception
Dashboard integration with authentication UI
Proxy-aware extension serving: automatically detects reverse proxy deployments
Force interception mechanism: extension activates for localhost when OAuth flow initiated from dashboard
Supports nginx, caddy, and other reverse proxies with X-Forwarded-* header detection
Credentials stored in ~/.aisbf/claude_credentials.json
Optional curl_cffi TLS fingerprinting for Cloudflare bypass
Compatible with official claude-cli credentials

Setup

Add Claude provider to configuration (via dashboard or ~/.aisbf/providers.json)
For remote servers: Install Chrome extension (download from dashboard)
Click "Authenticate with Claude" in dashboard
Log in with your Claude account
Use Claude models via API: claude/claude-3-7-sonnet-20250219

Configuration Example

{
  "providers": {
    "claude": {
      "id": "claude",
      "name": "Claude Code (OAuth2)",
      "endpoint": "https://api.anthropic.com/v1",
      "type": "claude",
      "api_key_required": false,
      "claude_config": {
        "credentials_file": "~/.aisbf/claude_credentials.json"
      },
      "models": [
        {
          "name": "claude-3-7-sonnet-20250219",
          "context_size": 200000
        }
      ]
    }
  }
}

See CLAUDE_OAUTH2_SETUP.md for detailed setup instructions and CLAUDE_OAUTH2_DEEP_DIVE.md for technical details.

Response Caching (Semantic Deduplication)

AISBF includes an intelligent response caching system that deduplicates similar requests to reduce API costs and latency:

Supported Cache Backends

Memory (LRU): In-memory cache with LRU eviction, fast but ephemeral
Redis: High-performance distributed caching, recommended for production
SQLite: Persistent local database caching
MySQL: Network database caching for multi-server deployments

Features

SHA256-based Deduplication: Generates cache keys from request content for intelligent deduplication
TTL-based Expiration: Configurable timeout (default: 600 seconds)
LRU Eviction: Automatic eviction of least recently used entries (memory backend)
Cache Statistics: Tracks hits, misses, hit rate, and evictions
Granular Control: Enable/disable caching at model, provider, rotation, or autoselect level
Hierarchical Configuration: Model > Provider > Rotation > Autoselect > Global
Dashboard Management: View statistics and clear cache via dashboard
Streaming Skip: Automatically skips caching for streaming requests

Configuration

Via Dashboard:

Navigate to Dashboard → Settings → Response Cache
Enable response caching
Select cache backend (Memory, Redis, SQLite, MySQL)
Configure TTL and max size
Save settings and restart server

Via Configuration File: Edit ~/.aisbf/aisbf.json:

{
  "response_cache": {
    "enabled": true,
    "backend": "redis",
    "ttl": 600,
    "max_size": 1000,
    "redis_host": "localhost",
    "redis_port": 6379
  }
}

Cache Hit Rate: Typically achieves 20-30% cache hit rate in production workloads, significantly reducing API costs and latency.

Database Configuration

AISBF supports multiple database backends for persistent storage of configurations, token usage tracking, and context management:

Supported Databases

SQLite (Default): File-based database, no additional setup required, suitable for most users
MySQL: Network database server, better for multi-server deployments and advanced analytics

SQLite Configuration (Default)

SQLite is automatically configured and requires no additional setup:

Database file: ~/.aisbf/aisbf.db
Automatic initialization and table creation
WAL mode enabled for concurrent access
Automatic cleanup of old records

MySQL Configuration

For production deployments requiring MySQL:

Prerequisites:

MySQL server installed and running
Database and user created with appropriate permissions

Via Dashboard:

Navigate to Dashboard → Settings → Database Configuration
Select "MySQL" as database type
Configure connection parameters:
- Host: MySQL server hostname/IP
- Port: MySQL server port (default: 3306)
- Username: MySQL database username
- Password: MySQL database password
- Database: MySQL database name
Save settings and restart server

Via Configuration File: Edit ~/.aisbf/aisbf.json:

{
  "database": {
    "type": "mysql",
    "mysql_host": "localhost",
    "mysql_port": 3306,
    "mysql_user": "aisbf",
    "mysql_password": "your_password",
    "mysql_database": "aisbf"
  }
}

Database Migration: When switching database types, AISBF will automatically create tables in the new database. Existing data will not be migrated automatically - you may need to export/import configurations manually.

Cache Configuration

AISBF includes a flexible caching system for improved performance and reduced API costs:

Supported Cache Backends

SQLite (Default): Local database storage, persistent and structured
MySQL: Network database caching, scalable for multi-server deployments
Redis: High-performance distributed caching, recommended for production
File-based: Legacy local file storage
Memory: In-memory caching (ephemeral, lost on restart)

Cache Features

Model Embeddings: Cached vectorized model descriptions for semantic matching
Provider Models: Cached API model listings with configurable TTL
Automatic Fallback: Falls back to file-based if Redis is unavailable
Configurable TTL: Set cache expiration times per data type

Redis Configuration

For high-performance caching in production environments:

Prerequisites:

Redis server installed and running
Optional: Redis authentication configured

Via Dashboard:

Navigate to Dashboard → Settings → Cache Configuration
Select "Redis" as cache type
Configure connection parameters:
- Host: Redis server hostname/IP
- Port: Redis server port (default: 6379)
- Database: Redis database number (default: 0)
- Password: Redis password (optional)
- Key Prefix: Prefix for Redis keys (default: aisbf:)
Save settings and restart server

Via Configuration File: Edit ~/.aisbf/aisbf.json:

{
  "cache": {
    "type": "redis",
    "redis_host": "localhost",
    "redis_port": 6379,
    "redis_db": 0,
    "redis_password": "",
    "redis_key_prefix": "aisbf:"
  }
}

Cache Performance Benefits

Faster Model Selection: Cached embeddings eliminate repeated vectorization
Reduced API Calls: Cached provider model listings reduce API overhead
Lower Latency: Redis provides sub-millisecond cache access
Scalability: Distributed Redis supports multiple AISBF instances

Provider-Native Caching Configuration

AISBF supports provider-native caching for reduced API costs and latency across multiple provider types:

Supported Providers

Provider	Caching Method	Configuration
OpenAI	Automatic prefix caching (no code change needed)	Enabled by default for prompts >1024 tokens
DeepSeek	Automatic in 64-token chunks	Enabled by default
Anthropic	`cache_control` with `{"type": "ephemeral"}`	Requires `enable_native_caching: true`
OpenRouter	`cache_control` (wraps Anthropic)	Requires `enable_native_caching: true`
Google	Context Caching API (`cached_contents.create`)	Requires `enable_native_caching: true` and `cache_ttl`

Configuration Options

Add to provider configuration in providers.json:

{
  "providers": {
    "my_provider": {
      "type": "openai",
      "endpoint": "https://api.openrouter.ai/v1",
      "enable_native_caching": true,
      "min_cacheable_tokens": 1024,
      "prompt_cache_key": "optional-cache-key-for-load-balancer"
    },
    "anthropic_provider": {
      "type": "anthropic",
      "api_key_required": true,
      "enable_native_caching": true,
      "min_cacheable_tokens": 1024
    },
    "google_provider": {
      "type": "google",
      "api_key_required": true,
      "enable_native_caching": true,
      "cache_ttl": 3600,
      "min_cacheable_tokens": 1024
    }
  }
}

Configuration Fields

enable_native_caching: Enable provider-native caching (boolean, default: false)
min_cacheable_tokens: Minimum token count for caching (default: 1024, matches OpenAI)
cache_ttl: Cache TTL in seconds for Google Context Caching (optional)
prompt_cache_key: Optional cache key for OpenAI's load balancer routing optimization

How It Works

Automatic Providers (OpenAI, DeepSeek): Caching is handled automatically by the provider based on prompt length and prefix matching - no code changes required in AISBF.
Explicit Providers (Anthropic, OpenRouter, Google): AISBF adds cache_control blocks to messages or creates cached content objects after the first request, then reuses them for subsequent requests with similar prefixes.
OpenAI-Compatible: Custom providers using OpenAI-compatible endpoints (like OpenRouter) support both automatic caching and explicit cache_control blocks.

Cost Reduction

Anthropic: 50-70% cost reduction with cache_control on long conversations
Google: Up to 75% cost reduction with Context Caching API
OpenAI: Automatic savings on repeated prompt prefixes
OpenRouter: Passes through caching benefits from underlying providers

Provider-Level Defaults

Providers can now define default settings that cascade to all models:

default_context_size: Default maximum context size for all models in this provider
default_condense_context: Default condensation threshold percentage (0-100)
default_condense_method: Default condensation method(s) for all models

Cascading Priority:

Rotation model config (highest priority, only if explicitly set)
Provider model-specific config
Provider default config (NEW)
System defaults

This allows minimal model definitions in rotations - unspecified fields automatically inherit from provider defaults.

Model Configuration

Models can be configured with the following optional fields:

max_request_tokens: Maximum tokens allowed per request. Requests exceeding this limit are automatically split into multiple smaller requests.
rate_limit_TPM: Maximum tokens allowed per minute (Tokens Per Minute)
rate_limit_TPH: Maximum tokens allowed per hour (Tokens Per Hour)
rate_limit_TPD: Maximum tokens allowed per day (Tokens Per Day)
context_size: Maximum context size in tokens for the model. Used to determine when to trigger context condensation.
condense_context: Percentage (0-100) at which to trigger context condensation. 0 means disabled, any other value triggers condensation when context reaches this percentage of context_size.
condense_method: String or list of strings specifying condensation method(s). Supported values: "hierarchical", "conversational", "semantic", "algorithmic". Multiple methods can be chained together.

When token rate limits are exceeded, providers are automatically disabled:

TPM limit exceeded: Provider disabled for 1 minute
TPH limit exceeded: Provider disabled for 1 hour
TPD limit exceeded: Provider disabled for 1 day

Context Condensation Methods

When context exceeds the configured percentage of context_size, the system automatically condenses the prompt using one or more methods:

Hierarchical: Separates context into persistent (long-term facts) and transient (immediate task) layers - Pure algorithmic, no LLM needed
Conversational: Summarizes old messages using a smaller model to maintain conversation continuity - LLM-based
Semantic: Prunes irrelevant context based on current query using a smaller "janitor" model - LLM-based
Algorithmic: Uses mathematical compression for technical data and logs (similar to LLMLingua) - Pure algorithmic, no LLM needed

Note: Only conversational and semantic methods require LLM calls and use prompt files from config/. The hierarchical and algorithmic methods are pure algorithmic transformations.

See config/providers.json and config/rotations.json for configuration examples.

Multi-User Database Integration

AISBF includes comprehensive multi-user support with isolated configurations stored in a SQLite database:

User Management

Admin Users: Full access to global configurations and user management
Regular Users: Access to their own configurations and usage statistics
Role-Based Access: Secure separation between admin and user roles

Database Features

Persistent Storage: All configurations stored in SQLite database with automatic initialization
Token Usage Tracking: Per-user API token usage statistics and analytics
Configuration Isolation: Each user has separate providers, rotations, and autoselect configurations
Automatic Cleanup: Database maintenance with configurable retention periods

User-Specific Configurations

Users can create and manage their own:

Providers: Custom API endpoints, models, and authentication settings
Rotations: Personal load balancing configurations across providers
Autoselect: Custom AI-powered model selection rules
API Tokens: Multiple API tokens with usage tracking and management

Dashboard Access

Admin Dashboard: Global configuration management and user administration
User Dashboard: Personal configuration management and usage statistics
API Token Management: Create, view, and delete API tokens with usage analytics

Adaptive Rate Limiting

AISBF includes intelligent rate limit management that learns from provider 429 responses and automatically adjusts request rates:

Features

Learning from 429 Responses: Automatically detects rate limits from provider error responses
Exponential Backoff with Jitter: Configurable backoff strategy to avoid thundering herd
Rate Limit Headroom: Stays 10% below learned limits to prevent hitting rate limits
Gradual Recovery: Slowly increases rate after consecutive successful requests
Per-Provider Tracking: Independent rate limiters for each provider
Dashboard Monitoring: Real-time view of current limits, 429 counts, success rates, and recovery progress

Configuration

Via Dashboard:

Navigate to Dashboard → Rate Limits
View current rate limits and statistics for each provider
Reset individual provider limits or reset all
Monitor 429 response patterns and success rates

Via Configuration File: Edit ~/.aisbf/aisbf.json:

{
  "adaptive_rate_limiting": {
    "enabled": true,
    "learning_rate": 0.1,
    "headroom_percent": 10,
    "recovery_rate": 0.05,
    "base_backoff": 1.0,
    "jitter_factor": 0.1,
    "history_window": 100
  }
}

Configuration Fields

enabled: Enable adaptive rate limiting (default: true)
learning_rate: How quickly to adjust limits (0.0-1.0, default: 0.1)
headroom_percent: Safety margin below learned limit (default: 10%)
recovery_rate: Rate of limit increase after successes (default: 0.05)
base_backoff: Base backoff time in seconds (default: 1.0)
jitter_factor: Random jitter to prevent synchronized retries (default: 0.1)
history_window: Number of recent requests to track (default: 100)

Benefits

Automatic Optimization: No manual rate limit configuration needed
Reduced 429 Errors: Learns optimal request rates for each provider
Better Resource Utilization: Maximizes throughput while respecting limits
Provider-Specific: Each provider has independent rate limit tracking

Content Classification and Semantic Selection

AISBF provides advanced content filtering and intelligent model selection based on content analysis:

NSFW/Privacy Content Filtering

Models can be configured with nsfw and privacy boolean flags to indicate their suitability for sensitive content:

nsfw: Model supports NSFW (Not Safe For Work) content
privacy: Model supports privacy-sensitive content (e.g., medical, financial, legal data)

When global classify_nsfw or classify_privacy is enabled, AISBF automatically analyzes the last 3 user messages to classify content and routes requests only to appropriate models. If no suitable models are available, the request returns a 404 error.

Configuration:

Provider models: Set in config/providers.json
Rotation models: Override in config/rotations.json
Global settings: Enable/disable in config/aisbf.json or dashboard

Semantic Model Selection

For enhanced performance, autoselect configurations can use semantic classification instead of AI model selection:

Hybrid BM25 + Semantic Search: Combines fast keyword matching with semantic similarity
Sentence Transformers: Uses pre-trained embeddings for content understanding
Automatic Fallback: Falls back to AI model selection if semantic classification fails

Enable Semantic Classification:

{
  "autoselect": {
    "autoselect": {
      "classify_semantic": true,
      "selection_model": "openai/gpt-4",
      "available_models": [...]
    }
  }
}

Benefits:

Faster model selection (no API calls required)
Lower costs (no tokens consumed for selection)
Deterministic results based on content similarity
Automatic model library indexing and caching

See config/autoselect.json for configuration examples.

API Endpoints

Three Proxy Paths

AISBF provides three ways to proxy AI models:

PATH 1: Direct Provider Models

Format: {provider_id}/{model_name}

Access specific models from configured providers
Example: openai/gpt-4, gemini/gemini-2.0-flash, anthropic/claude-3-5-sonnet-20241022, kilotest/kilo/free

PATH 2: Rotations

Format: rotation/{rotation_name}

Weighted load balancing across multiple providers
Automatic failover on errors
Example: rotation/coding, rotation/general

PATH 3: Autoselect

Format: autoselect/{autoselect_name}

AI-powered model selection based on content analysis
Automatic routing to specialized models
Example: autoselect/autoselect

General Endpoints

GET / - Server status and provider list (includes providers, rotations, and autoselect)
GET /api/models - List all available models from all three proxy paths
GET /api/v1/models - OpenAI-compatible model listing (same as /api/models)

Chat Completions

Unified Endpoints (Recommended)

These endpoints accept all three proxy path formats in the model field:

OpenAI-Compatible Format:

POST /api/v1/chat/completions
{
  "model": "openai/gpt-4",           # PATH 1: Direct provider
  "messages": [...]
}

POST /api/v1/chat/completions
{
  "model": "rotation/coding",        # PATH 2: Rotation
  "messages": [...]
}

POST /api/v1/chat/completions
{
  "model": "autoselect/autoselect",  # PATH 3: Autoselect
  "messages": [...]
}

Alternative Format:

POST /api/{provider_id}/chat/completions
# Where provider_id can be:
# - A provider name (e.g., "openai", "gemini")
# - A rotation name (e.g., "coding", "general")
# - An autoselect name (e.g., "autoselect")

Audio Endpoints

Model specified in request (supports all three proxy paths):

POST /api/audio/transcriptions - Audio transcription
- Example: {"model": "openai/whisper-1", "file": ...} or {"model": "rotation/coding", "file": ...}
POST /api/audio/speech - Text-to-speech
- Example: {"model": "openai/tts-1", "input": "Hello"} or {"model": "rotation/coding", "input": "Hello"}
POST /api/v1/audio/transcriptions - OpenAI-compatible audio transcription
POST /api/v1/audio/speech - OpenAI-compatible text-to-speech

Image Endpoints

Model specified in request (supports all three proxy paths):

POST /api/images/generations - Image generation
- Example: {"model": "openai/dall-e-3", "prompt": "A cat"} or {"model": "rotation/coding", "prompt": "A cat"}
POST /api/v1/images/generations - OpenAI-compatible image generation

Embeddings Endpoints

Model specified in request (supports all three proxy paths):

POST /api/embeddings - Text embeddings
- Example: {"model": "openai/text-embedding-ada-002", "input": "Hello"} or {"model": "rotation/coding", "input": "Hello"}
POST /api/v1/embeddings - OpenAI-compatible embeddings

Legacy Endpoints

These endpoints are maintained for backward compatibility:

GET /api/rotations - List all available rotation configurations
POST /api/rotations/chat/completions - Chat completions using rotation (model name = rotation name)
GET /api/rotations/models - List all models across all rotation configurations
GET /api/autoselect - List all available autoselect configurations
POST /api/autoselect/chat/completions - Chat completions using autoselect (model name = autoselect name)
GET /api/autoselect/models - List all models across all autoselect configurations
GET /api/{provider_id}/models - List available models for a specific provider, rotation, or autoselect

User-Specific API Endpoints

Authenticated users can access their own configurations via user-specific API endpoints. These endpoints require either a valid API token (generated in the user dashboard) or session authentication.

Authentication

Option 1: Bearer Token (Recommended for API access)

Authorization: Bearer YOUR_API_TOKEN

Option 2: Query Parameter

?token=YOUR_API_TOKEN

User API Endpoints

Endpoint	Description
`GET /api/user/models`	List available models from user's own configurations
`GET /api/user/providers`	List user's provider configurations
`GET /api/user/rotations`	List user's rotation configurations
`GET /api/user/autoselects`	List user's autoselect configurations
`POST /api/user/chat/completions`	Chat completions using user's own models
`GET /api/user/{config_type}/models`	List models for specific config type (provider, rotation, autoselect)

Access Control

Admin Users have access to both global and user configurations when using user API endpoints.

Regular Users can only access their own configurations.

Global Tokens (configured in aisbf.json) have full access to all configurations.

Token Management

Users can create and manage API tokens through the dashboard:

Navigate to Dashboard → User Dashboard → API Tokens
Click "Generate New Token" to create a token
Copy the token immediately (it won't be shown again)
Use the token in API requests via Bearer authentication
View token usage statistics and delete tokens as needed

Example: Using User API with cURL

# List user models
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/models

# Chat using user's own models
curl -X POST -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "your-rotation/model", "messages": [{"role": "user", "content": "Hello"}]}' \
  http://localhost:17765/api/user/chat/completions

# List user's providers
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/providers

# List user's rotations
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:17765/api/user/rotations

MCP Integration

User tokens also work with MCP (Model Context Protocol) endpoints:

Admin users get access to both global and user-specific MCP tools
Regular users get access to user-only MCP tools
Tools include model access, configuration management, and usage statistics

MCP (Model Context Protocol)

AISBF provides an MCP server for remote agent configuration and model access:

SSE Endpoint: GET /mcp - Server-Sent Events for MCP communication
HTTP Endpoint: POST /mcp - Direct HTTP transport for MCP

MCP tools include:

list_models - List available models (user or global depending on auth)
chat_completions - Send chat completion requests
get_providers - Get provider configurations
get_rotations - Get rotation configurations
get_autoselects - Get autoselect configurations
And more for authenticated users to manage their own configs

User tokens authenticate MCP requests, with admin users getting full access and regular users getting user-only access.

Content Proxy

GET /api/proxy/{content_id} - Proxy generated content (images, audio, etc.)

Dashboard Endpoints

GET /dashboard - Web-based configuration dashboard
GET /dashboard/login - Dashboard login page
POST /dashboard/login - Handle dashboard authentication
GET /dashboard/logout - Logout from dashboard
GET /dashboard/providers - Edit providers configuration (includes provider-level defaults for condensation)
GET /dashboard/rotations - Edit rotations configuration
GET /dashboard/autoselect - Edit autoselect configuration (supports multiple autoselect configs with "internal" model option)
GET /dashboard/settings - Edit server settings (includes SSL/TLS configuration with Let's Encrypt support)
POST /dashboard/restart - Restart the server

Dashboard Features:

Provider Configuration: Set default condensation settings at provider level that cascade to all models
Autoselect Configuration: Configure multiple autoselect models with "internal" option for using local HuggingFace models
SSL/TLS Management: Configure HTTPS with automatic Let's Encrypt certificate generation and renewal
Collapsible UI: All configuration sections use collapsible panels for better organization

Error Handling

Rate limiting for failed requests
Automatic retry with provider rotation
Proper error tracking and logging
Fixed streaming response serialization for OpenAI-compatible providers
Improved autoselect model selection with explicit output requirements

Donations

The project includes multiple donation options to support its development:

Ethereum Donation

ETH to 0xdA6dAb526515b5cb556d20269207D43fcc760E51

PayPal Donation

https://paypal.me/nexlab

Bitcoin Donation

Address: bc1qcpt2uutqkz4456j5r78rjm3gwq03h5fpwmcc5u Traditional BTC donation method

Documentation

See DOCUMENTATION.md for complete API documentation, configuration details, and development guides.

License

GNU General Public License v3.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.99.18

Apr 11, 2026

0.99.17

Apr 11, 2026

0.99.16

Apr 11, 2026

This version

0.99.15

Apr 9, 2026

0.99.11

Apr 9, 2026

0.99.10

Apr 9, 2026

0.99.9

Apr 9, 2026

0.99.8

Apr 9, 2026

0.99.7

Apr 9, 2026

0.99.6

Apr 9, 2026

0.99.5

Apr 9, 2026

0.99.4

Apr 9, 2026

0.99.3

Apr 9, 2026

0.99.2

Apr 9, 2026

0.99.1

Apr 9, 2026

0.99.0

Apr 9, 2026

0.9.8.post1

Apr 5, 2026

0.9.8

Apr 4, 2026

0.9.7

Apr 3, 2026

0.9.6

Apr 3, 2026

0.9.5

Apr 3, 2026

0.9.3

Apr 3, 2026

0.9.2

Apr 3, 2026

0.5.0

Mar 23, 2026

0.3.2

Feb 8, 2026

0.3.0

Feb 7, 2026

0.2.7

Feb 6, 2026

0.2.6

Feb 6, 2026

0.2.5

Feb 6, 2026

0.2.4

Feb 6, 2026

0.2.3

Feb 6, 2026

0.2.2

Feb 6, 2026

0.2.0

Feb 6, 2026

0.1.2

Feb 6, 2026

0.1.1

Feb 6, 2026

0.1.0

Feb 6, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aisbf-0.99.15.tar.gz (429.8 kB view details)

Uploaded Apr 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aisbf-0.99.15-py3-none-any.whl (720.3 kB view details)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file aisbf-0.99.15.tar.gz.

File metadata

Download URL: aisbf-0.99.15.tar.gz
Upload date: Apr 9, 2026
Size: 429.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for aisbf-0.99.15.tar.gz
Algorithm	Hash digest
SHA256	`8f943dd3b87963e9e3caad0747c0bd48268048e87ef067d8fa10ae7b0a58967f`
MD5	`d4fd669098027f298e388e4f177de980`
BLAKE2b-256	`71385af5afbe76c657254002c83c7dcbfb797533e008303aba2f34d17d694e9c`

See more details on using hashes here.

File details

Details for the file aisbf-0.99.15-py3-none-any.whl.

File metadata

Download URL: aisbf-0.99.15-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 720.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for aisbf-0.99.15-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8b11bbaf2e900553eb82679471785a0afb9b52bf970d85b84f60345a1f1bf621`
MD5	`a63a469673ef0325c50795921919faad`
BLAKE2b-256	`2101e959ec0a783f0c81af90b7a951e190d6dd0eb0839508a8ac520c706a5b6e`

See more details on using hashes here.

aisbf 0.99.15

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AISBF - AI Service Broker Framework || AI Should Be Free

Web Dashboard

Key Features

Author

Repository

Quick Start

Installation

From PyPI (Recommended)

From Source

Usage

Development

Building the Package

Cleaning Build Artifacts

PyPI Publishing

Supported Providers

Kiro-cli Provider Support

Features

Setup

Configuration Example

Kilocode OAuth2 Authentication

Features

Setup

Configuration Example

Codex OAuth2 Authentication

Features

Setup

Configuration Example

Qwen OAuth2 Authentication

Features

Setup

Configuration Example

API Key Mode (Optional)

Available Models

Configuration

SSL/TLS Configuration

Self-Signed Certificates (Default)

Let's Encrypt Integration

TOR Hidden Service Configuration

Prerequisites

Configuration

Ephemeral vs Persistent Hidden Services

Accessing Your Hidden Service

Security Considerations

Claude OAuth2 Authentication

Features

Setup

Configuration Example

Response Caching (Semantic Deduplication)

Supported Cache Backends

Features

Configuration

Database Configuration

Supported Databases

SQLite Configuration (Default)

MySQL Configuration

Cache Configuration

Supported Cache Backends

Cache Features

Redis Configuration

Cache Performance Benefits

Provider-Native Caching Configuration

Supported Providers

Configuration Options

Configuration Fields

How It Works

Cost Reduction

Provider-Level Defaults

Model Configuration

Context Condensation Methods

Multi-User Database Integration

User Management